weso / shapes-rs

RDF data shapes implementation in Rust
https://www.weso.es/shapes-rs/
Apache License 2.0

Refactor the ShEx compact parser to support streaming #32

Closed labra closed 5 months ago

labra commented 5 months ago

Currently, the parser needs the entire input before it can process a ShEx compact file. This works well for short files, but for large ShEx files such as FHIR, which has 17932 lines, it takes too long (2 hours) and shows no activity until it finishes.

If we could parse the file in a streaming way, we could parse one ShEx statement after another and return each intermediate result. That way, we could interrupt the parser and still have an intermediate schema, and we could show activity while it is parsing.

Some blog posts that explain nom's streaming are: nom as a streaming parser

labra commented 5 months ago

The streaming approach did not clearly improve performance, and it seemed that we could solve the issue with an iterator that works statement by statement. We changed the grammar so that the parser parses a single statement and returns it each time it is called.

The iterator returns each statement until the input is exhausted, and we can add a debug message for each statement.
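
For readers who want to see the shape of the idea, here is a minimal sketch of a statement-by-statement iterator. It is not the shapes-rs code and it does not use nom: `Statement`, `ParseError`, `StatementIter`, `parse_statement` and `parse_with_progress` are hypothetical names, and the grammar is a toy that only accepts `PREFIX alias: <iri>` and `label { body }` without nested braces. The point is only that each call to `next` parses one statement, so the caller can log progress and keep whatever was parsed before an error or an interruption.

```rust
/// Hypothetical, heavily simplified statement type (not the shapes-rs AST).
#[derive(Debug, PartialEq)]
enum Statement {
    Prefix { alias: String, iri: String },
    Shape { label: String, body: String },
}

#[derive(Debug, PartialEq)]
struct ParseError {
    offset: usize,
    message: String,
}

/// Parses exactly one statement from the start of `input` and returns it
/// together with the unparsed remainder. Toy grammar, assumed for illustration.
fn parse_statement(input: &str) -> Result<(Statement, &str), String> {
    if let Some(after) = input.strip_prefix("PREFIX") {
        let after = after.trim_start();
        let colon = after.find(':').ok_or("expected ':' after prefix alias")?;
        let alias = after[..colon].to_string();
        let after = after[colon + 1..].trim_start();
        let after = after.strip_prefix('<').ok_or("expected '<' before IRI")?;
        let end = after.find('>').ok_or("expected '>' after IRI")?;
        let iri = after[..end].to_string();
        Ok((Statement::Prefix { alias, iri }, &after[end + 1..]))
    } else {
        let open = input.find('{').ok_or("expected '{' to open a shape")?;
        let close = open + input[open..].find('}').ok_or("expected '}' to close a shape")?;
        let label = input[..open].trim().to_string();
        let body = input[open + 1..close].trim().to_string();
        Ok((Statement::Shape { label, body }, &input[close + 1..]))
    }
}

/// Iterator that yields one parsed statement per call to `next`.
struct StatementIter<'a> {
    src: &'a str,
    rest: &'a str,
    done: bool,
}

impl<'a> StatementIter<'a> {
    fn new(src: &'a str) -> Self {
        StatementIter { src, rest: src, done: false }
    }
}

impl<'a> Iterator for StatementIter<'a> {
    type Item = Result<Statement, ParseError>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        self.rest = self.rest.trim_start();
        if self.rest.is_empty() {
            self.done = true;
            return None;
        }
        match parse_statement(self.rest) {
            Ok((stmt, rest)) => {
                self.rest = rest;
                Some(Ok(stmt))
            }
            Err(message) => {
                // Report the error once, then stop iterating.
                self.done = true;
                let offset = self.src.len() - self.rest.len();
                Some(Err(ParseError { offset, message }))
            }
        }
    }
}

/// Collects statements one at a time, logging progress to stderr and keeping
/// the partial schema if a statement fails to parse.
fn parse_with_progress(src: &str) -> Vec<Statement> {
    let mut schema = Vec::new();
    for (i, item) in StatementIter::new(src).enumerate() {
        match item {
            Ok(stmt) => {
                eprintln!("parsed statement {}: {:?}", i + 1, stmt);
                schema.push(stmt);
            }
            Err(e) => {
                eprintln!("stopped at byte {}: {}", e.offset, e.message);
                break;
            }
        }
    }
    schema
}

fn main() {
    let src = "PREFIX ex: <http://example.org/>\nex:Person { ex:name . }\nex:Broken { ex:age .";
    let schema = parse_with_progress(src);
    println!("{} statements kept", schema.len());
}
```

In this sketch the third statement is missing its closing brace, so the loop stops there and the two statements parsed before it are still available, which is the behaviour the issue asks for.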