weso / shapes-rs

RDF data shapes implementation in Rust
https://www.weso.es/shapes-rs/
Apache License 2.0

Improve performance of ShExC compact parser #35

Closed: labra closed this issue 4 months ago

labra commented 5 months ago

When parsing the whole FHIR schema, the compact parser seems to be very slow. The FHIR schema is about 17932 lines.

This has been a good stress test for the compact parser. It shows that the parser's current performance is quite bad: parsing takes around 2 hours, while ShEx.js takes about 20s on the same schema. So we must definitely find ways to improve the performance.

One alternative is to reconsider the decision to use nom and look for a different parsing library.

Meanwhile, we could also invest in profiling the parser and check which parts we can optimize. We started some infrastructure for profiling the parser and were able to generate nice flamegraphs.

labra commented 4 months ago

I refactored the tracing system so that it does not add more traces than needed and, in particular, does not build a trace message when that message is not going to be printed. Performance has greatly improved.
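To illustrate the kind of change described above, here is a minimal sketch of eager versus lazy trace messages. It is not the actual shapes-rs tracing code: `describe`, `trace_eager`, `trace_lazy` and the `built` counter are hypothetical names made up for this example, and the functions return the message instead of printing it so the effect is easy to observe.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for an expensive trace message: it formats a
/// preview of the remaining input and counts how many times it was built.
fn describe(rule: &str, input: &str, built: &Cell<usize>) -> String {
    built.set(built.get() + 1);
    let preview: String = input.chars().take(20).collect();
    format!("enter {}: {:?}", rule, preview)
}

/// Eager tracing: the caller has already built `msg`, so the formatting
/// cost is paid even when tracing is disabled and the message is dropped.
fn trace_eager(enabled: bool, msg: String) -> Option<String> {
    if enabled {
        Some(msg)
    } else {
        None
    }
}

/// Lazy tracing: the closure that builds the message only runs when
/// tracing is enabled, so a disabled trace costs a single branch.
fn trace_lazy<F>(enabled: bool, make_msg: F) -> Option<String>
where
    F: FnOnce() -> String,
{
    if enabled {
        Some(make_msg())
    } else {
        None
    }
}
```

With tracing disabled, `trace_eager(false, describe("shape", input, &built))` still increments `built`, while `trace_lazy(false, || describe("shape", input, &built))` leaves it unchanged. In a parser that enters a rule many times per input line, that difference adds up.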

At this moment, the compact parser takes 20s to parse the FHIR schema, which seems to be in line with the ShEx.js implementation.

I think there is still room for improvement in the way it handles alternatives, but for now we can close this issue.
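The comment does not say what exactly is costly about alternatives, so the following is only a guess at one common pattern. In combinator parsers, an ordered choice (such as nom's `alt`) tries each branch in turn until one succeeds, so inputs that match a late branch pay for every failed attempt before it. The sketch below is a toy, std-only illustration. `Tok`, `iri`, `bnode`, `prefixed`, `ordered` and `dispatch` are hypothetical names and do not reflect the real ShExC grammar or the shapes-rs code.

```rust
/// Hypothetical token kinds; not the real ShExC grammar.
#[derive(Debug, PartialEq)]
enum Tok {
    Iri,
    BNode,
    Prefixed,
}

/// A toy "parser": just classifies the start of the input.
type P = fn(&str) -> Option<Tok>;

fn iri(s: &str) -> Option<Tok> {
    s.starts_with('<').then_some(Tok::Iri)
}

fn bnode(s: &str) -> Option<Tok> {
    s.starts_with("_:").then_some(Tok::BNode)
}

fn prefixed(s: &str) -> Option<Tok> {
    s.contains(':').then_some(Tok::Prefixed)
}

/// Ordered choice: try each alternative in turn.
/// Returns the result and the number of alternatives attempted.
fn ordered(s: &str, alts: &[P]) -> (Option<Tok>, usize) {
    let mut attempts = 0;
    for p in alts {
        attempts += 1;
        if let Some(t) = p(s) {
            return (Some(t), attempts);
        }
    }
    (None, attempts)
}

/// Dispatch: look at the first bytes and run only the matching alternative.
fn dispatch(s: &str) -> (Option<Tok>, usize) {
    let p: P = match s.as_bytes() {
        [b'<', ..] => iri,
        [b'_', b':', ..] => bnode,
        _ => prefixed,
    };
    (p(s), 1)
}
```

For an input such as `ex:name`, `ordered` with the alternatives `[iri, bnode, prefixed]` makes three attempts while `dispatch` makes one. Whether this applies to the compact parser would need to be confirmed with the flamegraphs mentioned earlier in the issue.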