piprate / json-gold

A JSON-LD processor for Go
Apache License 2.0

performance improvement for Context.createTermDefinition #42

Closed andrewortman closed 3 years ago

andrewortman commented 3 years ago

This simple fix improves the speed of expanding small documents that use the schema.org context by ~60% in my application. Every time Expand() was called, it iterated through the entire list of schema.org IRIs and matched each one against these two regexes. It also recompiled both regexes before every use, which took up a large amount of CPU time in my application.
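For readers unfamiliar with the pattern, here is a minimal sketch of the difference between compiling a regex on every call and compiling it once at package level. The pattern and the function names (`absoluteIRIPattern`, `matchRecompiling`, `matchPrecompiled`) are hypothetical and chosen for illustration only; they are not the actual regexes or identifiers from json-gold.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical pattern for illustration only; NOT one of json-gold's regexes.
// It matches strings that start with a scheme-like prefix followed by a colon.
const absoluteIRIPattern = `^[A-Za-z][A-Za-z0-9+.-]*:`

// Compiled once when the package is initialised and reused by every call.
// A *regexp.Regexp is safe for concurrent use by multiple goroutines.
var absoluteIRIRegex = regexp.MustCompile(absoluteIRIPattern)

// matchRecompiling parses and compiles the pattern on every call,
// which is the costly behaviour described above.
func matchRecompiling(s string) bool {
	return regexp.MustCompile(absoluteIRIPattern).MatchString(s)
}

// matchPrecompiled reuses the package-level regex, so each call
// only pays for the match itself.
func matchPrecompiled(s string) bool {
	return absoluteIRIRegex.MatchString(s)
}

func main() {
	terms := []string{"http://schema.org/name", "name", "schema:Person"}
	for _, term := range terms {
		// Both functions return the same result; only the cost differs.
		fmt.Println(term, matchRecompiling(term), matchPrecompiled(term))
	}
}
```

Both functions give identical answers, so the change is behaviour-preserving. The saving comes purely from skipping the repeated compilation, and it grows with the number of terms checked per call.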

The JSON-LD algorithms are still a black box to me, so I'm not quite sure what these regexes actually /do/; I did my best when coming up with names for them. I haven't really spent the time to understand why the algorithm has to do such expensive work on every Expand() call with the same context. Perhaps context caching could somehow be implemented on the user side; I just haven't figured that out yet.

kazarena commented 3 years ago

@andrewortman this is excellent. Optimising the regexes has been sitting on my to-do list ever since I wrote the code. Thank you.

The current implementation is an almost direct port of the JSON-LD algorithms from other reference implementations. So far, I have tried to keep it as close as possible to the reference algorithms, in order to make it easier to bring in new changes. Now that the 1.1 spec is stable, there is more room for engineering. Yes, if you use the same context over and over, it's a good opportunity to cache the expanded version. Unfortunately, it's not easy with the current JsonLdProcessor implementation. We should definitely keep it in mind for the next iteration of the library.
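As a rough illustration of the caching idea, the sketch below memoizes a processed context by its URL so that repeated lookups reuse the first result. Everything in it is an assumption made for illustration: `processedContext`, `contextCache`, and the `build` callback are hypothetical stand-ins and are not part of json-gold's API, which, as noted above, does not currently make this easy.

```go
package main

import (
	"fmt"
	"sync"
)

// processedContext is a hypothetical stand-in for whatever a processor
// builds from a context document (here, just a term-to-IRI map).
type processedContext struct {
	terms map[string]string
}

// contextCache memoizes processed contexts by context URL.
type contextCache struct {
	mu      sync.Mutex
	entries map[string]*processedContext
	build   func(url string) (*processedContext, error) // the expensive step
}

func newContextCache(build func(string) (*processedContext, error)) *contextCache {
	return &contextCache{entries: make(map[string]*processedContext), build: build}
}

// get returns the cached context for url, building it on first use.
// The lock is held while building, which keeps the sketch simple but
// serialises concurrent builds. Failed builds are not cached.
func (c *contextCache) get(url string) (*processedContext, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if pc, ok := c.entries[url]; ok {
		return pc, nil
	}
	pc, err := c.build(url)
	if err != nil {
		return nil, err
	}
	c.entries[url] = pc
	return pc, nil
}

func main() {
	builds := 0
	cache := newContextCache(func(url string) (*processedContext, error) {
		builds++ // count how often the expensive step actually runs
		return &processedContext{terms: map[string]string{"name": url + "/name"}}, nil
	})
	for i := 0; i < 3; i++ {
		pc, err := cache.get("http://schema.org")
		if err != nil {
			panic(err)
		}
		fmt.Println(pc.terms["name"])
	}
	fmt.Println("builds:", builds)
}
```

A cache like this is only safe if the cached value is treated as read-only after it is built, and it assumes the context behind a given URL does not change during the lifetime of the process.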