piprate / json-gold

A JSON-LD processor for Go
Apache License 2.0
259 stars 30 forks

~3x performance improvement by optimizing code hotspots wrt regexes #45

Closed andrewortman closed 3 years ago

andrewortman commented 3 years ago

Two more bottlenecks discovered:

  1. IsAbsoluteUrl used urls.ParseURL to determine whether a value had a protocol. The regex in ParseURL is quite expensive, especially when it runs 1000+ times in createTermDefinition with a schema.org context. I switched to the logic defined in jsonld.js: checking whether the value is an absolute URL (via the Go url package) or a blank node (starts with _:). This passes all tests and cuts the expansion of 100 schema.org JSON-LD documents from 13 seconds to ~8 seconds.

  2. The term regex that I extracted in PR #42 was called so often that I looked into simplifying it. The regex simply checks whether the term ends in one of about half a dozen characters. I replaced it with a switch statement, which further dropped the processing time for the same 100 documents from 8 seconds to ~5 seconds.
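For readers following along, here is a rough sketch of what the two checks could look like. It is not the code from this PR: the function names isAbsoluteOrBlankNode and endsWithDelimiter are hypothetical, and the delimiter set (the RFC 3986 gen-delims) is an assumption about which "half a dozen characters" the regex matched.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// isAbsoluteOrBlankNode is a hypothetical stand-in for the first check:
// blank node identifiers start with "_:", and anything else must parse
// as a URL that carries a scheme.
func isAbsoluteOrBlankNode(value string) bool {
	if strings.HasPrefix(value, "_:") {
		return true
	}
	u, err := url.Parse(value)
	if err != nil {
		return false
	}
	return u.IsAbs()
}

// endsWithDelimiter is a hypothetical stand-in for the second check.
// It replaces a suffix regex with a switch on the last byte.
// Assumption: the character set is the RFC 3986 gen-delims.
func endsWithDelimiter(term string) bool {
	if term == "" {
		return false
	}
	switch term[len(term)-1] {
	case ':', '/', '?', '#', '[', ']', '@':
		return true
	default:
		return false
	}
}

func main() {
	for _, v := range []string{"http://schema.org/name", "_:b0", "name"} {
		fmt.Printf("%q absolute-or-blank=%t ends-with-delimiter=%t\n",
			v, isAbsoluteOrBlankNode(v), endsWithDelimiter(v))
	}
}
```

Neither function compiles or runs a regular expression, which is the point of both changes: the work per call is a prefix test, a URL parse, or a single byte comparison.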

I'm really just using pprof in my application to hunt down hotspots. There are a few more, but it's getting into diminishing-returns territory at this point.

kazarena commented 3 years ago

@andrewortman thank you for submitting the PR. I'll review asap.