kach / nearley

📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.
https://nearley.js.org
MIT License
3.57k stars 231 forks source link

unparse.js: don't excessively recompute depth cache for massive speedup #642

Open seansfkelley opened 10 months ago

seansfkelley commented 10 months ago

I noticed that the unparse script was brutally slow for even modestly-sized grammars and modest depths. I would typically see 5-10s generation times for a grammar with 30-40 rules and depth 14.

min_depths_rule appears to be a memoizing cache, and also seems to be cleared unnecessarily frequently. The function that populates it, min_depth_cache, only relies on rules (immutable), i (the rule index), visited (always given as []) and the min_depths_rule cache (always set to [] before calling). Thus, as used, every call to min_depth_cache always returns the same value given the same i.

This commit adds an extra layer of caching, coupled with precomputation, to avoid calculating the min depth for each rule repeatedly. This results in a drop from 5-10s to a few milliseconds without any apparent change in result quality.

An earlier version of this change simply repurposed min_depths_rule as the precomputation cache by never clearing it. However, this subtly changed the results, an issue which I eventually traced to the observation that this cache gets populated with different values depending on which rule's min depth is currently being calculated. That is: the min depth for a rule depends on context, namely, the "root" rule in question, so this cache should not be reused across rules.

Caveat: I don't really understand the unparsing algorithm at the conceptual level. My claim for the correctness of this change rests purely on the manual code analysis explained above.

seansfkelley commented 10 months ago

I'd be happy to restructure this code to avoid the kinda-fishy dependency of the function on min_depth_rule_cache from the outer scope, if you'd like, but for the time being I opted to keep the diff small and simple instead of undertaking a larger rewrite.