Open seansfkelley opened 10 months ago
I'd be happy to restructure this code to avoid the kinda-fishy dependency of the function on min_depth_rule_cache
from the outer scope, if you'd like, but for the time being I opted to keep the diff small and simple instead of undertaking a larger rewrite.
I noticed that the unparse script was brutally slow for even modestly-sized grammars and modest depths. I would typically see 5-10s generation times for a grammar with 30-40 rules and depth 14.
min_depths_rule
appears to be a memoizing cache, and also seems to be cleared unnecessarily frequently. The function that populates it,min_depth_cache
, only relies onrules
(immutable),i
(the rule index),visited
(always given as[]
) and themin_depths_rule
cache (always set to[]
before calling). Thus, as used, every call tomin_depth_cache
always returns the same value given the samei
.This commit adds an extra layer of caching, coupled with precomputation, to avoid calculating the min depth for each rule repeatedly. This results in a drop from 5-10s to a few milliseconds without any apparent change in result quality.
An earlier version of this change simply repurposed
min_depths_rule
as the precomputation cache by never clearing it. However, this subtly changed the results, an issue which I eventually traced to the observation that this cache gets populated with different values depending on which rule's min depth is currently being calculated. That is: the min depth for a rule depends on context, namely, the "root" rule in question, so this cache should not be reused across rules.Caveat: I don't really understand the unparsing algorithm at the conceptual level. My claim for the correctness of this change rests purely on the manual code analysis explained above.