Closed: MegaIng closed this issue 1 year ago.
So perhaps we can limit ourselves to checking a fixed number of collisions, say 8, after which we'll just write something like "8 regex collisions reached; disabling detection. More collisions may exist."
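A minimal sketch of what that cap could look like. The `report_collisions` helper and the shape of the collision pairs are my own illustration, not lark's actual API:

```python
# Cap on how many collisions we report before giving up.
MAX_COLLISIONS = 8

def report_collisions(collisions, max_reports=MAX_COLLISIONS):
    """Yield human-readable warnings, stopping after max_reports
    and emitting a final note that more collisions may exist."""
    for i, (a, b) in enumerate(collisions):
        if i >= max_reports:
            yield (f"{max_reports} regex collisions reached; "
                   "disabling detection. More collisions may exist.")
            return
        yield f"Collision between {a!r} and {b!r}"

# With 10 collisions we get 8 warnings plus the cut-off notice.
warnings = list(report_collisions([("A", "B")] * 10))
```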
Also, I think we can reduce the example search time to 200ms or similar; that's already quite a bit of time. But maybe for strict mode put it even higher, at 2000ms, because then we only have to call it once.
What do you think?
Yeah, putting a limit on the total number of collisions is something I also considered.
For search time, I sadly don't have a good measurement of how the parameter I can control maps to the time it takes. But I could try to figure something out.
Ok, I made a few improvements. It should now be relatively reliable, and I think there are a few changes I can make in interegular itself so that it manages to provide examples for all collisions in the above-mentioned grammar very quickly, where previously it only managed to report one or two. Via testing, I calculated an approximate mapping from max_time to max_iterations. This should be decently reliable unless the hardware or Python implementation is very slow.
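The max_time to max_iterations calibration could be approximated by timing a small probe on the current machine. This is only a sketch of the idea; the probe body is a stand-in, not interegular's actual search step:

```python
import time

def calibrate_iterations(target_seconds, probe_iters=100_000):
    """Estimate how many iterations fit in target_seconds by timing
    a small probe workload, so a time budget can be translated into
    an iteration budget once, up front."""
    start = time.perf_counter()
    acc = 0
    for i in range(probe_iters):
        acc += i  # stand-in for one unit of search work
    elapsed = time.perf_counter() - start
    rate = probe_iters / max(elapsed, 1e-9)  # iterations per second
    return int(rate * target_seconds)

max_iterations = calibrate_iterations(0.2)  # budget of ~200 ms
```

Pre-computing an iteration budget this way avoids calling the clock inside the hot search loop.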
Use `limit_depth` when creating an example for a collision, to prevent unbounded memory and time usage. An example of two regexes that do have a collision, but for which finding the example can take a long time, comes from this grammar: https://github.com/adbancroft/TunerStudioIniParser/blob/master/ts_ini_parser/grammars/pre_processor.lark
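What a depth limit buys can be sketched as a bounded breadth-first search over the product of two automata. The DFA dictionaries and this helper are illustrative stand-ins, not interegular's internals:

```python
from collections import deque

def example_overlap(dfa_a, dfa_b, accept_a, accept_b, limit_depth):
    """Return a non-empty string accepted by both DFAs, exploring no
    deeper than limit_depth characters; None if none is found within
    that budget. The depth limit is what keeps time and memory bounded."""
    queue = deque([((0, 0), "")])
    while queue:
        (sa, sb), word = queue.popleft()
        if sa in accept_a and sb in accept_b and word:
            return word
        if len(word) >= limit_depth:
            continue  # the depth limit: stop expanding long prefixes
        # Only follow characters both automata can consume here.
        for ch in sorted(set(dfa_a.get(sa, {})) & set(dfa_b.get(sb, {}))):
            queue.append(((dfa_a[sa][ch], dfa_b[sb][ch]), word + ch))
    return None

# Toy DFAs: A accepts ("ab")*, B accepts any a/b string of even length.
dfa_a = {0: {"a": 1}, 1: {"b": 0}}
dfa_b = {0: {"a": 1, "b": 1}, 1: {"a": 0, "b": 0}}
overlap = example_overlap(dfa_a, dfa_b, accept_a={0}, accept_b={0},
                          limit_depth=4)
```

Without the `len(word) >= limit_depth` check, a pathological pair of automata can make this search explore an enormous frontier before producing anything.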
However, even with this particular change, loading that grammar still takes a few seconds, since almost all regexes collide with each other.
The reason for many of these collisions is that the grammar is unclear about whether the keywords are case-insensitive, so all these warnings and collisions are actually fully accurate: using `#ErRor` instead of `#error` will cause problems for the grammar.