Closed gilmoreorless closed 7 years ago
Do you think we should do this sorting in regexgen? Might have benefits for other projects as well.
Yeah, sounds like the sorting should be done in regexgen.
Merging this now. We can remove the sorting here once it’s implemented in regexgen. Thanks, @gilmoreorless!
No worries! I wasn't sure about doing the sorting within regexgen itself, for 2 reasons:
Sorting in Trie#addAll would be easy, but if a user is repeatedly calling Trie#add with an unsorted list I don't know where the sorting would go.
Cheers for the quick merge.
This is the fix for #16, which turned out to have 2 root causes:
1. regexgen had a bug that resulted in nested alternations not being correctly sorted by length (fixed in devongovett/regexgen#16).
2. Because of the way regexgen builds and simplifies its internal representation, I've found that it optimises better when the inputs are provided in order from longest to shortest. As such, sorting the sequences list by length before passing it into the Trie class produces a regex that correctly matches longer sequences before shorter ones.
Interestingly, fixing #16 required both of the points above to be tackled at the same time. Adding a test for every sequence produced the following results:
- regexgen unchanged
- regexgen bug fix, no pre-sorting
- regexgen bug fix plus pre-sorting (this PR)
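The pre-sorting step described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: `sortByLengthDesc`, `escapeRegExp` and `naiveAlternation` are hypothetical helpers, and `naiveAlternation` is a plain stand-in for regexgen (it just joins the inputs with `|`) so the snippet runs without dependencies. It shows why order matters: a JavaScript alternation tries its branches left to right, so a shorter sequence listed first wins over a longer one that shares its prefix.

```javascript
// Hypothetical helper: return a copy of the list sorted longest-first.
// The input array is not mutated.
function sortByLengthDesc(sequences) {
  return [...sequences].sort((a, b) => b.length - a.length);
}

// Escape regex metacharacters so each sequence is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Naive stand-in for regexgen (an assumption for this sketch):
// join the sequences into a single alternation, in the order given.
function naiveAlternation(sequences) {
  return new RegExp(sequences.map(escapeRegExp).join('|'));
}

const sequences = ['a', 'abc', 'ab'];

const unsorted = naiveAlternation(sequences);
const sorted = naiveAlternation(sortByLengthDesc(sequences));

// Branches are tried left to right, so the unsorted pattern stops at "a".
console.log(unsorted.exec('abc')[0]); // "a"
console.log(sorted.exec('abc')[0]);   // "abc"
```

With regexgen itself, the sorted list would presumably be what gets handed to the Trie class; the generated pattern will look different from the naive alternation here, since regexgen merges common prefixes.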