no-context / moo

Optimised tokenizer/lexer generator! 🐄 Uses /y for performance. Moo.
BSD 3-Clause "New" or "Revised" License

add `u`nicode flag where missing #181

Open loveencounterflow opened 1 year ago

loveencounterflow commented 1 year ago

I realized that the regex below is rejected with `Invalid regular expression: [...] Range out of order in character class` when used in `moo.compile()` (note: it is written in extended syntax, with extra whitespace for readability):

/// [        A-Z _ a-z \u{00a1}-\u{10ffff}  ] [  $ 0-9 A-Z _ a-z \u{00a1}-\u{10ffff}  ]* ///u
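A minimal reproduction outside moo, using only the built-in `RegExp`. This is a sketch that assumes a V8-based runtime such as Node.js for the exact error text. Without the `u` flag, `\u{...}` is not a code-point escape. The `\u` is read as a literal `u` and `{00a1}` as literal characters, so the class ends up containing the range `}-u` (U+007D down to U+0075), which is out of order. The helper `compileStatus` is a hypothetical name used only for illustration.

```javascript
// The pattern from the issue, with the extended-syntax whitespace removed.
// String.raw keeps the backslashes, so RegExp itself parses \u{00a1} and \u{10ffff}.
const source = String.raw`[A-Z_a-z\u{00a1}-\u{10ffff}][$0-9A-Z_a-z\u{00a1}-\u{10ffff}]*`;

// Hypothetical helper: returns "ok" if the pattern compiles with the given flags,
// otherwise the name of the error that was thrown.
function compileStatus(flags) {
  try {
    new RegExp(source, flags);
    return "ok";
  } catch (err) {
    return err.name;
  }
}

console.log(compileStatus(""));  // "SyntaxError" (V8: "Range out of order in character class")
console.log(compileStatus("u")); // "ok"

// With the u flag, the classes cover non-ASCII identifier characters as intended.
const ident = new RegExp("^(?:" + source + ")$", "u");
console.log(ident.test("café"));   // true
console.log(ident.test("9lives")); // false: the first character may not be a digit
```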

This can be fixed by adding the `u` flag on lines 25 and 263 of the moo source. Of course, this is something of a hotfix, since it would presumably turn all regexes into Unicode regexes (maybe a Good Thing?). I therefore leave it to those more familiar with the codebase to come up with a proper solution.
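One possible shape for a less blunt fix is sketched below. It is only a sketch and is not moo's actual code. The function `unionRegExps` is hypothetical. It joins rule sources into a single sticky regex and enables `u` only when the rules themselves were written with `u`. It refuses a mix of `u` and non-`u` rules, because a source written for one mode may fail to parse, or may match differently, in the other. The second half shows why a blanket `u` is not free. Under `u`, `.` consumes a whole astral code point, and identity escapes such as `\a` become syntax errors.

```javascript
// Hypothetical helper (not moo's API): union the rule regexes into one sticky regex,
// adding the u flag only if the input regexes use it.
function unionRegExps(regexps) {
  const unicodeCount = regexps.filter((re) => re.unicode).length;
  if (unicodeCount !== 0 && unicodeCount !== regexps.length) {
    // Refuse mixed modes: a source's meaning can depend on whether u is set.
    throw new Error("either all rules or none must use the u flag");
  }
  const flags = unicodeCount > 0 ? "yu" : "y";
  // One capture group per rule, so the matching rule can be found by group index.
  const body = regexps.map((re) => "(" + re.source + ")").join("|");
  return new RegExp(body, flags);
}

const identRule = /[A-Z_a-z\u{00a1}-\u{10ffff}][$0-9A-Z_a-z\u{00a1}-\u{10ffff}]*/u;
const wsRule = /[ \t]+/u;
const lexer = unionRegExps([identRule, wsRule]);
lexer.lastIndex = 0;
const m = lexer.exec("größe = 1");
console.log(m[0]); // "größe" (captured by group 1, the identifier rule)

// Why turning on u everywhere changes behaviour:
const emoji = "\u{1F600}"; // one code point, two UTF-16 code units
console.log(/^.$/.test(emoji));  // false: without u, "." matches a single code unit
console.log(/^.$/u.test(emoji)); // true

let identityEscapeOk = true;
try {
  new RegExp(String.raw`\a`, "u"); // accepted without u, a SyntaxError with u
} catch (err) {
  identityEscapeOk = false;
}
console.log(identityEscapeOk); // false
```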