kjosib / booze-tools

Booze Tools will become the complete programming-language development workbench, all written in Python 3.9 (for now).
MIT License

Regex Improvements: Case-insensitive sub-patterns, etc. #24

Open kjosib opened 5 years ago

kjosib commented 5 years ago

scanning.miniscan.Definition.scan(...) provides for a dirty trick: monkey-patching an extra attribute into an object defined elsewhere. scanning.miniscan.__BEGIN__._bracket_reference relies on that monkey-patch. As a result, subexpressions must be defined before they are used (which may sound like a fine thing, but isn't ideal).

The metascanner problem can be fixed by making regex AST nodes expect an environment passed into their translate-to-NFA method, which also facilitates nice things like case-insensitive sub-patterns.
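A minimal sketch of that idea, under loud assumptions: none of these names (`Env`, `Literal`, `Seq`, `IgnoreCase`, `matches`) come from booze-tools, and a list of per-position character sets stands in for a real NFA. The point is only that a wrapper node can hand a modified environment to its sub-pattern.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, List


@dataclass(frozen=True)
class Env:
    # Hypothetical translation environment; not the booze-tools API.
    ignore_case: bool = False


@dataclass(frozen=True)
class Literal:
    text: str

    def translate(self, env: Env) -> List[FrozenSet[str]]:
        # One character set per position: a toy stand-in for a chain of NFA edges.
        if env.ignore_case:
            return [frozenset({c.lower(), c.upper()}) for c in self.text]
        return [frozenset({c}) for c in self.text]


@dataclass(frozen=True)
class Seq:
    parts: tuple

    def translate(self, env: Env) -> List[FrozenSet[str]]:
        # Every part is translated under the same environment.
        steps: List[FrozenSet[str]] = []
        for part in self.parts:
            steps.extend(part.translate(env))
        return steps


@dataclass(frozen=True)
class IgnoreCase:
    inner: object

    def translate(self, env: Env) -> List[FrozenSet[str]]:
        # Only the wrapped sub-pattern sees the case-insensitive environment.
        return self.inner.translate(replace(env, ignore_case=True))


def matches(steps: List[FrozenSet[str]], text: str) -> bool:
    # Walk the linear chain, one input character per step.
    return len(steps) == len(text) and all(c in s for c, s in zip(text, steps))


pattern = Seq((Literal("id"), IgnoreCase(Literal("x"))))
steps = pattern.translate(Env())
```

Here `matches(steps, "idX")` and `matches(steps, "idx")` both hold, while `matches(steps, "IDx")` does not, because only the wrapped part is case-insensitive.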

I'm not sure if the monkey-patched scanner.env attribute gets used in the wild. Does it?

kjosib commented 4 years ago

If AST nodes are defined for bits of character-class structure (rather than "compiled" classes), then visitors may be defined to interpret those nodes in any manner. In particular, curly-brace named references can be the province of semantic interpretation rather than a hack in the meta-lexer. This opens up more possibilities for Unicode-oriented scanning.
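To illustrate, here is a rough sketch with two visitors over the same character-class tree. The node and visitor names (`Range`, `Union`, `NamedRef`, `Members`, `Render`) are invented for this example and are not the project's actual classes; the named reference is resolved by a visitor, not by the lexer.

```python
from dataclasses import dataclass


# Hypothetical AST nodes for character-class structure.
@dataclass(frozen=True)
class Range:
    first: str
    last: str


@dataclass(frozen=True)
class Union:
    parts: tuple


@dataclass(frozen=True)
class NamedRef:
    name: str


class Visitor:
    def visit(self, node):
        # Dispatch on the node's class name, e.g. visit_Range.
        return getattr(self, "visit_" + type(node).__name__)(node)


class Members(Visitor):
    """Interpret a tree as the set of characters it admits."""

    def __init__(self, definitions):
        self.definitions = definitions

    def visit_Range(self, node):
        return frozenset(chr(cp) for cp in range(ord(node.first), ord(node.last) + 1))

    def visit_Union(self, node):
        result = frozenset()
        for part in node.parts:
            result |= self.visit(part)
        return result

    def visit_NamedRef(self, node):
        # The name is looked up only now, during interpretation.
        return self.visit(self.definitions[node.name])


class Render(Visitor):
    """Interpret the same tree as source-like text."""

    def visit_Range(self, node):
        return f"{node.first}-{node.last}"

    def visit_Union(self, node):
        return "".join(self.visit(part) for part in node.parts)

    def visit_NamedRef(self, node):
        return "{" + node.name + "}"


definitions = {"digit": Range("0", "9")}
hex_digit = Union((NamedRef("digit"), Range("a", "f")))
members = Members(definitions).visit(hex_digit)
rendered = Render().visit(hex_digit)
```

A Unicode-aware visitor could presumably treat `NamedRef` as a lookup into character-property tables instead of a user-supplied dictionary, without touching the parser.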

kjosib commented 4 years ago

Regular expressions now get completely turned into AST nodes rather than building character classes during the parse, so all reference names get delayed binding. Just don't make cycles, eh? (Although I suppose technically tail-recursion could be made to work, I don't think it's worth the effort.) This means the metascanner no longer uses the monkey-patch, and miniscan no longer supports it. A case-sense switch may not be far off.
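For the curious, delayed binding with a cycle check might look roughly like this. It is a sketch only: the `resolve` function and the `("ref", name)` / `("text", value)` encoding are made up for illustration and say nothing about how miniscan stores definitions.

```python
def resolve(name, definitions, _active=()):
    """Expand a named definition, following references at use time."""
    if name in _active:
        # The name is already being expanded further up the call chain.
        chain = " -> ".join(_active + (name,))
        raise ValueError("cyclic reference: " + chain)
    if name not in definitions:
        raise KeyError(name)
    out = []
    for kind, value in definitions[name]:
        if kind == "ref":
            out.append(resolve(value, definitions, _active + (name,)))
        else:
            out.append(value)
    return "".join(out)


# "number" refers to "digits" before "digits" is defined; that is fine,
# because nothing is looked up until resolve() runs.
definitions = {
    "number": [("ref", "digits"), ("text", "."), ("ref", "digits")],
    "digits": [("text", "[0-9]+")],
}
expanded = resolve("number", definitions)
```

Definition order stops mattering, and a cycle such as `a` referring to `b` referring back to `a` raises an error naming the chain instead of recursing forever.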

kjosib commented 3 years ago

Related: a recent commit, 57b0b71e6a188570d1395eccfc958c5ac47290f5, switched from hand-coded, named-tuple-based AST nodes to the arborist.trees subsystem. That didn't change any user-visible functionality, but it might bring this issue slightly closer to completion.