kjosib / booze-tools

Booze Tools will become the complete programming-language development workbench, all written in Python 3.9 (for now).
MIT License

Binary Scanning Support #37

Closed kjosib closed 3 years ago

kjosib commented 3 years ago

Network protocols like HTTP are defined in terms of byte streams, only portions of which are necessarily valid text in any particular encoding. It might be nice to apply booze-tools to such streams rather than hand-coding rickety parsers.

From a technical perspective, adjusting the scanner algorithm to support an array of (unsigned) bytes would not be too hard. (It could be factored out as a strategy pattern.)
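To make the strategy idea concrete, here is a minimal sketch. It is not the booze-tools scanner: `longest_match`, `text_symbol`, `byte_symbol`, and the dictionary-based transition table are all hypothetical names and representations invented for illustration. The only point is that the matching loop stays the same and only the "read one symbol" step changes.

```python
from typing import Callable, Dict, Optional, Set, Tuple, Union

# Hypothetical sketch; none of these names come from booze-tools.
Source = Union[str, bytes]
Delta = Dict[Tuple[int, int], int]  # (state, symbol) -> next state


def text_symbol(source: str, index: int) -> int:
    """Strategy for text: the symbol is the character's code point."""
    return ord(source[index])


def byte_symbol(source: bytes, index: int) -> int:
    """Strategy for binary: indexing bytes already yields an int in 0..255."""
    return source[index]


def longest_match(
    delta: Delta,
    accepting: Set[int],
    source: Source,
    start: int,
    symbol_at: Callable[[Source, int], int],
) -> Optional[int]:
    """Return the end offset of the longest match from `start`, or None."""
    state, end, i = 0, None, start
    while i < len(source):
        nxt = delta.get((state, symbol_at(source, i)))
        if nxt is None:
            break  # no transition on this symbol: stop scanning
        state = nxt
        i += 1
        if state in accepting:
            end = i  # remember the most recent accepting position
    return end


# A toy automaton that accepts one or more ASCII digits.
DIGITS: Delta = {(s, d): 1 for s in (0, 1) for d in range(ord("0"), ord("9") + 1)}
```

With this shape, `longest_match(DIGITS, {1}, "123abc", 0, text_symbol)` and `longest_match(DIGITS, {1}, b"42 ", 0, byte_symbol)` run the same loop over a `str` and a `bytes` object respectively.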

To adjust the regex language to deal with bytes rather than characters, just chop off any char-class boundaries higher than 255. (And then consider making a look-up table instead of a search tree, which could be built at initialization time.)
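A rough sketch of that clipping step follows. It assumes a representation that may not match booze-tools internals: a sorted list `bounds` of code points at which the character class changes, plus a list `classes` with one more entry than `bounds`, so that a symbol `c` belongs to `classes[bisect_right(bounds, c)]`. The function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple


def clip_to_bytes(bounds: List[int], classes: List[int]) -> Tuple[List[int], List[int]]:
    """Drop every boundary above 255, along with the classes it introduced."""
    keep = [b for b in bounds if b <= 255]
    return keep, classes[: len(keep) + 1]


def build_table(bounds: List[int], classes: List[int]) -> List[int]:
    """Precompute the class of each byte value so a lookup is one index."""
    return [classes[bisect_right(bounds, b)] for b in range(256)]


# Assumed example: digits are class 1, other Latin-1 is class 0,
# and two further classes begin at U+0100 and U+3000.
bounds = [48, 58, 0x100, 0x3000]
classes = [0, 1, 0, 2, 3]

byte_bounds, byte_classes = clip_to_bytes(bounds, classes)
table = build_table(byte_bounds, byte_classes)
```

After clipping, `byte_bounds` is `[48, 58]` and `table[b]` replaces the bisection for every byte `b`. The table costs 256 entries per scanner, which is why building it once at initialization seems reasonable.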

Ideally, the "compiled" language definition object should include a flag saying whether it processes text or binary input, and then the "typical application" sugar would respect that flag when selecting an inspection strategy.
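One way such a flag might look, purely as a sketch: `CompiledDefinition`, its `binary` field, and `symbols` are hypothetical names, not the actual booze-tools object model. The flag picks how input is turned into integer symbols, and a mismatch between flag and input type fails loudly instead of scanning garbage.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class CompiledDefinition:
    """Hypothetical stand-in for a compiled language definition."""
    name: str
    binary: bool = False  # True: scan bytes; False: scan text


def symbols(definition: CompiledDefinition, source: Union[str, bytes, bytearray]) -> List[int]:
    """Convert input to integer symbols according to the definition's flag."""
    if definition.binary:
        if not isinstance(source, (bytes, bytearray)):
            raise TypeError(f"{definition.name} expects bytes, got {type(source).__name__}")
        return list(source)  # iterating bytes yields ints in 0..255
    if not isinstance(source, str):
        raise TypeError(f"{definition.name} expects str, got {type(source).__name__}")
    return [ord(ch) for ch in source]  # code points


http_like = CompiledDefinition("http_like", binary=True)
plain_text = CompiledDefinition("plain_text")
```

Calling `symbols(http_like, b"GET")` yields byte values, while `symbols(http_like, "GET")` raises `TypeError`, so the caller never has to choose the strategy by hand.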