theY4Kman / parsuricata

Parse Suricata rules
https://pypi.org/project/parsuricata/
MIT License

fix: properly support escape sequences in setting strings #7

Closed theY4Kman closed 3 years ago

theY4Kman commented 3 years ago

This PR switches setting strings to a more robust escaped-string regex terminal, one that properly supports escape sequences. The terminal, ESCAPED_STRING, is imported into the grammar from Lark's library and is defined as:

_STRING_INNER: /.*?/
_STRING_ESC_INNER: _STRING_INNER /(?<!\\)(\\\\)*?/

ESCAPED_STRING : "\"" _STRING_ESC_INNER "\""

The Lark docs describe this as roughly equivalent to the regex ".*?(?<!\\)". See https://github.com/lark-parser/lark/blob/3b2bf47dc4750add61df4e236238a626b79d3da0/docs/json_tutorial.md#part-1---the-grammar
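To make the difference concrete, here is a minimal sketch using only Python's `re` module. `ESCAPED_STRING` below is my own hand-flattened approximation of the terminal above (not the exact pattern Lark compiles), `SIMPLE_STRING` is the "roughly equivalent" regex, and `first_string` is a hypothetical helper for illustration.

```python
import re

# Hand-flattened approximation of the terminal definition above:
#   "\""  _STRING_INNER  /(?<!\\)(\\\\)*?/  "\""
# (an assumption for illustration, not the exact regex Lark builds).
ESCAPED_STRING = re.compile(r'"(?:.*?(?<!\\)(?:\\\\)*?)"')

# The "roughly equivalent" simplification quoted from the tutorial.
SIMPLE_STRING = re.compile(r'".*?(?<!\\)"')


def first_string(pattern, text):
    """Return the first quoted string `pattern` finds in `text`, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


# An escaped quote inside the string does not terminate it.
print(first_string(ESCAPED_STRING, r'"foo \"bar\" baz"; next'))  # "foo \"bar\" baz"

# A string ending in an escaped backslash is where the two patterns differ.
text = r'"C:\\"; msg:"x"'
print(first_string(ESCAPED_STRING, text))  # "C:\\"
print(first_string(SIMPLE_STRING, text))   # "C:\\"; msg:"
```

The trailing `(\\\\)*?` group is what lets the closing quote follow an even run of backslashes; without it, the simplified pattern treats that quote as escaped and runs on into the next setting.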

A number of test cases have been added to ensure it works.


Also, in this PR I simplified the grammar just a smidge and passed the transformer directly to the parser, which the Lark docs say "[avoids] building the parse tree, and just [sends] the data straight into our transformer". That apparently improves performance and memory efficiency.

I really ought to add a large corpus of rules to the tests to benchmark this.