Closed Gladskih closed 3 years ago
Mmm, yeah, it appears the original string regex attempted to allow escaped double-quotes, semicolons, and backslashes using a negative lookahead: `(?!\\)\\[;\\"]`. This pattern tried to match `\"`, `\;`, and `\\`, without matching `\\;` (an escaped backslash before a contraband character). But the negative lookahead is the wrong thing: the pattern `(?!\\)\\` means "match any `\` character which isn't a `\` character", which can never match :P
What was needed was a negative lookbehind: `(?<!\\)\\`. This means "match any `\` character not preceded by a `\` character".
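To make the difference concrete, here is a small sketch using Python's `re` module. The two patterns are the ones quoted above; the sample strings and variable names are made up for illustration and are not taken from the library.

```python
import re

# Negative lookahead: asserts the next character is NOT a backslash,
# then requires that same character to BE a backslash. Contradictory.
lookahead = re.compile(r'(?!\\)\\[;\\"]')

# Negative lookbehind: a backslash that is not preceded by another
# backslash, followed by one of the escapable characters.
lookbehind = re.compile(r'(?<!\\)\\[;\\"]')

# Illustrative input containing an escaped semicolon and an escaped quote.
sample = r'foo\;bar\"baz'

print(lookahead.findall(sample))   # -> []
print(lookbehind.findall(sample))  # -> ['\\;', '\\"']

# An escaped backslash followed by a bare semicolon: only the escaped
# backslash is matched, so the semicolon is left unescaped.
print(lookbehind.findall(r'a\\;'))  # -> ['\\\\']
```

The lookahead version returns nothing for any input, which is consistent with rules containing escaped characters failing to parse.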
I'll have this fixed up shortly, and include the colon, as well.
Also, in reading the docs, I'm realizing hex notation is not interpreted at all, and even though this fixed regex picks up escaped characters, no interpretation is being performed on them, either. Which is to say, the strings returned by the parser do not match the actual content the rule source describes; they merely reflect the literal characters written in the rule. I'm not sure how you or anyone else uses this library, but I would like to offer at least the option to retrieve the actual content as a Python str/bytes, e.g. `Setting('|00|butt').parsed == '\x00butt'` or something.
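As a rough sketch of what that interpretation step could look like: the function below turns a raw content string into bytes by expanding `|..|` hex runs and dropping the backslash from escapes. The name `decode_content` is hypothetical and not part of this library's API, and the handling of malformed input (unmatched pipes, trailing backslashes are kept literally) is an assumption rather than Suricata's documented behavior.

```python
import re

# One token is: a |hex| run, a backslash escape, or any other single character.
_TOKEN = re.compile(r'\|([0-9A-Fa-f\s]*)\||\\(.)|(.)', re.DOTALL)


def decode_content(raw: str) -> bytes:
    """Hypothetical helper: interpret hex runs and escapes in a content value."""
    out = bytearray()
    for match in _TOKEN.finditer(raw):
        hex_run, escaped, plain = match.groups()
        if hex_run is not None:
            # "|0d 0a|" style: bytes.fromhex skips the whitespace between bytes.
            out += bytes.fromhex(hex_run)
        elif escaped is not None:
            # "\;" style: keep the escaped character, drop the backslash.
            out += escaped.encode("utf-8")
        else:
            out += plain.encode("utf-8")
    return bytes(out)


print(decode_content('|00|butt'))  # -> b'\x00butt'
print(decode_content(r'a\;b'))     # -> b'a;b'
```

Returning `bytes` rather than `str` is a design choice made here because hex runs can encode arbitrary byte values that are not valid text.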
I thought I would have this out shortly, but I'm fuckin something up with the regex. Fortunately, I just discovered Lark provides an escaped string terminal :P Now it ought to be out shortly.
Okie dokes, fixed in #7, and uploaded to PyPI as version 0.2.3
When I try to parse a rule with an escaped semicolon, it fails.
About escaping in docs:
But maybe my content value example is malformed, as section 6.7.1 (content) also states:
BTW, the rule is accepted by Suricata itself.