intel / hyperscan

High-performance regular expression matching library
https://www.hyperscan.io

giving pattern with null to hs_compile_lit_multi #427

Open chkp-michaelgo opened 8 months ago

chkp-michaelgo commented 8 months ago

At first I tried passing the lens argument to hs_compile_lit_multi to indicate that a null byte should be treated as an 8-bit integer with value 0 and not as a terminator. It did not work: the lens argument was ignored and the zero byte was treated as a terminator. When the zero byte was the first character of the pattern, it threw an error.

Then I tried using \x00. \x escapes worked just fine for values other than 00. For \x00 it did not throw an error, even at the start of the pattern, but it did not match on buffers containing nulls either.
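To pin down the two inputs being compared, here is a minimal sketch in plain C that makes no Hyperscan calls. The names `raw_nul`, `escaped_nul` and `count_zero_bytes` are made up for illustration. It only shows that a raw zero byte and the four-character text `\x00` are different byte sequences, so a buffer containing a real null can only match the second form if the library unescapes it.

```c
#include <stddef.h>

/* One raw byte with value 0: the "lens" approach would pass this with length 1. */
const char raw_nul[] = { '\0' };

/* Four bytes of text: backslash, 'x', '0', '0'. No zero byte is present. */
const char escaped_nul[] = { '\\', 'x', '0', '0' };

/* Count bytes equal to 0 in a length-delimited buffer (hypothetical helper). */
size_t count_zero_bytes(const char *p, size_t len) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (p[i] == '\0') {
            count++;
        }
    }
    return count;
}
```

With these definitions, `raw_nul` is 1 byte long and contains one zero byte, while `escaped_nul` is 4 bytes long and contains none.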

Is there a flag to force hs_compile_lit_multi to treat the input patterns as raw data and not perform any unescaping or special treatment of \xXX or \0 and just treat the lens argument as the actual lengths of the patterns?

Is there a flag to make hs_compile_lit_multi treat all escape sequences including \x00 and \0 consistently?
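The behaviour being asked for can be sketched in plain C without Hyperscan. This is only an illustration of length-delimited matching, not how Hyperscan works internally, and `raw_literal`, `terminated_len` and `find_raw` are hypothetical names. Each literal carries its own length, so a zero byte is ordinary data. `terminated_len` shows what a terminator-based reading of the same bytes would see instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A literal carried as (pointer, length); a 0x00 byte is ordinary data here. */
typedef struct {
    const char *bytes;
    size_t len;
} raw_literal;

/* Length as a NUL-terminated reading would see it: stops at the first 0x00. */
size_t terminated_len(const raw_literal *lit) {
    size_t n = 0;
    while (n < lit->len && lit->bytes[n] != '\0') {
        n++;
    }
    return n;
}

/* Offset of the first occurrence of lit in buf, or -1 if there is none.
 * Uses memcmp with the explicit length, so embedded zero bytes are compared
 * like any other byte. */
long find_raw(const char *buf, size_t buf_len, const raw_literal *lit) {
    assert(buf != NULL && lit != NULL && lit->bytes != NULL);
    if (lit->len == 0 || lit->len > buf_len) {
        return -1;
    }
    for (size_t i = 0; i + lit->len <= buf_len; i++) {
        if (memcmp(buf + i, lit->bytes, lit->len) == 0) {
            return (long)i;
        }
    }
    return -1;
}
```

For the three-byte literal `a`, 0x00, `b`, the explicit length is 3 while `terminated_len` gives 1. For a literal that starts with 0x00, `terminated_len` gives 0, which would explain an empty-pattern error if the length were ignored.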

chkp-michaelgo commented 7 months ago

A small clarification: the documentation at https://intel.github.io/hyperscan/dev-reference/compilation.html clearly states that a null character does not determine the length and that the lens argument is what controls it, so this seems like a bug.

> For new APIs, the length of each literal pattern is a newly added parameter. Hyperscan needs to locate the end position of the input expression via clearly knowing each literal’s length, not by simply identifying character \0 of a string.