0xekez opened this issue 5 years ago
Actually, it was a deliberate decision not to use regular expressions: the underlying DFAs can get prohibitively large when many regexps are combined through '|'.
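A toy sketch of that state explosion follows. It is not Zeek's or paraglob's code. It models each glob of the hypothetical form `*x*y*` as a three-state NFA and counts the DFA states that the textbook subset construction produces for their union. With pairwise-distinct letters, each pattern alone needs 3 states, but the union yields 3^k states for k patterns, because the DFA has to track every pattern's progress at the same time. Minimization helps, but for this family the result is still exponential in k. The pattern family and helper names are assumptions chosen for illustration.

```python
from collections import deque
import string


def step(states, ch, patterns):
    """Advance a set of NFA states by one input character.

    NFA states are (pattern_index, progress) pairs for globs of the form *x*y*:
    progress 0 = nothing seen yet, 1 = seen x, 2 = seen x and later y (accept).
    Every state keeps a self-loop because of the surrounding '*' wildcards.
    """
    nxt = set()
    for i, progress in states:
        nxt.add((i, progress))  # '*' lets each state consume any character
        x, y = patterns[i]
        if progress == 0 and ch == x:
            nxt.add((i, 1))
        elif progress == 1 and ch == y:
            nxt.add((i, 2))
    return frozenset(nxt)


def count_union_dfa_states(patterns):
    """Run the subset construction on the union NFA; return #reachable DFA states."""
    alphabet = sorted({c for pair in patterns for c in pair})
    start = frozenset((i, 0) for i in range(len(patterns)))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for ch in alphabet:
            target = step(current, ch, patterns)
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return len(seen)


def distinct_patterns(k):
    """Hypothetical pattern family: *a*b*, *c*d*, *e*f*, ... (k globs)."""
    letters = string.ascii_lowercase
    return [(letters[2 * i], letters[2 * i + 1]) for i in range(k)]


for k in range(1, 7):
    # Prints 3, 9, 27, 81, 243, 729 states for k = 1..6.
    print(k, count_union_dfa_states(distinct_patterns(k)))
```

The growth comes purely from combining patterns. Every individual automaton stays tiny.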
For individual patterns, I'm not sure paraglob could directly support regexps, as in general I don't think there's much of a way to tokenize them into fixed strings.
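A rough illustration of why regexps resist tokenization into fixed strings follows. The sketch conservatively pulls out literal runs that every match of a restricted regexp must contain. These runs are the regexp analogue of meta-words. The function name and the supported subset are assumptions. The function gives up (returns `None`) on groups, alternation and escapes. For a common pattern such as `[0-9]+`, it finds no required literal at all, so such a pattern could never be prefiltered.

```python
def required_literals(regex):
    """Conservatively extract literal runs that every match must contain.

    Returns None for constructs this sketch does not analyse (groups,
    alternation, escapes). An empty list means "no usable fixed string".
    """
    if any(c in regex for c in "()|\\"):
        return None
    runs, current, i = [], "", 0
    while i < len(regex):
        c = regex[i]
        if c == "[":
            # A character class matches one of several characters: end the run.
            end = regex.find("]", i + 2)
            if end == -1:
                return None
            runs.append(current)
            current, i = "", end + 1
            continue
        if c in "*?{":
            # The preceding atom may be absent (or its count varies): drop it.
            runs.append(current[:-1])
            current = ""
            if c == "{":
                end = regex.find("}", i)
                if end == -1:
                    return None
                i = end
        elif c == "+":
            # The preceding atom is required at least once; end the run after it.
            runs.append(current)
            current = ""
        elif c in ".^$":
            # Wildcard or anchor: not a literal character.
            runs.append(current)
            current = ""
        else:
            current += c
        i += 1
    runs.append(current)
    return [r for r in runs if r]


for p in ["foo.*bar", "colou?r", "[0-9]+", "foo|bar"]:
    print(p, required_literals(p))
```

Anything smarter, such as handling alternation by extracting a set of alternative literals, needs a real regexp parser. This is the "more complicated parsing" the proposal refers to.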
That all said, it was recently pointed out that Hyperscan (https://www.hyperscan.io) can supposedly match large numbers of regexps in parallel. It would be interesting to understand whether it can support our use case, and if so, how it's implemented so that it doesn't run into the DFA state explosion.
On Wed, May 29, 2019 at 12:09 -0700, Zeke Medley wrote:
Currently paraglob supports glob-style patterns. Zeek uses a different pattern type in its scripting layer, which uses the same syntax as flex regular expressions. This pattern matching is implemented in `src/RE.h/cc` inside the `zeek/zeek` repo. Adding support for these patterns in paraglob could potentially make it more useful for people using Zeek. I think the current meta-word extraction approach should still work fine with somewhat more complicated parsing. Then it's just a matter of determining what sort of patterns the paraglob contains in its constructor and matching using the appropriate method during get operations.
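A minimal Python sketch of that idea follows. It rests on several assumptions:

- The class name is hypothetical.
- A pattern written as `/.../` is treated as a regexp, mirroring Zeek's script literal syntax, and anything else is treated as a glob.
- Python's `re` stands in for flex-style syntax.
- A naive substring check stands in for the multi-string matcher a real implementation would use (for example, Aho-Corasick).

Glob meta-words are the literal runs between wildcards. Regexp patterns carry no meta-words here, so they are always evaluated.

```python
import fnmatch
import re

# Glob wildcards: '*', '?', and bracket expressions such as [abc], [!abc], []abc].
GLOB_WILDCARDS = re.compile(r"\*|\?|\[!?\]?[^\]]*\]")


class MixedPatternSet:
    """Hypothetical pattern set mixing globs and /regex/ patterns."""

    def __init__(self, patterns):
        # Decide each pattern's kind once, at construction time.
        self.entries = []
        for p in patterns:
            if len(p) >= 2 and p.startswith("/") and p.endswith("/"):
                # Regex: compile once; no meta-words, so never prefiltered.
                self.entries.append((p, [], re.compile(p[1:-1])))
            else:
                # Glob: meta-words are the literal runs between wildcards.
                words = [w for w in GLOB_WILDCARDS.split(p) if w]
                self.entries.append((p, words, None))

    def get(self, text):
        """Return every pattern that matches the whole of `text`, sorted."""
        matches = []
        for pattern, words, compiled in self.entries:
            # Prefilter: a glob can only match if all its literal runs occur.
            if not all(w in text for w in words):
                continue
            if compiled is not None:
                ok = compiled.fullmatch(text) is not None
            else:
                ok = fnmatch.fnmatchcase(text, pattern)
            if ok:
                matches.append(pattern)
        return sorted(matches)


ps = MixedPatternSet(["*.exe", "foo*bar", "/[0-9]+/"])
print(ps.get("setup.exe"))
print(ps.get("12345"))
```

Dispatching per pattern keeps mixed sets possible. The alternative is to choose a single mode for the whole set in the constructor, which is simpler but forces callers to keep globs and regexps apart.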
It would also be interesting to consider when combining regex-style patterns with a `|` might increase performance.

-- View it on GitHub: https://github.com/zeek/paraglob/issues/5
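One caveat applies when combining patterns with `|`. A single pass replaces N separate scans, but a backtracking engine such as Python's `re` reports only the first alternative that succeeds. Here `re` is used purely as a stand-in for Zeek's matcher. A paraglob-style get, by contrast, needs the full set of matching patterns. The sketch below uses hypothetical helper names and tags each alternative with a named group to see which one fired. Multi-pattern engines such as Hyperscan report a separate match ID for each pattern, which sidesteps this problem.

```python
import re


def combined_matches(patterns, text):
    """Match all regexps in one pass by joining them with '|'.

    Each alternative is wrapped in a named group p0, p1, ... so the caller can
    tell which one matched. The engine commits to the first alternative that
    fully matches, so at most one pattern is ever reported.
    """
    combined = re.compile("|".join(f"(?P<p{i}>{p})" for i, p in enumerate(patterns)))
    m = combined.fullmatch(text)
    if m is None:
        return []
    return [p for i, p in enumerate(patterns) if m.group(f"p{i}") is not None]


def individual_matches(patterns, text):
    """Baseline: test every regexp separately and keep all that match."""
    return [p for p in patterns if re.fullmatch(p, text)]


patterns = ["foo.*", "f.*"]
print(individual_matches(patterns, "foobar"))  # ['foo.*', 'f.*']
print(combined_matches(patterns, "foobar"))    # ['foo.*']
```

So `|` only pays off if the engine can still attribute matches to every individual pattern, which is also where the DFA size concern above comes back in.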
-- Robin Sommer * Corelight, Inc. * robin@corelight.com * www.corelight.com