ridiculousfish / regress

REGex in Rust with EcmaScript Syntax
Apache License 2.0
176 stars 11 forks source link

Pure String Literal searching #21

Open neeldug opened 4 years ago

neeldug commented 4 years ago

I saw on the README that there was an opportunity to improve the algorithm used for pure string literal searching, I'm trying to implement the crate boyer-moore-magiclen, to improve this, however, I'm unable to work out exactly where regress searches for these, my inclination was to start here. It'd be much appreciated if anyone could point me in the right direction for where to start!

ridiculousfish commented 4 years ago

Sorry for taking a while to get to this. Probably the simplest way to integrate this would be a new StartPredicate variant.

When analyzing the regex IR we can tell if matches must begin with a certain byte sequence; here's where that happens. Right now we only record the first 4 bytes, even if the prefix must be longer. We could have a variant that finds arbitrarily long byte sequences through Boyer-Moore.