nitely / nim-regex

Pure Nim regex engine. Guarantees linear time matching
https://nitely.github.io/nim-regex/
MIT License
225 stars 20 forks source link

Alternations optimization #141

Open nitely opened 3 months ago

nitely commented 3 months ago

In theory this speeds up regex containing hundreds of alternations (see #138). How? the first state contains a state per alternation; the optimization reduces the number of alternations (at the very least the outer one), so the first state will be capped to [a-zA-Z-0-9] (+ symbols) states in the case of ASCII, so 1K initial states will get reduced to ~50. For find all this matters a lot because it tries to match the initial states to every input character.

The description is in this gist https://gist.github.com/nitely/745c8cabdf06ba2d37f8cf5cda3aea5f

This is just a PoC, though. I'm just optimizing the simplest case, since it should speed up #138

it's also a (broken) WIP that cannot even be tested.

nitely commented 2 months ago

We can go further than just literals, consider (^abc|^acb) -> ^a(bc|cb). The same can be applied to suffixes, consider (car|bar) -> (b|c)ar, this would be useful for the literals optimization to kick in.