nitely / nim-regex

Pure Nim regex engine. Guarantees linear time matching
https://nitely.github.io/nim-regex/
MIT License
228 stars 21 forks source link

linear time findAll #71

Closed nitely closed 4 years ago

nitely commented 4 years ago

The findAll algorithm calls find repeatedly to find the next match, however this may take quadratic time, ex: findAll("aaaa", re"\w+\d|\w") this will call find 4 times and consume the entire string every time. I believe only regex with alternations where a term consumes the whole string while a second term matches part of it are affected.

This PR implements a findAll that always takes linear time. The caveat is it may take linear space, ex: the above example needs to store all the matches for the second term of the alternation until it finally fails to match the first term.

The boundaries are two machine words, so it may take 16x the size of the string. I think this is pretty bad, but it's the only way to guarantee linear time AFAICT.

This could support streams (#14), except it may consume unbounded memory, so no. On a second thought, I think it cannot possibly consume unbounded memory, memory is bounded by the regex len, so it's fixed memory. Captures could take unbounded space in their current form (full parse tree), but they would be bounded once only last capture is returned for every capture group (to be implemented).