Open refi64 opened 8 years ago
Positive lookbehinds now work according to method 1.
Should be finished now!
Although it'd be really amazing to have variable-length look{ahead,behind} ... I had some uses for 'em :)
@vendethiel Variable-length lookaheads work, though! Lookbehinds are a little trickier, and I'm trying to think of not-insanely-inefficient ways to do it...but ATM it's not that high-priority.
Okay, well, I'll be interested to know if you ever implement this. Allows some crazy obscure tricks.
Y'know what screw it. I think I know how to implement this more efficiently than I thought. (Or at least on X64; not sure what to do on X86 yet, since I'm pretty much out of registers...).
Cool :D. Feel free to page me here when you commit an implementation, or if you wanna write down technical details.
I'm pretty sure it should be possible to implement reasonably fast variable-length lookbehinds, so long as the lookbehind expression only matches words of a regular language (so no nested backreferences or lookaheads or lookbehinds in a lookbehind). If that's the case, you could compile the subexpression into a separate DFA/NFA with all concatenations reversed, and attempt to match the reverse of string[:current_position]
.
(That doesn't sound particularly easy, though.)
@pyos That's actually a really good idea...thanks!
There are two ways to approach this:
N
of the lookbehind assertion, and check for it exactlyN
characters back in the string. This is what Python does. I personally find it slightly restrictive...2. Allow any expression. Just step back through the string one character at a time and test if the lookbehind matches AND ends right at the previous position. This is obviously very slow but much more flexible. One optimization here could be to calculate the minimum length of the match and start there. Doing the same thing with a max length may not be worth it, since stuff like(?<=a{2,5})
isn't that common.My idea is to combine both: when the first approach can be used, then use it! When the given regex wouldn't work with it (e.g.Using method 2 in practice utterly failed. Not even Perl allows that. I just went with method 1.(?<=a+)b
), then use method 2.