Open conartist6 opened 3 years ago
I'm going to build this. I hadn't looked at the code in long enough that I forgot how some of it works. I took it as an opportunity to improve the docs a bit as I rediscovered where I put the important logic. I'm confident once again that I know how to do what this issue describes, and I'll do it just as soon as the macrome generated artifacts are stable again (shifted from timestamps to hashes)
This will be significantly simpler once #35 lands I think.
I'm ready to implement this, and I'm less likely to mess it up since I just spent so much time rewriting and debugging the engine.
The architectural question is how to look at expressions when succeeding or failing. Currently success and failure are the responsibility of sequences, and expressions are assumed to be safe to prune when they no longer are needed to hold a list.
Currently there are two places in the code that are relevant.
In remove
a failing sequence may replace its parent if it has no siblings, which would be incorrect if its parent is a lookahead.
https://github.com/iter-tools/regex/blob/58a456b495f6d308b19a2341e1cdc28134722356/lib/internal/engine.ts#L122
A succeeding sequence is promoted, which may allow it to eliminate expressions:
https://github.com/iter-tools/regex/blob/58a456b495f6d308b19a2341e1cdc28134722356/lib/internal/engine.ts#L273-L287
The question will be, do I want a recursive solution, which would allow me to program the behavior as a a method overload, or should I try to avoid recursion, as I am unsure how deep the state tree may be?
A middle ground may be for a lookahead to be a subclass of match, say SpeculativeMatch extends Match
. This would make it easier to quickly check for lookaheads, as matches have their own linkages. I think that's probably the thing to do.
If I do use a SpeculativeMatch
it will need some way to know that the lookahead has succeeded. This will require support in regex.ts
which currently would only ever return a success result when the entire pattern is terminated. If lookahead is a kind of match though, it would need a success result to be returned or else it would not be possible for the lookahead to succeed!
I've been assuming I need engine-level support for this, but is that really true?
It's not too hard to create an endLookahead
matcher which will fail when its contents succeed. What I guess isn't possible (?) is to do the other thing, to fail the whole expression if the sequence fails. When a sequence fails the chain of matchers is broken, and regex.ts
has no more control over what happens. So yes, I think it is true.
if I make implement a SpeculativeMatch
type in engine how will that impact the chaining of matchers? What is next in the chain after a lookahead, and how is that going to be called?
I'll have a next
-- a reference to the thing that follows the sequence. But that will already be used immediately because it'll be one of the two sequences held (initially) by the speculative match. So it will indeed be safe to just ignore the next
reference when causing a speculative match to succeed
A failure of the contingent
sequence will also need to erase the speculative
sequence. How will we know when to do this? The conditions are slightly different -- the lookahead sequence may still be active, which is to say that a SpeculativeMatch
fails when it has one remaining sequence vs a normal Match
which fails when it has 0
I think I want a SpeculativeSequence
and a SpeculativeExpression
. This way when a speculative match fails it will be a sequence failing that can overload methods to do the correct work.
It's time! I need this feature myself, and am building it.
Current problem: captures. First, I don't really remember entirely how what I built works, and I'm not 100% sure it's right.
Second: lookahead has some really weird cases. Evaluating something like /(a)(?=(b))/.exec('abc')
I actually need to be able to insert the capture group 2 result (b
) into the captures after the rest of the pattern has already succeeded!
For an execution like /(a)(?=.(c))(b)/.exec('abc')
I need the captures to be a
, c
, b
, which is tricky because the order we'll actually capture in is a
, b
, c
.
I think I've duplicated part of the engine's data structure in the stack-of-lists data structure I use for captures.
OK, that's fixed. Captures themselves no longer contain parentList
, which is now part of captureStack
Support lookahead assertions. Given a
lookahead
expression and acontingent
sequence, here's what has to happen:Make a lookahead state at the engine level. It will essentially be a normal expression state of
[lookahead, contingent]
with a bit of special sauce in handling success and failure.lookahead
succeeds it must fail thelookahead
expression, causingcontingent
to become the best alternativelookahead
fails the entire lookahead state must fail.