The pattern matching semantics has some surprising corner cases related to predicate patterns, i.e., patterns that do not consume tokens. I take predicate patterns to mean lookahead (`++p` and `--p`) as well as the "positional" patterns `In(ty)`, `Start` and `End`.
For example, let's look at what happens when negating predicates with the `!` pattern:
Running this on `foo foo bar` results in `bar bar`, since `!(Start)` matches on the first `foo` and consumes it together with the subsequent `foo`. I would expect the result to be `foo bar bar`. If we instead match on `T(Bar) * !(Start)`, nothing happens, since `!` only matches if there is a term to consume at that position.
More generally, I would expect the negation of a zero-width pattern to consume zero terms. The semantics for negation of patterns longer than one term is also unintuitive:
Running this on `foo bar bar` currently results in `foo baz`, whereas running it on `foo foo bar` does nothing. In the first case, the negation pattern is successfully matched against `bar bar`, the first `bar` is consumed, and `T(Bar)` then matches the second `bar`. In the second case, although `foo foo` matches the negation pattern, only the first `foo` is consumed, so `T(Bar)` fails on the second `foo`. After the cursor advances, `foo bar` fails to match the negation pattern.
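To make the two negation behaviours concrete, here is a small Python toy model. It is not Trieste code: `tok`, `start`, `seq`, `not_consuming`, `not_zero_width` and `find_match` are hypothetical names, and the rule `!(T(Foo) * T(Bar)) * T(Bar)` is reconstructed from the description above, so it may differ from the original example. `not_consuming` mimics the behaviour described (it succeeds only if the inner pattern fails and a term is available, then consumes exactly one term). `not_zero_width` is one possible alternative, assumed here to behave like negative lookahead.

```python
from typing import Callable, List, Optional, Tuple

# Toy model only: a pattern maps (terms, cursor) to the new cursor on success
# or None on failure. None of these names are Trieste's actual API.
Pattern = Callable[[List[str], int], Optional[int]]


def tok(kind: str) -> Pattern:
    """Match and consume exactly one term of the given kind."""
    return lambda ts, i: i + 1 if i < len(ts) and ts[i] == kind else None


def start() -> Pattern:
    """Zero-width: succeed only at the beginning of the sequence."""
    return lambda ts, i: i if i == 0 else None


def seq(*ps: Pattern) -> Pattern:
    """Match the given patterns one after another."""
    def run(ts: List[str], i: int) -> Optional[int]:
        cur: Optional[int] = i
        for p in ps:
            cur = p(ts, cur)
            if cur is None:
                return None
        return cur
    return run


def not_consuming(p: Pattern) -> Pattern:
    """Negation as described in the text: needs a term to consume, eats one."""
    def run(ts: List[str], i: int) -> Optional[int]:
        if i >= len(ts) or p(ts, i) is not None:
            return None
        return i + 1
    return run


def not_zero_width(p: Pattern) -> Pattern:
    """Assumed alternative: negative lookahead that consumes nothing."""
    return lambda ts, i: i if p(ts, i) is None else None


def find_match(ts: List[str], pattern: Pattern) -> Optional[Tuple[int, int]]:
    """Return the first (begin, end) span matched, scanning left to right."""
    for i in range(len(ts) + 1):
        j = pattern(ts, i)
        if j is not None:
            return (i, j)
    return None


inner = seq(tok("foo"), tok("bar"))
current = seq(not_consuming(inner), tok("bar"))     # models !(Foo * Bar) * Bar
lookahead = seq(not_zero_width(inner), tok("bar"))  # zero-width variant

print(find_match(["foo", "bar", "bar"], current))    # (1, 3)
print(find_match(["foo", "foo", "bar"], current))    # None
print(find_match(["foo", "foo", "bar"], lookahead))  # (2, 3)

# T(Bar) * !(Start): fails when negation must consume, succeeds when zero-width.
print(find_match(["foo", "foo", "bar"], seq(tok("bar"), not_consuming(start()))))   # None
print(find_match(["foo", "foo", "bar"], seq(tok("bar"), not_zero_width(start()))))  # (2, 3)
```

The span `(1, 3)` in the first call covers `bar bar`, which corresponds to the `foo baz` result reported above. The `None` in the second call corresponds to nothing happening on `foo foo bar`.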
Another corner case is pattern matching the children of a zero-width sequence:
This example currently causes a segfault, since the sequence matched by `Start` has no children. It would be more reasonable to fail the match, or possibly to disallow the pattern (ideally statically), since it can never succeed.
A related question is how to interpret matching on the children of a sequence of more than one term:
Currently, `Any` binds to the first child of whatever comes first in the sequence (here `Foo`), which might be what we want. Other possible semantics include failing, or letting `Any` bind to the first child of every node in the parent pattern. If we require well-formedness, we could disallow any `<<` pattern where the parent pattern does not match exactly one term.
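One way to state the well-formedness condition is in terms of how many terms a pattern can consume. The Python sketch below is a hypothetical model, not Trieste's implementation: it represents patterns as plain tuples, computes a (minimum, maximum) width for each, and rejects any children pattern whose parent is not exactly one term wide. The tuple encoding, the function names `width` and `check`, and the width rule itself are assumptions made for illustration.

```python
from typing import List, Tuple

# Toy pattern representation (hypothetical, not Trieste's types):
#   ("tok", kind)             one term of the given kind
#   ("any",)                  any single term
#   ("start",) / ("end",)     zero-width positional predicates
#   ("seq", p1, p2, ...)      patterns matched one after another
#   ("children", parent, c)   models parent << c
Pat = tuple


def width(p: Pat) -> Tuple[int, int]:
    """Return the (minimum, maximum) number of terms p consumes."""
    tag = p[0]
    if tag in ("tok", "any"):
        return (1, 1)
    if tag in ("start", "end"):
        return (0, 0)
    if tag == "seq":
        widths = [width(q) for q in p[1:]]
        return (sum(lo for lo, _ in widths), sum(hi for _, hi in widths))
    if tag == "children":
        # A children pattern consumes whatever its parent consumes.
        return width(p[1])
    raise ValueError(f"unknown pattern tag: {tag!r}")


def check(p: Pat) -> List[str]:
    """Collect well-formedness errors for every children pattern inside p."""
    errors: List[str] = []
    tag = p[0]
    if tag == "seq":
        for q in p[1:]:
            errors.extend(check(q))
    elif tag == "children":
        parent, child = p[1], p[2]
        errors.extend(check(parent))
        errors.extend(check(child))
        if width(parent) != (1, 1):
            errors.append(
                f"parent of << has width {width(parent)}, expected (1, 1)"
            )
    return errors


foo, bar, any_ = ("tok", "Foo"), ("tok", "Bar"), ("any",)

print(check(("children", foo, any_)))                # []
print(check(("children", ("start",), any_)))         # reports one error
print(check(("children", ("seq", foo, bar), any_)))  # reports one error
```

Under this rule both corner cases are rejected before any matching happens: the zero-width parent, which has no children to inspect, and the two-term parent, whose child binding is ambiguous.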