Open tothtamas28 opened 4 months ago
I dug into this.
This defines the same set of valid tokens for Foo as the previous definition. But now there's a parse error:
I think you identified the reason here, but for concreteness the issue is a collision with the regex for #LowerId defined in kast.md.
Why doesn't (2) work if (1) does?
Regex terminals have a lower priority than non-regex terminals. The logic isn't especially obvious, but when we generate scanners there's a sorting process that happens here:
This ordering places regexes at the end of the list, which means that if any non-regex matches first then the overlapping regex won't get considered. It also answers the question of why the #LowerId regex gets tried (and succeeds) before the Foo one. If you reverse this ordering, the code in (1) fails because foo is tokenized as a #LowerId.
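To make the ordering argument concrete, here is a toy model in Python. It is only a sketch and not the scanner generator's actual code: the rule table, the helper names `order_rules` and `tokenize`, and the `#LowerId` pattern are all assumptions made for illustration, and the model simply picks the first rule in list order that matches the whole input.

```python
import re

# Hypothetical rule table entries: (sort name, pattern, is_regex).
# The #LowerId pattern is an assumption, not copied from kast.md.
LOWER_ID = ("#LowerId", r"[a-z][a-zA-Z0-9]*", True)
FOO_TERMINAL = ("Foo", "foo", False)  # case (1): plain terminal
FOO_REGEX = ("Foo", r"foo", True)     # case (2): regex terminal


def order_rules(rules, regexes_last=True):
    """Stable sort that moves regex rules to the end (or to the front)."""
    return sorted(rules, key=lambda rule: rule[2] == regexes_last)


def tokenize(text, rules):
    """Return the sort of the first rule, in list order, matching all of text."""
    for name, pattern, is_regex in rules:
        if is_regex:
            matched = re.fullmatch(pattern, text) is not None
        else:
            matched = text == pattern
        if matched:
            return name
    return None


# Case (1): the plain terminal is sorted ahead of the #LowerId regex.
case1 = tokenize("foo", order_rules([LOWER_ID, FOO_TERMINAL]))

# Case (2): both rules are regexes, so the sort keeps their relative
# order and #LowerId (assumed to come first) claims the token.
case2 = tokenize("foo", order_rules([LOWER_ID, FOO_REGEX]))

# Reversed ordering: regexes first, so even case (1) yields #LowerId.
case1_reversed = tokenize(
    "foo", order_rules([LOWER_ID, FOO_TERMINAL], regexes_last=False)
)

print(case1, case2, case1_reversed)
```

In this model, case (1) yields Foo, case (2) yields #LowerId, and reversing the sort makes case (1) yield #LowerId as well, matching the behaviour described above.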
Why does the kompiler ask for the format attribute for (3) if for (2) it does not?
Because of the tokenization problem, inner parsing hasn't even finished when the error in (2) is emitted. This means that we don't have the full semantic information to validate the attributes. For (3), inner parsing has succeeded and we can validate the attributes.
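The staging can be sketched as a two-step check in Python. This is a toy illustration only: the function names, the boolean inputs, and the error strings are placeholders invented here, not real kompile internals or output.

```python
class KompileError(Exception):
    """Stand-in for any error reported by this toy pipeline."""


def check_definition(tokens_conflict, has_format):
    # Stage 1 (toy): inner parsing. A tokenization clash aborts here,
    # so nothing after this point runs.
    if tokens_conflict:
        raise KompileError("parse error")
    # Stage 2 (toy): attribute validation, reached only once parsing
    # has succeeded.
    if not has_format:
        raise KompileError("missing format attribute")
    return "ok"


def outcome(tokens_conflict, has_format):
    """Run the toy pipeline and report the first error, if any."""
    try:
        return check_definition(tokens_conflict, has_format)
    except KompileError as err:
        return str(err)


step2 = outcome(tokens_conflict=True, has_format=False)   # definition (2)
step3 = outcome(tokens_conflict=False, has_format=False)  # definition (3)
step4 = outcome(tokens_conflict=False, has_format=True)   # definition (4)
print(step2, step3, step4, sep=" | ")
```

Definition (2) never reaches the attribute check, which is why the missing format attribute only surfaces once the lexical conflict in (3) is gone.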
Consider the following definitions.
1. With terminal
This works as expected.
2. With regex terminal
Let's change the symbol to a regex terminal.
This defines the same set of valid tokens for Foo as the previous definition. But now there's a parse error:
3. Fix potential lexical conflicts
Speculatively, let's change the token to be disjoint from other lexicals in the prelude.
Now the error is something different:
4. Add the format attribute
After adding the format attribute, kompilation works.
Questions
This raises a few questions.
Why doesn't (2) work if (1) does?
Why does the kompiler ask for the format attribute for (3) if for (2) it does not?