Closed SantiagoBautista closed 6 years ago
@mbudiu-vmw , this is the conflict I talked to you about in #1335 .
I found it while developing a P4 interpreter. By the way, this shows why having a light-weight p4-interpreter will be useful once it is finished: being easier to modify than a compiler, an interpreter can help to detect problems in the language changes even before we try to modify the compiler.
@jnfoster , I believe you could find this conflict interesting.
The checked-in grammar already contains functions at the top level. Look for functionDeclaration.
Yes, I know. What I am saying is that any attempt to transform the current grammar into an LR(1) parser will produce a shift/reduce conflict.
Toplevel functions are supported by the current grammar, but not by the current compiler implementation: if you look for functionDeclaration in the compiler implementation you will see that currently they can only appear in objDeclaration (line 389) but not in top-level declarations (line 270).
Hence, the conflict I describe is already present in the grammar, but will not be observable in the implementation until toplevel function declarations are supported. That is what I meant.
I actually had tested this in a branch, but this was before merging the type PR. Do you have a proposal on how to fix this?
I can think of two solutions:
Put the TYPE token directly in the name rule instead of putting it in the nonTypeName rule.
This works and maintains all the features of the language, but seems kind of ugly to me.
If you want I can create the associated pr to the grammar, so that you can see what I mean.
Not allowing "type" to be a name anymore, as it is a keyword of the language whose use as a name creates a conflict. This would require changing files that use "type" as a name, such as psa.p4.
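To make the two options above concrete, here is a rough Python sketch. It assumes a heavily simplified, hypothetical fragment of the grammar (only functionDeclaration and a newtype declaration, no bodies or parameters), so the rule shapes are illustrative and not the real p4c rules. It enumerates every token sequence the fragment can derive and checks which declarations can start with TYPE TYPE_IDENTIFIER.

```python
from itertools import product

def make_grammar(type_in_non_type_name, type_in_name):
    """Build a simplified, hypothetical fragment of the P4-16 grammar."""
    non_type_name = [["IDENTIFIER"]]
    if type_in_non_type_name:
        non_type_name.append(["TYPE"])
    name = [["nonTypeName"], ["TYPE_IDENTIFIER"]]
    if type_in_name:
        name.append(["TYPE"])
    return {
        "functionDeclaration": [["typeOrVoid", "name", "(", ")"]],
        "newtypeDeclaration": [["TYPE", "typeRef", "name", ";"]],
        "typeOrVoid": [["typeRef"], ["VOID"], ["nonTypeName"]],
        "typeRef": [["TYPE_IDENTIFIER"]],
        "name": name,
        "nonTypeName": non_type_name,
    }

def expand(grammar, symbol):
    """Yield every token sequence derivable from symbol (the fragment is finite)."""
    if symbol not in grammar:
        yield (symbol,)  # terminal token
        return
    for rhs in grammar[symbol]:
        for parts in product(*(list(expand(grammar, s)) for s in rhs)):
            yield tuple(tok for part in parts for tok in part)

def declarations_starting_with(grammar, prefix):
    """Return the declarations that can begin with the given token prefix."""
    return {
        decl
        for decl in ("functionDeclaration", "newtypeDeclaration")
        if any(seq[:len(prefix)] == prefix for seq in expand(grammar, decl))
    }

current = make_grammar(type_in_non_type_name=True, type_in_name=False)
option_1 = make_grammar(type_in_non_type_name=False, type_in_name=True)
option_2 = make_grammar(type_in_non_type_name=False, type_in_name=False)
prefix = ("TYPE", "TYPE_IDENTIFIER")

# A function whose name is the token TYPE, e.g. "x type()".
function_named_type = ("IDENTIFIER", "TYPE", "(", ")")
```

In this toy fragment, both declarations match the prefix under the current grammar, while only the newtype declaration matches under either option. The difference between the options is that a function named type is still derivable under option 1 but not under option 2.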
Let's try the first solution. We don't control all the programs that people wrote.
I guess there is no reason to keep this issue open.
I just realized that there is another solution for this problem that might be better:
In the typeOrVoid non-terminal, replace nonTypeName by IDENTIFIER.
The advantage is that this way the TYPE keyword can be put back with the other keywords into nonTypeName, and can hence be used as a nonTypeName in more places in a program.
The drawback is that the keywords apply, key, actions and state could not be used as return type for functions when using them as type variables.
What do you think?
So you make the grammar prettier but reject more programs? I prefer to optimize the opposite way.
Some programs are rejected by the solution we took but accepted by this other one: those that use type as a nonTypeName (for example as the name of an extern, as a type argument, or as an lvalue).
So both solutions reject programs, and the question is which set of programs is more likely to be written.
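The trade-off can be summarized as a small set comparison. This is only a sketch: the token sets below are my reading of the two proposals in this thread, and the dictionary keys (nonTypeName, return_type_name) are hypothetical labels, not rule names taken from the grammar file.

```python
# Keywords that the grammar also accepts as ordinary names.
SOFT_KEYWORDS = {"APPLY", "KEY", "ACTIONS", "STATE"}

# Adopted solution: TYPE is moved out of nonTypeName into the name rule,
# and typeOrVoid still goes through nonTypeName.
adopted = {
    "nonTypeName": {"IDENTIFIER"} | SOFT_KEYWORDS,
    "return_type_name": {"IDENTIFIER"} | SOFT_KEYWORDS,
}

# Alternative: typeOrVoid uses IDENTIFIER directly,
# and TYPE goes back into nonTypeName.
alternative = {
    "nonTypeName": {"IDENTIFIER", "TYPE"} | SOFT_KEYWORDS,
    "return_type_name": {"IDENTIFIER"},
}

def only_in(a, b, position):
    """Tokens accepted at `position` by variant a but rejected by variant b."""
    return a[position] - b[position]

lost_by_adopted = only_in(alternative, adopted, "nonTypeName")
lost_by_alternative = only_in(adopted, alternative, "return_type_name")
```

Under these assumptions, the adopted solution loses type as a nonTypeName, while the alternative loses apply, key, actions and state as function return types.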
OTOH, using "apply", "key", etc. as type variables seems mostly useful for entries to the Obfuscated P4_16 Programming Contest :-)
In practice, neither of them will probably be written.
Both solutions reject programs that will probably (and hopefully) never be written; and one of them makes the grammar prettier (which can avoid future problems) :)
Then feel free to submit a PR with these changes and we'll take a look.
Done in #1344 and p4-spec#652.
Let us consider the following two P4 programs.
Example 1:
Example 2:
With the P4-16 1.1 grammar, both are syntactically correct.
Both contain the two-token sequence type id, where id is a TYPE_IDENTIFIER and type is the TYPE token; but in one case this corresponds to a nonTypeName followed by a name, and in the other case it corresponds to the type keyword of a newtype declaration. So, when the parser has read the TYPE token and looks ahead to TYPE_IDENTIFIER, it could either reduce a nonTypeName out of TYPE, or shift to recognize a newtype declaration. Hence, there is a conflict.
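The conflict can be checked by hand with FIRST sets. The sketch below assumes a simplified, hypothetical fragment of the grammar (the rule shapes are illustrative, and newtypeDeclaration is a name I made up for the newtype rule). After reading TYPE, the parser state contains a reduce item for nonTypeName and a shift item for the newtype declaration; a shift/reduce conflict exists if some lookahead token is valid for both.

```python
# Simplified, hypothetical fragment: no nullable rules and no left recursion.
GRAMMAR = {
    "functionDeclaration": [["typeOrVoid", "name", "(", ")"]],
    "newtypeDeclaration": [["TYPE", "typeRef", "name", ";"]],
    "typeOrVoid": [["typeRef"], ["VOID"], ["nonTypeName"]],
    "typeRef": [["TYPE_IDENTIFIER"]],
    "name": [["nonTypeName"], ["TYPE_IDENTIFIER"]],
    "nonTypeName": [["IDENTIFIER"], ["TYPE"]],
}

def first(symbol):
    """Tokens that can begin a string derived from symbol."""
    if symbol not in GRAMMAR:
        return {symbol}  # terminal token
    result = set()
    for rhs in GRAMMAR[symbol]:
        result |= first(rhs[0])  # valid because no rule here is nullable
    return result

# Items in the parser state reached after reading TYPE:
#   nonTypeName        -> TYPE .                   (reduce)
#   newtypeDeclaration -> TYPE . typeRef name ";"  (shift)
# The reduce is valid when the lookahead can start the `name` that follows
# typeOrVoid in a function declaration; the shift is valid when the
# lookahead can start a typeRef.
reduce_lookaheads = first("name")
shift_tokens = first("typeRef")
conflict_tokens = reduce_lookaheads & shift_tokens
```

In this fragment the only conflicting lookahead is TYPE_IDENTIFIER, which matches the sequence type id described above.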
This conflict is not yet observable in the compiler, as toplevel functions are not yet implemented. Hence for now, the compiler rejects the first example for the wrong reason: it doesn't expect the opening parenthesis of a function definition.
But when support for toplevel function declarations is added, this conflict will arise.
If it is unclear, let me explain why the definition of function id is syntactically correct in example 1. The first time type appears, it specifies the return type of the function, and this is allowed because the return type can be a nonTypeName, as it can be a type parameter that has not been parsed yet, like here. Then there is id, which will be parsed as a TYPE_IDENTIFIER because of the previous typedef declaration; but that is ok because function names are allowed to be any name, so in particular a typeName. The second time type appears, it is a type parameter, which can be a nonTypeName, so in particular type. The third time type appears, it is a typeRef, more specifically a typeName, since it was promoted to a type name by the parsed type parameter. The rest of the function declaration is less surprising.
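The reason id becomes a TYPE_IDENTIFIER is that the token kind of a name depends on which types have already been declared. Here is a minimal sketch of that idea; the classify function, the KEYWORDS table and the declared_types parameter are all hypothetical and do not reflect how the p4c lexer is actually written.

```python
import re

# Hypothetical keyword table; only the keywords relevant here are listed.
KEYWORDS = {"type": "TYPE", "typedef": "TYPEDEF", "void": "VOID"}

def classify(source, declared_types):
    """Turn source text into token kinds, given the type names seen so far."""
    tokens = []
    for word in re.findall(r"[A-Za-z_]\w*|\S", source):
        if word in KEYWORDS:
            tokens.append(KEYWORDS[word])      # keywords win over names
        elif word in declared_types:
            tokens.append("TYPE_IDENTIFIER")   # name of a declared type
        elif re.fullmatch(r"[A-Za-z_]\w*", word):
            tokens.append("IDENTIFIER")        # any other name
        else:
            tokens.append(word)                # punctuation passes through
    return tokens

with_typedef = classify("type id", declared_types={"id"})
without_typedef = classify("type id", declared_types=set())
```

With id declared as a type, the sequence is TYPE TYPE_IDENTIFIER, which is the pair of tokens involved in the conflict; without the earlier typedef it would be TYPE IDENTIFIER.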