hsutter / cppfront

A personal experimental C++ Syntax 2 -> Syntax 1 compiler

[BUG] Recognize _postfix-expression_ in _is-value-constraint_ #358

Open JohelEGP opened 1 year ago

JohelEGP commented 1 year ago

The grammar permits the expression of is-value-constraint to be a postfix-expression. But if the postfix-expression is not a primary-expression, it fails as below.

Relevant Cpp2 grammar extract.

```C++
//G postfix-expression:
//G     primary-expression
//G     postfix-expression postfix-operator     [Note: without whitespace before the operator]
//G     postfix-expression '[' expression-list ']'
//G     postfix-expression '(' expression-list? ')'
//G     postfix-expression '.' id-expression
//G
//G is-as-expression:
//G     prefix-expression
//G     is-as-expression is-type-constraint
//G     is-as-expression is-value-constraint
//G     is-as-expression as-type-cast
//GTODO     type-id is-type-constraint
//G
//G is-type-constraint
//G     'is' type-id
//G
//G is-value-constraint
//G     'is' expression
```
Current full Cpp2 grammar extracted from the sources.

```C++
//G binary-digit:
//G     one of '0' '1'
//G
//G digit: one of
//G     binary-digit
//G     one of '2' '3' '4' '5' '6' '7' '8' '9'
//G
//G hexadecimal-digit:
//G     digit
//G     one of 'A' 'B' 'C' 'D' 'E' 'F'
//G
//G nondigit:
//G     one of 'a'..'z'
//G     one of 'A'..'Z'
//G     _
//G
//G identifier-start:
//G     nondigit
//G
//G identifier-continue:
//G     digit
//G     nondigit
//G
//G identifier:
//G     identifier-start
//G     identifier identifier-continue
//G     'operator' operator
//G
//G simple-escape-sequence:
//G     '\' { any member of the basic character set except u, U, or x }
//G
//G hexadecimal-escape-sequence:
//G     '\x' hexadecimal-digit
//G     hexadecimal-escape-sequence hexadecimal-digit
//G
//G universal-character-name:
//G     '\u' hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
//G     '\U' hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit hexadecimal-digit
//G
//G escape-sequence:
//G     hexadecimal-escape-sequence
//G     simple-escape-sequence
//G
//G s-char:
//G     universal-character-name
//G     escape-sequence
//G     basic-s-char
//G
//G basic-s-char:
//G     any member of the basic source character set except '"' '\' or new-line
//G
//G c-char:
//G     universal-character-name
//G     escape-sequence
//G     basic-c-char
//G
//G basic-c-char:
//G     any member of the basic source character set except ''' '\' or new-line
//G
//G keyword:
//G     any Cpp1-and-Cpp2 keyword
//G     one of: 'import' 'module' 'export' 'is' 'as'
//G
//G encoding-prefix: one of
//G     'u8' 'u' 'uR' 'u8R' 'U' 'UR' 'L' 'LR' 'R'
//G
//G token:
//G     identifier
//G     keyword
//G     literal
//G     operator-or-punctuator
//G
//G operator-or-punctuator:
//G     operator
//G     punctuator
//G
//G operator: one of
//G     '/=' '/'
//G     '<<=' '<<' '<=>' '<=' '<'
//G     '>>=' '>>' '>=' '>'
//G     '++' '+=' '+'
//G     '--' '-=' '->' '-'
//G     '||=' '||' '|=' '|'
//G     '&&=' '&&' '&=' '&'
//G     '*=' '*'
//G     '%=' '%'
//G     '^=' '^'
//G     '~=' '~'
//G     '==' '='
//G     '!=' '!'
//G
//G punctuator: one of
//G     '...' '.'
//G     '::' ':'
//G     '{' '}' '(' ')' '[' ']' ';' ',' '?' '$'
//G
//G
//G literal:
//G     integer-literal
//G     character-literal
//G     floating-point-literal
//G     string-literal
//GTODO     boolean-literal
//GTODO     pointer-literal
//G
//G integer-literal:
//G     binary-literal
//G     hexadecimal-literal
//G     decimal-literal
//G
//G binary-literal:
//G     '0b' binary-digit
//G     '0B' binary-digit
//G     binary-literal binary-digit
//G     binary-literal ''' binary-digit
//G
//G hexadecimal-literal:
//G     '0x' hexadecimal-digit
//G     '0X' hexadecimal-digit
//G     hexadecimal-literal hexadecimal-digit
//G     hexadecimal-literal ''' hexadecimal-digit
//G
//G
//G decimal-literal:
//G     digit [uU][lL][lL]
//G     decimal-literal digit [uU][lL][lL]
//G     decimal-literal ''' digit [uU][lL][lL]
//G
//G floating-point-literal:
//G     digit { ' | digit }* . digit ({ ' | digit }*)? ([eE][-+]?digit { ' | digit }*) [fFlL]
//G
//G     TODO full grammar & refactor to utility functions with their
//G     own unit test rather than inline everything here
//G
//G string-literal:
//G     encoding-prefix? '"' s-char-seq? '"'
//G     encoding-prefix? 'R"' d-char-seq? '(' s-char-seq? ')' d-char-seq? '"'
//G
//G s-char-seq:
//G     interpolation? s-char
//G     interpolation? s-char-seq s-char
//G
//G d-char-seq:
//G     d-char
//G
//G interpolation:
//G     '(' expression ')' '$'
//G
//G character-literal:
//G     encoding-prefix? ''' c-char-seq? '''
//G
//G c-char-seq:
//G     c-char
//G     c-char-seq c-char
//G
//G prefix-operator:
//G     one of '!' '-' '+'
//GT     parameter-direction
//G
//G postfix-operator:
//G     one of '++' '--' '*' '&' '~' '$'
//G
//G assignment-operator:
//G     one of '=' '*=' '/=' '%=' '+=' '-=' '>>=' '<<=' '&=' '^=' '|='
//G
//G primary-expression:
//G     inspect-expression
//G     id-expression
//G     literal
//G     '(' expression-list ')'
//G     '{' expression-list '}'
//G     unnamed-declaration
//G
//G postfix-expression:
//G     primary-expression
//G     postfix-expression postfix-operator     [Note: without whitespace before the operator]
//G     postfix-expression '[' expression-list ']'
//G     postfix-expression '(' expression-list? ')'
//G     postfix-expression '.' id-expression
//G
//G prefix-expression:
//G     postfix-expression
//G     prefix-operator prefix-expression
//GTODO     await-expression
//GTODO     'sizeof' '(' type-id ')'
//GTODO     'sizeof' '...' ( identifier ')'
//GTODO     'alignof' '(' type-id ')'
//GTODO     throws-expression
//G
//G multiplicative-expression:
//G     is-as-expression
//G     multiplicative-expression '*' is-as-expression
//G     multiplicative-expression '/' is-as-expression
//G     multiplicative-expression '%' is-as-expression
//G
//G additive-expression:
//G     multiplicative-expression
//G     additive-expression '+' multiplicative-expression
//G     additive-expression '-' multiplicative-expression
//G
//G shift-expression:
//G     additive-expression
//G     shift-expression '<<' additive-expression
//G     shift-expression '>>' additive-expression
//G
//G compare-expression:
//G     shift-expression
//G     compare-expression '<=>' shift-expression
//G
//G relational-expression:
//G     compare-expression
//G     relational-expression '<' compare-expression
//G     relational-expression '>' compare-expression
//G     relational-expression '<=' compare-expression
//G     relational-expression '>=' compare-expression
//G
//G equality-expression:
//G     relational-expression
//G     equality-expression '==' relational-expression
//G     equality-expression '!=' relational-expression
//G
//G bit-and-expression:
//G     equality-expression
//G     bit-and-expression '&' equality-expression
//G
//G bit-xor-expression:
//G     bit-and-expression
//G     bit-xor-expression '^' bit-and-expression
//G
//G bit-or-expression:
//G     bit-xor-expression
//G     bit-or-expression '|' bit-xor-expression
//G
//G logical-and-expression:
//G     bit-or-expression
//G     logical-and-expression '&&' bit-or-expression
//G
//G logical-or-expression:
//G     logical-and-expression
//G     logical-or-expression '||' logical-and-expression
//G
//G assignment-expression:
//G     logical-or-expression
//G     assignment-expression assignment-operator logical-or-expression
//G
// eliminated condition: - use expression:
//G     assignment-expression
//GTODO     try expression
//G
//G expression-list:
//G     parameter-direction? expression
//G     expression-list ',' parameter-direction? expression
//G
//G type-id:
//G     type-qualifier-seq? qualified-id
//G     type-qualifier-seq? unqualified-id
//G
//G type-qualifier-seq:
//G     type-qualifier
//G     type-qualifier-seq type-qualifier
//G
//G type-qualifier:
//G     'const'
//G     '*'
//G
//G is-as-expression:
//G     prefix-expression
//G     is-as-expression is-type-constraint
//G     is-as-expression is-value-constraint
//G     is-as-expression as-type-cast
//GTODO     type-id is-type-constraint
//G
//G is-type-constraint
//G     'is' type-id
//G
//G is-value-constraint
//G     'is' expression
//G
//G as-type-cast
//G     'as' type-id
//G
//G unqualified-id:
//G     identifier
//G     template-id
//GTODO     operator-function-id
//G
//G template-id:
//G     identifier '<' template-argument-list? '>'
//G
//G template-argument-list:
//G     template-argument-list ',' template-argument
//G
//G template-argument:
//G     # note: < > << >> are not allowed in expressions until new ( is opened
//G     expression
//G     type-id
//G
//G qualified-id:
//G     nested-name-specifier unqualified-id
//G     member-name-specifier unqualified-id
//G
//G nested-name-specifier:
//G     '::'
//G     unqualified-id '::'
//G
//G member-name-specifier:
//G     unqualified-id '.'
//G
//G id-expression
//G     qualified-id
//G     unqualified-id
//G
//G literal:
//G     integer-literal ud-suffix?
//G     character-literal ud-suffix?
//G     floating-point-literal ud-suffix?
//G     string-literal ud-suffix?
//G     boolean-literal ud-suffix?
//G     pointer-literal ud-suffix?
//G     user-defined-literal ud-suffix?
//G
//G expression-statement:
//G     expression ';'
//G     expression
//G
//G selection-statement:
//G     'if' 'constexpr'? expression compound-statement
//G     'if' 'constexpr'? expression compound-statement 'else' compound-statement
//G
//G return-statement:
//G     return expression? ';'
//G
//G iteration-statement:
//G     label? 'while' logical-or-expression next-clause? compound-statement
//G     label? 'do' compound-statement 'while' logical-or-expression next-clause? ';'
//G     label? 'for' expression next-clause? 'do' unnamed-declaration
//G
//G label:
//G     identifier ':'
//G
//G next-clause:
//G     'next' assignment-expression
//G
//G alternative:
//G     alt-name? is-type-constraint '=' statement
//G     alt-name? is-value-constraint '=' statement
//G     alt-name? as-type-cast '=' statement
//G
//G alt-name:
//G     unqualified-id :
//G
//G inspect-expression:
//G     'inspect' 'constexpr'? expression '{' alternative-seq? '}'
//G     'inspect' 'constexpr'? expression '->' type-id '{' alternative-seq? '}'
//G
//G alternative-seq:
//G     alternative
//G     alternative-seq alternative
//G
//G jump-statement:
//G     'break' identifier? ';'
//G     'continue' identifier? ';'
//G
//G statement:
//G     selection-statement
//G     inspect-expression
//G     return-statement
//G     jump-statement
//G     iteration-statement
//G     compound-statement
//G     declaration
//G     expression-statement
//G     contract
//GTODO     try-block
//G
//G compound-statement:
//G     '{' statement-seq? '}'
//G
//G statement-seq:
//G     statement
//G     statement-seq statement
//G
//G parameter-declaration:
//G     this-specifier? parameter-direction? declaration
//G
//G parameter-direction: one of
//G     'in' 'copy' 'inout' 'out' 'move' 'forward'
//G
//G this-specifier:
//G     'implicit'
//G     'virtual'
//G     'override'
//G     'final'
//G
//G parameter-declaration-list
//G     '(' parameter-declaration-seq? ')'
//G
//G parameter-declaration-seq:
//G     parameter-declaration
//G     parameter-declaration-seq ',' parameter-declaration
//G
//G contract:
//G     '[' '[' contract-kind id-expression? ':' logical-or-expression ']' ']'
//G     '[' '[' contract-kind id-expression? ':' logical-or-expression ',' string-literal ']' ']'
//G
//G contract-kind: one of
//G     'pre' 'post' 'assert'
//G
//G function-type:
//G     parameter-declaration-list throws-specifier? return-list? contract-seq?
//G
//G throws-specifier:
//G     'throws'
//G
//G return-list:
//G     '->' type-id
//G     '->' parameter_declaration_list
//G
//G contract-seq:
//G     contract
//G     contract-seq contract
//G
//G meta-constraints:
//G     'is' id-expression
//G     meta-constraints ',' id-expression
//G
//G unnamed-declaration:
//G     ':' template-parameter-declaration-list? function-type requires-clause? '=' statement
//G     ':' template-parameter-declaration-list? type-id? requires-clause? '=' statement
//G     ':' template-parameter-declaration-list? type-id
//G     ':' template-parameter-declaration-list? 'type' meta-constraints? requires-clause? '=' statement
//G     ':' 'namespace' '=' statement
//G
//G requires-clause:
//G     'requires' expression
//G
//G template-parameter-declaration-list
//G     '<' parameter-declaration-seq '>'
//G
//G alias
//G     ':' template-parameter-declaration-list? 'type' '==' type-id ';'
//G     ':' 'namespace' '==' qualified-id ';'
//G     ':' template-parameter-declaration-list? '_'? '==' expression ';'
//G
//GT     ':' function-type '==' expression ';'
//GT        # See commit 63efa6ed21c4d4f4f136a7a73e9f6b2c110c81d7 comment
//GT        # for why I don't see a need to enable this yet
//G declaration:
//G     access-specifier? identifier unnamed-declaration
//G     access-specifier? identifier alias
//G
//G access-specifier:
//G     public
//G     protected
//G     private
//G
//G declaration-seq:
//G     declaration
//G     declaration-seq declaration
//G
//G translation-unit:
//G     declaration-seq?
```

Minimal reproducer (https://godbolt.org/z/z6PTPWGro):

a: int   = 0;
b: * int = a&;
c: bool  = b is a&;

Commands:

cppfront x.cpp2

Expected result: The same as wrapping the postfix-expression in parentheses.

Actual result and error:

main.cpp2(3,18): error: ill-formed initializer (at '&')
main.cpp2(3,1): error: unexpected text at end of Cpp2 code section (at 'c')
main.cpp2(1,0): error: parse failed for section starting here

JohelEGP commented 1 year ago

I think the answer may come from what became https://github.com/hsutter/cppfront/wiki/Design-note%3A-Unambiguous-parsing:

> Yes, my intent there was that the productions are tried in order. I do this in a few cases (statement is another). The intent is that by deterministically taking the first match, we can eliminate any ambiguity when input could match more than one production. In this case, "if it can be an expression, it is."
>
> -- Extract from https://github.com/hsutter/cppfront/issues/50#issuecomment-1272645757.

I wonder what the order is. I extract the grammar with `git grep '//G' include/* source/* | sed 's/.*\(..G\)/\1/'`.
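To make the "productions are tried in order" idea concrete, here is a minimal C++ sketch of ordered choice. It is not cppfront's implementation: `Production`, `first_match`, `template_argument`, and the two recognizers are hypothetical names invented for illustration, and the recognizers are deliberately crude. It models only the `template-argument` rule from the grammar, which lists `expression` before `type-id`.

```cpp
#include <cassert>
#include <cctype>
#include <functional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical model of one grammar alternative: a name plus a recognizer
// that says whether the alternative matches the whole input.
struct Production {
    std::string name;
    std::function<bool(std::string_view)> matches;
};

// Crude recognizer (assumption): letters, digits and '_' only, not starting with a digit.
bool is_identifier(std::string_view s) {
    if (s.empty() || std::isdigit(static_cast<unsigned char>(s[0]))) return false;
    for (char c : s)
        if (!(std::isalnum(static_cast<unsigned char>(c)) || c == '_')) return false;
    return true;
}

// Crude recognizer (assumption): one or more decimal digits.
bool is_integer_literal(std::string_view s) {
    if (s.empty()) return false;
    for (char c : s)
        if (!std::isdigit(static_cast<unsigned char>(c))) return false;
    return true;
}

// Ordered choice: return the name of the first alternative that matches,
// or an empty string if none does. Later alternatives are never consulted
// once an earlier one succeeds.
std::string first_match(const std::vector<Production>& alternatives, std::string_view input) {
    assert(!alternatives.empty());
    for (const auto& p : alternatives)
        if (p.matches(input)) return p.name;
    return {};
}

// template-argument, in the order the grammar lists its alternatives.
std::vector<Production> template_argument() {
    return {
        {"expression", [](std::string_view s) { return is_identifier(s) || is_integer_literal(s); }},
        {"type-id",    [](std::string_view s) { return is_identifier(s); }},
    };
}
```

In this sketch, `first_match(template_argument(), "a")` returns `"expression"` even though `a` would also be accepted as a `type-id`; swapping the two alternatives flips the result. The order in which alternatives are tried is therefore part of the grammar's meaning.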

JohelEGP commented 1 year ago

My understanding is that the `a` in `b is a&` matches the type-id production in the is-type-constraint of

//G is-as-expression:
//G     prefix-expression
//G     is-as-expression is-type-constraint
//G     is-as-expression is-value-constraint
//G     is-as-expression as-type-cast

before having a chance at is-value-constraint. So there's a `&` left over, which matches nothing (is-as-expression has a lower precedence than prefix-expression, so it can't be a postfix operator). This means that we have to write `b is (a&)` and `(b as t)*`.
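The behaviour described above can be reproduced with a toy recognizer. The following C++ sketch rests on the same understanding and is not cppfront's parser: all names (`tokenize`, `match_type_id`, `match_postfix_expression`, `parse_is`, `IsMatch`) are hypothetical, the tokenizer handles only identifiers and single-character punctuation, and both matchers are heavily simplified. The one property it is meant to show is that trying is-type-constraint before is-value-constraint leaves the `&` unconsumed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

using Tokens = std::vector<std::string>;

// Split into identifier-like words and single-character punctuation; skip whitespace.
Tokens tokenize(std::string_view src) {
    Tokens out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalpha(c) || c == '_') {
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_')) ++i;
            out.push_back(std::string(src.substr(start, i - start)));
        } else {
            out.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return out;
}

// A word that is not one of the keywords this sketch cares about.
bool is_identifier(const std::string& t) {
    if (t == "is" || t == "as" || t == "const") return false;
    return !t.empty() && (std::isalpha(static_cast<unsigned char>(t[0])) || t[0] == '_');
}

// type-id (simplified): type-qualifier-seq? identifier, qualifiers being 'const' and '*'.
// Returns the number of tokens consumed, or 0 if there is no match.
std::size_t match_type_id(const Tokens& t, std::size_t pos) {
    std::size_t i = pos;
    while (i < t.size() && (t[i] == "const" || t[i] == "*")) ++i;
    if (i < t.size() && is_identifier(t[i])) return i + 1 - pos;
    return 0;
}

// postfix-expression (simplified): an identifier or a parenthesized expression,
// followed by any number of single-character postfix operators.
// Returns the number of tokens consumed, or 0 if there is no match.
std::size_t match_postfix_expression(const Tokens& t, std::size_t pos) {
    std::size_t i = pos;
    if (i < t.size() && is_identifier(t[i])) {
        ++i;
    } else if (i < t.size() && t[i] == "(") {
        std::size_t inner = match_postfix_expression(t, i + 1);
        if (inner == 0 || i + 1 + inner >= t.size() || t[i + 1 + inner] != ")") return 0;
        i += inner + 2;  // '(' + inner tokens + ')'
    } else {
        return 0;
    }
    while (i < t.size() && (t[i] == "&" || t[i] == "*" || t[i] == "~" || t[i] == "$")) ++i;
    return i - pos;
}

struct IsMatch {
    std::string production;  // which alternative matched, empty if none
    Tokens leftover;         // tokens that no production consumed
};

// Parse `<identifier> is <rhs>`, trying is-type-constraint before is-value-constraint.
IsMatch parse_is(std::string_view src) {
    Tokens t = tokenize(src);
    assert(t.size() >= 3 && is_identifier(t[0]) && t[1] == "is");
    std::string production = "is-type-constraint";
    std::size_t n = match_type_id(t, 2);
    if (n == 0) {
        n = match_postfix_expression(t, 2);
        production = n ? "is-value-constraint" : "";
    }
    return {production, Tokens(t.begin() + static_cast<std::ptrdiff_t>(2 + n), t.end())};
}
```

With this sketch, `parse_is("b is a&")` reports `is-type-constraint` with a leftover `&`, while `parse_is("b is (a&)")` reports `is-value-constraint` with nothing left over, because the opening parenthesis cannot start a type-id and the second alternative gets its turn.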

JohelEGP commented 1 year ago

I bring up again the suggestion at the end of https://github.com/hsutter/cppfront/issues/352#issuecomment-1504047137. It should be possible to improve the error message from `main.cpp2(3,18): error: ill-formed initializer (at '&')` to `main.cpp2(3,18): error: unmatched text in front of is-as-expression (at '&')`.