Open LeonardoVal opened 9 years ago
The grammar you posted is not LALR(1) as the first value choice STR
is
not exactly matching the other choices' FIRST set for id
: {STR, ID}.
The quickest grammar fix to make it LALR(1) is this (only part of the
grammar file shown: note the change in the first choice for the value
rule):
%%
value
: id {$$ = eval($1);}
| '[' ']' {$$ = [];}
| '[' values ']' {$$ = $2;}
| id '(' ')' {$$ = [$1, []]; }
| id '(' values ')' {$$ = [$1, $3];}
;
but then of course you accept ID next to STR, thus slightly augmenting your 'language' as specified in the grammar, while the first choice's action code might not be doing any more what you intend it do do. ;-) (as it now would have to cope with ID and STR yytext's)
Isn't this good enough (note the yytext
edit and the adjusted actions,
now without eval()
):
%lex
%%
\s+ /* whitespace */
[a-zA-Z]+ return 'ID';
\"[^\"\n]*\" /* strip off the quotes: */ yytext = yytext.substr(1,
yytext.length - 2); return 'STR';
"(" return '(';
")" return ')';
"[" return '[';
"]" return ']';
"," return ',';
<<EOF>> return 'EOF';
/lex
%start value
%%
value
: id {$$ = $1;}
| '[' ']' {$$ = [];}
| '[' values ']' {$$ = $2;}
| id '(' ')' {$$ = [$1, []]; }
| id '(' values ')' {$$ = [$1, $3];}
;
id
: STR {$$ = $1;}
| ID {$$ = $1;}
;
values
: value {$$ = [$1];}
| values ',' value {$$ = $1; $$.push($3);}
;
Thanks for the quick response. I did not understand what the FIRST sets have to do with this. Having a state with those two items is reasonable for this grammar. Choosing which one to reduce should be a matter of lookahead symbols.
I tried the same grammar with another parser generator for Javascript: JS/CC. It generated a LALR parser table just fine. You can check it out with their online tool here. It shows the item sets with lookahead information, so you can see what i'm talking about.
This is the grammar written in their format.
! '[ \t\n\r]+' ;
'\('
'\)'
'\['
'\]'
','
'[a-zA-Z]+' ID
'"[^\n"]*"' STR [* %match = eval(%match); *]
;
##
value
: STR [* *]
| '[' ']' [* %% = []; *]
| '[' values ']' [* %% = %2; *]
| id '(' ')' [* %% = [%1, []]; *]
| id '(' values ')' [* %% = [%1, %3]; *]
;
id
: STR [* %% = %1; *]
| ID [* %% = %1; *]
;
values
: value [* %% = [%1]; *]
| values ',' value [* %% = %1; %1.push(%3); *]
;
The eval
call is just to remove the quotes and replace any escape sequences in the string literal. I agree it is not elegant, but this is not the language i'm trying to implement. It is just a small and quick example i made to show this issue.
Changing the corresponding definition in the actual language i have to parse is not an option for me. Still, SLR seems to be working well. I can also generate the parser with LR1, but the resulting table is much bigger. Nonetheless i'd like to understand why LALR is failing.
For the following grammar Jison (from the CLI) reports a reduce-reduce conflict when it should not. When switched to use SLR, the parser is generated without errores. Tried
-p lalr
and got the same error.The conflict goes like this:
And the debug output shows this for state 2:
I believe symbols; in
the same way that . AFAIK an SLR grammar must work fine with LALR. Am i right?
]
,,
,)
or the end of input cannot follow a derivation from(
cannot follow a derivation from