zaach / jison

Bison in JavaScript.
http://jison.org
4.34k stars 449 forks source link

Reduce-reduce conflict reported in LALR but not in SLR. #289

Open LeonardoVal opened 9 years ago

LeonardoVal commented 9 years ago

For the following grammar Jison (from the CLI) reports a reduce-reduce conflict when it should not. When switched to use SLR, the parser is generated without errores. Tried -p lalr and got the same error.

%lex

%%

\s+                /* whitespace */
[a-zA-Z]+           return 'ID';
\"[^\"\n]*\"        return 'STR';
"("                 return '(';
")"                 return ')';
"["                 return '[';
"]"                 return ']';
","                 return ',';
<<EOF>>             return 'EOF';

/lex

%start value

%%

value
    : STR   {$$ = eval($1);}
    | '[' ']' {$$ = [];}
    | '[' values ']' {$$ = $2;}
    | id '(' ')' {$$ = [$1, []]; }
    | id '(' values ')' {$$ = [$1, $3];}
;
id
    : STR {$$ = eval($1);}
    | ID {$$ = $1;}
;
values
    : value {$$ = [$1];}
    | values ',' value {$$ = $1; $$.push($3);}
;

The conflict goes like this:

Building parse table.
Conflict at state: 2, token: $end
  reduce by rule: id -> STR
  reduce by rule: value -> STR
Conflict at state: 2, token: (
  reduce by rule: id -> STR
  reduce by rule: value -> STR
Conflict at state: 2, token: ]
  reduce by rule: id -> STR
  reduce by rule: value -> STR
Conflict at state: 2, token: ,
  reduce by rule: id -> STR
  reduce by rule: value -> STR
Conflict at state: 2, token: )
  reduce by rule: id -> STR
  reduce by rule: value -> STR

5 Conflict(s) found in grammar.

And the debug output shows this for state 2:

item set 2 
value -> STR .
id -> STR . 
transitions ->  {}

I believe symbols ], ,, ) or the end of input cannot follow a derivation from ; in the same way that ( cannot follow a derivation from . AFAIK an SLR grammar must work fine with LALR. Am i right?

GerHobbelt commented 9 years ago

The grammar you posted is not LALR(1) as the first value choice STR is not exactly matching the other choices' FIRST set for id: {STR, ID}.

The quickest grammar fix to make it LALR(1) is this (only part of the grammar file shown: note the change in the first choice for the value rule):

%%

value
    : id   {$$ = eval($1);}
    | '[' ']' {$$ = [];}
    | '[' values ']' {$$ = $2;}
    | id '(' ')' {$$ = [$1, []]; }
    | id '(' values ')' {$$ = [$1, $3];}
;

but then of course you accept ID next to STR, thus slightly augmenting your 'language' as specified in the grammar, while the first choice's action code might not be doing any more what you intend it do do. ;-) (as it now would have to cope with ID and STR yytext's)

Off Topic: Why 'eval()`?

Isn't this good enough (note the yytext edit and the adjusted actions, now without eval()):

%lex

%%

\s+                /* whitespace */
[a-zA-Z]+           return 'ID';
\"[^\"\n]*\"        /* strip off the quotes: */ yytext = yytext.substr(1,
yytext.length - 2); return 'STR';
"("                 return '(';
")"                 return ')';
"["                 return '[';
"]"                 return ']';
","                 return ',';
<<EOF>>             return 'EOF';

/lex

%start value

%%

value
    : id   {$$ = $1;}
    | '[' ']' {$$ = [];}
    | '[' values ']' {$$ = $2;}
    | id '(' ')' {$$ = [$1, []]; }
    | id '(' values ')' {$$ = [$1, $3];}
;
id
    : STR {$$ = $1;}
    | ID {$$ = $1;}
;
values
    : value {$$ = [$1];}
    | values ',' value {$$ = $1; $$.push($3);}
;
LeonardoVal commented 9 years ago

Thanks for the quick response. I did not understand what the FIRST sets have to do with this. Having a state with those two items is reasonable for this grammar. Choosing which one to reduce should be a matter of lookahead symbols.

I tried the same grammar with another parser generator for Javascript: JS/CC. It generated a LALR parser table just fine. You can check it out with their online tool here. It shows the item sets with lookahead information, so you can see what i'm talking about.

This is the grammar written in their format.

!   '[ \t\n\r]+' ;

    '\(' 
    '\)'
    '\['
    '\]'
    ','
    '[a-zA-Z]+' ID
    '"[^\n"]*"' STR [* %match = eval(%match); *]
    ;
##
value
    : STR   [* *]
    | '[' ']' [* %% = []; *]
    | '[' values ']' [* %% = %2; *]
    | id '(' ')' [* %% = [%1, []]; *]
    | id '(' values ')' [* %% = [%1, %3]; *]
;
id
    : STR [* %% = %1; *]
    | ID  [* %% = %1; *]
;
values
    : value [* %% = [%1]; *]
    | values ',' value [* %% = %1; %1.push(%3); *]
;

The eval call is just to remove the quotes and replace any escape sequences in the string literal. I agree it is not elegant, but this is not the language i'm trying to implement. It is just a small and quick example i made to show this issue.

Changing the corresponding definition in the actual language i have to parse is not an option for me. Still, SLR seems to be working well. I can also generate the parser with LR1, but the resulting table is much bigger. Nonetheless i'd like to understand why LALR is failing.