zaach / jison

Bison in JavaScript.
http://jison.org
4.34k stars 449 forks source link

A bug when reduce #212

Open fool2fish opened 10 years ago

fool2fish commented 10 years ago

I encountered this problem when write a velocity template parser. Bellow is a simplified code:

1. create file: my.yy

%start root

%%

root
  : list
  | range
  ;

list
  : '[' listItems ']'
  ;

range
  : '[' rangeItem '..' rangeItem ']'
  ;

listItems
  : listItem
  | listItem ',' listItems
  ;

listItem
  : integer
  ;

rangeItem
  : integer
  ;

2. compile it

jison my.yy

then get some conflict error:

Conflict in grammar: multiple actions possible when lookahead token is .. in state 8
- reduce by rule: listItem -> integer
- reduce by rule: rangeItem -> integer
Conflict in grammar: multiple actions possible when lookahead token is ] in state 8
- reduce by rule: listItem -> integer
- reduce by rule: rangeItem -> integer
Conflict in grammar: multiple actions possible when lookahead token is , in state 8
- reduce by rule: listItem -> integer
- reduce by rule: rangeItem -> integer

States with conflicts:
State 8
  rangeItem -> integer . #lookaheads= .. ] ,
  listItem -> integer . #lookaheads= .. ] ,

I draw a LR(1) automaton by hand, it seems should not be conflicting.

list-range

It is obvious that should reduce integer to rangeItem when lookahead is '..'

zaach commented 10 years ago

Was this in LR(1) mode or LALR, which is the default?

fool2fish commented 10 years ago

Do you mean jison has one option to choose use LR(1) or LALR?

But the language can be processed by LR(1) also can be processed by LALR, no matter which method jison uses, it may get the same result.

zaach commented 10 years ago

@fool2fish A language can be in LR(1) and not in LALR. You can change the mode jison is in using an option. E.g., for LR(1) mode:

jison -p lr mygrammar.y
fool2fish commented 10 years ago

@zaach You are right, they are really not the same thing, but I still don't think there should be a conflict when use LALR. I have updated the top floor, from the complete LR automaton (you can click it to see a big one), there is not any status has same core with other, so it is also the LALR automaton. Is there anything I was not aware of?

thmour commented 10 years ago

This is not a bug, this is a grammar mistake that lies in the below rules

listItem
  : integer
  ;

rangeItem
  : integer
  ;

listItem and rangeItem are the same thing, it looks nice for grammar readability, but to keep parsers fast we must follow a subset of what can be done with grammars.

Lalr(1) has 1 lookahead token. It sees '[' then it checks the next possible tokens, which are listItem and rangeItem, but they are the same, so if an integer comes it doesn't know which rule to follow. Parsers to be fast must make O(1) steps from one token to another following the rules resulting to O(n) compile time having n tokens (eg 7 tokens with input text "[2,4,6]"). Because you have 2 rules with the same content the parser must take one to be fast, taking a random rule is not good at all and checking possible rules one after another is slow. So the parser returns you an error and expects you to write a better grammar for its mode (in this case lalr(1)).

To remove the conflict merge listItem and rangeItem rules into Item because they are the same thing and use this grammar

%start root

%%

root
  : list
  | range
  ;

list
  : '[' listItems ']'
  ;

range
  : '[' Item '..' Item ']'
  ;

listItems
  : Item
  | listItems ',' Item
  ;

Item
  : integer
  ;

I think the issue can be closed now