zaach / jison

Bison in JavaScript.
http://jison.org
4.36k stars 450 forks source link

Improvements in the conflict reporting messages #46

Closed ghost closed 13 years ago

ghost commented 13 years ago

Hi,

I've now got used to where conflict errors are likely to happen, but I find the conflict error messages rather opaque at times: e.g.

Conflict in grammar (state:99, token:catch)
   reduce by CATCHES -> 
   shift 230

The thing that can sometimes be opaque, is the state number. It would seem more useful to me to have a message that gave say the name of the pseudo-token that is being checked, perhaps with the number of the alternative. I realise that this may well add extra code to the resulting parser, it only needs to happen at build-time, and I think would make it much easier to find the sources of errors for some people.

Also what do the 'shift' messages mean?

Thanks,

Marcus.

zaach commented 13 years ago

The conflict message is saying that there is a shift-reduce in state 99, when the lookahead token is catch. It could either reduce by the rule CATCHES or it can shift the catch token and go to state 230. The state information comes in handy for debugging according to the parse table. You can get a good visualization of the parse table here. Use LALR(1) as the parser type.

I'll leave this open because we do need better debugging output. You can pass the -t option currently and see the LR(0) parser states, which is somewhat helpful.

ghost commented 13 years ago

Thanks. Yes, adding the -t flag certainly provides a lot of useful info, however it might just be too much for general usage.

What I think would be good is if there is an error, along with the state number, a written representation of that state is automatically written out (like in the 'productions' section of http://zaach.github.com/jison/try/usf/). That info should be written for both the state that it's in and the state that it would shift to. I think in general, that would be enough for most debugging needs, and is much more convenient than getting that information from the parser states table.

zaach commented 13 years ago

I've added some reporting about the states to the output by default.

ghost commented 13 years ago

Cool, thanks. I'll check them out soon.

ghost commented 13 years ago

Yes, the information you'd added is much clearer. Thanks.

ghost commented 13 years ago

Would it be possible to add the name of the rule that the 'state' comes under, possibly with an index of the problem section(s)? I think that would make it quicker for debugging.

e.g. if there is a problem with one (or both) of the following tests

TEST
  : (some stuff here)
  | (some other stuff here)
  ;

then you report back somewhere that it is under test 'TEST' and test number 1|2 (or both).

zaach commented 13 years ago

I'm not sure what you mean. The output states which rules were involved in the conflict, e.g.:

Conflict in grammar: multiple actions possible when lookahead token is { in state 47
- reduce by rule: action_body -> action_body { action_body } action_body
- shift token (then go to state 44)

In this case, action_body is the rule.

ghost commented 13 years ago

Well, I have quite a long jison script that has a conflict issue caused by the line EXPR_KEYWORD... in the section below:

TABLE_ELT
    : E
        {$$ = [$1];}
    | TABLE_KEY TABLE_SEP E
        {$$ = [$1, $3];}
    | ID TABLE_SEP E
        {$$ = [$1, $3];}
    | STATEMENT_KEYWORD TABLE_SEP E
        {$$ = [$1, $3];}
    | EXPR_KEYWORD TABLE_SEP E
        {$$ = [$1, $3];}
    | BASIC_VAL TABLE_SEP E
        {$$ = [$1, $3];}
    | '[' E ']' TABLE_SEP E
        {$$ = [$2, $5];}
;

The error messages I get (which are long) are like:

Conflict in grammar: multiple actions possible when lookahead token is : in state 246
- reduce by rule: CORE_VAL -> false
- reduce by rule: EXPR_KEYWORD -> false
Conflict in grammar: multiple actions possible when lookahead token is = in state 246
- reduce by rule: CORE_VAL -> false
- reduce by rule: EXPR_KEYWORD -> false

Knowing that there is a problem with CORE_VAL etc doesn't help me to find the source of the problem, which is actually in TABLE_ELT.

Is there any way of being able to identify the 'real' sources of the problem? What I'd like is for it to tell me that there's a problem stemming from TABLE_ELT, or at least give me a list of the possible causes.

zaach commented 13 years ago

Unfortunately, there's no easy way to determine the 'real' source here. I'd advise incrementally building your grammar so it's easier to tell when a change introduces a conflict, or explosion of conflicts.

ghost commented 13 years ago

That's what I thought.

Yes, I am indeed incrementally building the grammar, and so it's usually not too hard to figure out where the problems are.

Thanks for the quick responses.