haskell / happy

The Happy parser generator for Haskell
Other
286 stars 84 forks source link

Unused states #138

Open goldfirere opened 5 years ago

goldfirere commented 5 years ago

With the grammar file below, and running with -i, happy produces two unused states (1 and 2). I don't know enough about how happy parsers run to know if these slow the parser down, but it seems these states can be trimmed. Initially discovered by @sdepew97.


%expect    0
%name      parse
%tokentype { Token }
%error     { (error . show) }

%token
  NUMBER    { NUMBER $$ }
  '+'       { PLUS }
  '*'       { TIMES }
  '('       { LPAREN }
  ')'       { RPAREN }

%%

sum : sum '+' prod       { () }
    | prod               { () }

prod : prod '*' factor   { () }
     | factor            { () }

factor : '(' sum ')'     { () }
       | NUMBER          { () }
simonmar commented 5 years ago

Thanks for the report, I agree it looks suspicious. It's been a long time since I debugged anything in Happy and the code is quite inscrutable. I fear I wouldn't be able to stop until I had rewritten the whole thing. So I'll probably leave this for a rainy day. If anyone else wants to investigate, be my guest.

goldfirere commented 5 years ago

That's about what I expected as a response. Clearly, these states aren't actively causing harm (though they could, conceivably, be causing a bit of slowdown -- I have no idea). That said, it did seem worthwhile to document the odd behavior.

andreasabel commented 4 years ago

It is a bit of a pity that happy drops the provenance for some states in the .info file. This is a problem when trying to explain to my students how the .info file corresponds to the parse table as described in standard textbooks such as the dragon book.

The .info file for the grammar above contains:

...
Grammar
-----------------------------------------------------------------------------
    %start_parse -> sum                                (0)
    sum -> sum '+' prod                                (1)
    sum -> prod                                        (2)
    prod -> prod '*' factor                            (3)
    prod -> factor                                     (4)
    factor -> '(' sum ')'                              (5)
    factor -> NUMBER                                   (6)
...
State 0

    NUMBER         shift, and enter state 5
    '('            shift, and enter state 6

    sum            goto state 7
    prod           goto state 3
    factor         goto state 4

State 1

    NUMBER         shift, and enter state 5
    '('            shift, and enter state 6

    sum            goto state 2
    prod           goto state 3
    factor         goto state 4

State 2

    sum -> sum . '+' prod                               (rule 1)

    '+'            shift, and enter state 8

State 3

    sum -> prod .                                       (rule 2)
    prod -> prod . '*' factor                           (rule 3)
...

The first two states (0 and 1) have no explanation.

The initial state (0) should be

         %start_parse -> . sum

as witnessed by the goto action to state 7 upon pushing sum onto the stack.

State 1 (unused) could be

        sum -> . sum '+' prod

judging by the goto action to state 2 upon sum.

Besides 1, which state do you think is unused? (State 0 is not a target in the parse table, but it serves as initial state.)