zaach / jison

Bison in JavaScript.
http://jison.org
4.34k stars 448 forks source link

[bug or feature?] default action `$$ = $1` is also executed when the rule has an explicit action block; this is not cf. bison documentation #343

Open GerHobbelt opened 7 years ago

GerHobbelt commented 7 years ago

From the bison docs @ http://dinosaur.compilertools.net/bison/bison_6.html#IDX85 (emphasis mine):

If you don't specify an action for a rule, Bison supplies a default: $$ = $1. Thus, the value of the first symbol in the rule becomes the value of the whole rule. Of course, the default rule is valid only if the two data types match. There is no meaningful default action for an empty rule; every empty rule must have an explicit action unless the rule's value does not matter.

Technically, jison doesn't do this: it always executes the default action $$ = $1 inside the parser kernel and then lets the user-defined action code overwrite the $$ by executing the "default action" before executing the rule's action code inside the generated performAction() function.

When the userland action code doesn't modify the $$ value, it will remain set to equal the $1 value, rather than making it undefined instead.

Example grammar snippet:

slist : slist stmt ';' { console.log("** not touching $$!\n"); }
      | stmt ';'       {
                         console.log("** neither am I!\n");
                       }
;

Suggested solution v1.0

Modify jison to inspect the grammar rules' action code blocks (which jison handles anyway while it places them inside performAction() and:

  1. replace the current kernel 'default action' code with $$ = undefined. WARNING! SEE SECTION FURTHER BELOW! THIS IS NOT A GOOD IDEA!
  2. add the default action code $$ = $1 to any non-empty rule which does not have any action at all
  3. add the default action code $$ = undefined to any empty rule, so that the behaviour is always predictable and $$ is set to a known value after each reduce action
  4. prefix any user-defined action code block which does not set $$ with the action code $$ = undefined. WARNING! SEE SECTION FURTHER BELOW! THIS IS NOT A GOOD IDEA!

Why the initial suggestion is not a smart idea to do

Quoting from the same bison section a bit more:

$n with n zero or negative is allowed for reference to tokens and groupings on the stack before those that match the current rule. This is a very risky practice, and to use it reliably you must be certain of the context in which the rule is applied. Here is a case in which you can use this reliably:

foo:       expr bar '+' expr  { ... }
         | expr bar '-' expr  { ... }
         ;

bar:       /* empty */
         { previous_expr = $0; }
         ;

As long as bar is used only in the fashion shown here, $0 always refers to the expr which precedes bar in the definition of foo.

As I want to support this 'obscure' feature too, the previous changeset would obliterate $0 because $$ === $0 due to the nature of the parse stack.

Still we need to change jison such that the default action $$ = $1 only happens when there is no user-defined action for the given rule at all; the suggested default action for all other situations, in order to produce a deterministic $$ behaviour on reduce, is to set $$ = undefined.

Besides, the changeset above would require intricate user code (action code chunk) analysis, which is no sine cure either, so we should come up with a changeset which does not need to analyze userland action code in depth.

Note

GerHobbelt/jison does perform some rough code analysis, but any mistakes made there are harmless as those impact kernel performance optimizations only: an incorrect code analysis only makes jison keep the slower (and safer) code.

Suggested solution v2.0

Which results in this set of behaviours on reduce:

  1. analyze which rules have user-defined action code chunks (we already know that as this knowledge is required for the generation of performAction() by jison)

  2. inside performAction cache $0 locally. (WARNING: we now will assume a environment where referencing $0 is not necessarily equivalent to $$! See next steps.)

  3. set $$ = undefined for all rules. User-defined action, iff it does any $$ assignment at all, will overwrite this value and we'll be fine, producing an always-deterministic $$ result this way, including in those occasions where the userland action code is subtly flawed, e.g.:

    slist : A B 
        { 
          if ($A) { $$ = $B } 
          // whoops! no assignment of `$$` when not($A):
          // coder must have forgotten to add the 'else' clause!
        } 
    ;
  4. use $$ = $1 as fill-in action code for those rules which don't have any action code and are non-empty, i.e. have a $1 to speak of.

    Because we've set $$ = undefined for all rules at the start, epsilon rules are already covered.

Notes