zaach / jison-lex

generates lexical analyzers. used by jison.

Parse error when using arrow function in rules #23

Open Mutefish0 opened 6 years ago

Mutefish0 commented 6 years ago

The following grammar fails with a parse error:

let grammar = {
    lex: {
        rules: [
            ['\\s+', ''],
            ['\\d+', () => 'NUMBER'],
            ['\\+', () => '+'],
            ['$', () => 'EOF'],
        ]
    },
    operators: [
        ['left', '+']
    ],
    bnf: {
        'es': [
            ['e EOF', 'return $1']
        ],
        'e': [
            ['e + e', '$$ = $1 + $3'],
            ['NUMBER', '$$ = Number(yytext)']
        ]
    }
}

while this version works:

let grammar = {
    lex: {
        rules: [
            ['\\s+', ''],
            ['\\d+', function () { return 'NUMBER'}],
            ['\\+', function () { return '+' }],
            ['$',  function () { return 'EOF' }],
        ]
    },
    operators: [
        ['left', '+']
    ],
    bnf: {
        'es': [
            ['e EOF', 'return $1']
        ],
        'e': [
            ['e + e', '$$ = $1 + $3'],
            ['NUMBER', '$$ = Number(yytext)']
        ]
    }
}

GerHobbelt commented 6 years ago

The lexer generator (jison-lex) specifically looks for the return 'LABEL' pattern in the lexer rule action code blocks, so that it can replace the returned string with the matching token number when the lexer is combined with a grammar. This MAY be the cause of your trouble, though the parser run-time kernel has (IIRC) code to map a token string to its token number after the fact, to cover any such lexer token return slip-ups before they enter the grammar parser proper.

That said, I wonder why that bit of code apparently doesn't kick in for your grammar/circumstances, so further diagnosis is required to answer this one without the hand-waving I'm doing now.
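
To make the return 'LABEL' replacement described above concrete, here is a minimal sketch. The `tokens` table and the body of `tokenNumberReplacement` are assumptions for illustration (the real table is derived from the grammar); only the two regular expressions follow the pattern jison-lex looks for. Note that an arrow function with an expression body contains no `return` keyword at all, so this rewrite has nothing to match.

```javascript
// Hypothetical token table; in a real build this is derived from the grammar.
const tokens = { NUMBER: 3, '+': 4, EOF: 5 };

// Assumed replacement callback: swap a known token name for its number and
// leave unknown names as the original quoted string.
function tokenNumberReplacement(match, token) {
    const known = Object.prototype.hasOwnProperty.call(tokens, token);
    return 'return ' + (known ? tokens[token] : "'" + token + "'");
}

// Rewrite both single-quoted and double-quoted `return 'LABEL'` statements.
function rewriteReturns(action) {
    return action
        .replace(/return\s*'((?:\\'|[^']+)+)'/g, tokenNumberReplacement)
        .replace(/return\s*"((?:\\"|[^"]+)+)"/g, tokenNumberReplacement);
}

const fromFunctionBody = rewriteReturns("return 'NUMBER'");
const fromDoubleQuotes = rewriteReturns('return "EOF"');
const fromArrowSource = rewriteReturns("() => 'NUMBER'");

console.log(fromFunctionBody);
console.log(fromDoubleQuotes);
console.log(fromArrowSource);
```

The first two calls are rewritten to return token numbers, while the arrow function source passes through untouched.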

GerHobbelt commented 6 years ago

Further diagnosis turned up the following: when a rule action is defined as a function instead of a string, the code generator specifically looks for the function () {...} pattern, and therefore does not (yet) support Arrow Functions as in your example above.

Relevant code snippet in regexp-lexer, taken from the GerHobbelt/jison fork (TODO comment added today):

        newRules.push(m);
        if (typeof rule[1] === 'function') {
            // TODO: also cope with Arrow Functions (and inline those as well?) -- see also https://github.com/zaach/jison-lex/issues/23
            rule[1] = String(rule[1]).replace(/^\s*function\s*\(\)\s?\{/, '').replace(/\}\s*$/, '');
        }
        action = rule[1];
        action = action.replace(/return\s*'((?:\\'|[^']+)+)'/g, tokenNumberReplacement);
        action = action.replace(/return\s*"((?:\\"|[^"]+)+)"/g, tokenNumberReplacement);
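
As a rough illustration of the limitation, the sketch below applies the same two `replace` calls to an arrow function and then shows one possible way to extract a body from both forms. The helper name `actionBody` is hypothetical and this is not necessarily how the fork ended up handling it; it only covers zero-argument functions, like the rules in this issue.

```javascript
// The stripping step from the snippet, applied to an arrow function:
// neither pattern matches, so the full arrow source is left in place.
const legacy = String(() => 'NUMBER')
    .replace(/^\s*function\s*\(\)\s?\{/, '')
    .replace(/\}\s*$/, '');

// Hypothetical helper: extract an action body from a zero-argument
// classic function, a block-bodied arrow, or an expression-bodied arrow.
function actionBody(fn) {
    const src = String(fn).trim();
    // Classic form: function () { ... }
    let m = /^function\s*\(\)\s*\{([\s\S]*)\}$/.exec(src);
    if (m) return m[1];
    // Arrow with block body: () => { ... }
    m = /^\(\)\s*=>\s*\{([\s\S]*)\}$/.exec(src);
    if (m) return m[1];
    // Arrow with expression body: () => expr, which needs an explicit return
    m = /^\(\)\s*=>\s*([\s\S]+)$/.exec(src);
    if (m) return 'return ' + m[1] + ';';
    // Anything else is passed through unchanged.
    return src;
}

const classicBody = actionBody(function () { return 'NUMBER'});
const arrowBlockBody = actionBody(() => { return '+' });
const arrowExprBody = actionBody(() => 'EOF');

console.log(JSON.stringify(legacy));
console.log(JSON.stringify(classicBody.trim()));
console.log(JSON.stringify(arrowBlockBody.trim()));
console.log(JSON.stringify(arrowExprBody));
```

Turning the expression-bodied arrow into an explicit `return` statement matters here because the later `return 'LABEL'` rewrite only sees statements that start with `return`.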

GerHobbelt commented 6 years ago

FYI: this issue is now fixed in jison-gho (https://www.npmjs.com/package/jison-gho) since NPM build 0.6.1-211 i.e. 'build 211'.

(Your examples have been included as /examples/issue-lex-23*.js, with altered versions specifically for jison-lex in /packages/jison-lex/tests/spec/issue-23*.js.)