kach / nearley

📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.
https://nearley.js.org
MIT License

Grammar Rules - Optional whitespace with moo.js #514

Closed: LiamRiddell closed this issue 4 years ago

LiamRiddell commented 4 years ago

Problem

I'm using the example arithmetic.ne grammar but trying to convert it to use the moo lexer instead. I've started by swapping in moo tokens for whitespace, floats, and integers.

Lexer and Nearley Input

Input: 100 + (10 * 100) * 20

 [1] =>  100            (int) 
 [1] =>                 (ws) 
 [1] =>  +              (addition) 
 [1] =>                 (ws) 
 [1] =>  (              (lparen) 
 [1] =>  10             (int) 
 [1] =>                 (ws) 
 [1] =>  *              (multiplication) 
 [1] =>                 (ws) 
 [1] =>  100            (int) 
 [1] =>  )              (rparen) 
 [1] =>                 (ws) 
 [1] =>  *              (multiplication) 
 [1] =>                 (ws) 
 [1] =>  20             (int) 
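The dump shows that the lexer emits a ws token between almost every pair of tokens, and nearley treats those ws tokens like any other terminal. Below is a dependency-free sketch that reproduces this token stream. It is not moo itself. The rule table is an assumption reconstructed from the type names in the dump (int, ws, addition, multiplication, lparen, rparen), and the int rule drops the optional sign from the original regex. The sketch tries the rules in order at the current offset and keeps the first match, which is roughly how a moo lexer with plain regex rules behaves.

```javascript
// Hypothetical rule table reconstructed from the token types in the dump above.
// Each pattern uses the sticky flag "y" so it only matches at lastIndex.
const RULES = [
  ["ws", /[ \t]+/y],
  ["int", /\d+/y], // simplified: the original int rule also allowed a leading sign
  ["addition", /\+/y],
  ["multiplication", /\*/y],
  ["lparen", /\(/y],
  ["rparen", /\)/y],
];

// Split `input` into { type, text, offset } tokens, trying rules in order.
function tokenize(input) {
  const tokens = [];
  let offset = 0;
  while (offset < input.length) {
    let matched = false;
    for (const [type, re] of RULES) {
      re.lastIndex = offset;
      const m = re.exec(input);
      if (m) {
        tokens.push({ type, text: m[0], offset });
        offset += m[0].length;
        matched = true;
        break;
      }
    }
    if (!matched) {
      throw new SyntaxError(`invalid syntax at offset ${offset}`);
    }
  }
  return tokens;
}

// Print one line per token, similar to the dump above.
for (const t of tokenize("100 + (10 * 100) * 20")) {
  console.log(JSON.stringify(t.text).padEnd(8), `(${t.type})`);
}
```

The first token is an int, not a ws, which is exactly what the grammar's main rule trips over in the error below.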

Arithmetic.ne (Modified)

@{%
    const moo = require("moo");

    const lexer = moo.compile({
        ws: /[ \t]+/,
        float: {
            match: /(?:^\+|\-?)(?:[1-9]\d{0,4}|0|)\.\d/,
            lineBreaks: true,
            value: (x) => parseFloat(x),
        },
        int: {
            match: /(?:[+-]?)(?:\d+)/,
            lineBreaks: true,
            value: (x) => parseInt(x),
        },
    });
%}

@lexer lexer

main -> %ws AS %ws {% function(d) {return d[1]; } %}

# PEMDAS!

# Parentheses
P -> "(" %ws AS %ws ")" {% function(d) {return d[2]; } %}
    | N             {% id %}

# Exponents
E -> P %ws "^" %ws E    {% function(d) {return Math.pow(d[0], d[4]); } %}
    | P             {% id %}

# Multiplication and division
MD -> MD %ws "*" %ws E  {% function(d) {return d[0]*d[4]; } %}
    | MD %ws "/" %ws E  {% function(d) {return d[0]/d[4]; } %}
    | E             {% id %}

# Addition and subtraction
AS -> AS %ws "+" %ws MD {% function(d) {return d[0]+d[4]; } %}
    | AS %ws "-" %ws MD {% function(d) {return d[0]-d[4]; } %}
    | MD            {% id %}

# A number or a function of a number
N -> %float          {% function(d) {return d[0].value; } %}
    | %int           {% function(d) {return d[0].value; } %}
    | "sin" %ws P     {% function(d) {return Math.sin(d[2]); } %}
    | "cos" %ws P     {% function(d) {return Math.cos(d[2]); } %}
    | "tan" %ws P     {% function(d) {return Math.tan(d[2]); } %}

    | "asin" %ws P    {% function(d) {return Math.asin(d[2]); } %}
    | "acos" %ws P    {% function(d) {return Math.acos(d[2]); } %}
    | "atan" %ws P    {% function(d) {return Math.atan(d[2]); } %}

    | "pi"          {% function(d) {return Math.PI; } %}
    | "e"           {% function(d) {return Math.E; } %}
    | "sqrt" %ws P    {% function(d) {return Math.sqrt(d[2]); } %}
    | "ln" %ws P      {% function(d) {return Math.log(d[2]); }  %}

Parser Output

Error: Syntax error at line 1 col 1:

  100
  ^
Unexpected int token: 100. Instead, I was expecting to see one of the following:

A ws token based on:
    main →  ● %ws AS %ws

    at Parser.feed (nearley.js:317:27)
    at Object.<anonymous> (index.js:82:8)
    at Module._compile (internal/modules/cjs/loader.js:1158:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:1178:10)
    at Module.load (internal/modules/cjs/loader.js:1002:32)
    at Function.Module._load (internal/modules/cjs/loader.js:901:14)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:74:12)
    at internal/main/run_main_module.js:18:47 {
  offset: 0,
  token: {
    type: 'int',
    value: 100,
    text: '100',
    toString: [Function: tokenToString],
    offset: 0,
    lineBreaks: 0,
    line: 1,
    col: 1
  }
}

What I thought is the issue

Since I'm no longer using nearley's builtins, the _ and __ rules no longer work. How would I implement optional whitespace using moo in this example?


mcrawshaw commented 4 years ago

Try %ws:*
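Adding :* to %ws lets zero or more ws tokens appear at that position, so input with no whitespace before the first number is accepted. The sketch below illustrates that effect without nearley. It is a swapped-in technique: a hand-written recursive-descent evaluator, whereas nearley is an Earley parser. A hypothetical skipWs helper plays the role of %ws:* before every operator, parenthesis and number. To keep it short, the sketch assumes only integers, + - * / and parentheses. Exponents and the function keywords from arithmetic.ne are left out.

```javascript
// Hypothetical token pattern: whitespace | integer | operator or parenthesis.
// "ws" is emitted as a real token, as with moo, so the parser must deal with it.
const TOKEN_RE = /([ \t]+)|(\d+)|([-+*\/()])/y;

function lex(input) {
  const tokens = [];
  TOKEN_RE.lastIndex = 0;
  while (TOKEN_RE.lastIndex < input.length) {
    const start = TOKEN_RE.lastIndex;
    const m = TOKEN_RE.exec(input);
    if (!m) throw new SyntaxError(`invalid syntax at offset ${start}`);
    if (m[1]) tokens.push({ type: "ws", text: m[1] });
    else if (m[2]) tokens.push({ type: "int", text: m[2], value: Number(m[2]) });
    else tokens.push({ type: "op", text: m[3] });
  }
  return tokens;
}

function evaluate(input) {
  const tokens = lex(input);
  let pos = 0;

  // Plays the role of %ws:*: consume zero or more ws tokens.
  const skipWs = () => {
    while (pos < tokens.length && tokens[pos].type === "ws") pos++;
  };
  // Return the next operator text (after optional whitespace), or null.
  const peekOp = () => {
    skipWs();
    const t = tokens[pos];
    return t && t.type === "op" ? t.text : null;
  };
  const expectOp = (text) => {
    if (peekOp() !== text) throw new SyntaxError(`expected "${text}" at token ${pos}`);
    pos++;
  };

  // P -> "(" AS ")" | int
  function primary() {
    if (peekOp() === "(") {
      pos++;
      const v = addSub();
      expectOp(")");
      return v;
    }
    skipWs();
    const t = tokens[pos];
    if (!t || t.type !== "int") throw new SyntaxError(`expected a number at token ${pos}`);
    pos++;
    return t.value;
  }

  // MD -> MD ("*" | "/") P | P, written as a left-associative loop
  function mulDiv() {
    let v = primary();
    for (let op = peekOp(); op === "*" || op === "/"; op = peekOp()) {
      pos++;
      const rhs = primary();
      v = op === "*" ? v * rhs : v / rhs;
    }
    return v;
  }

  // AS -> AS ("+" | "-") MD | MD, written as a left-associative loop
  function addSub() {
    let v = mulDiv();
    for (let op = peekOp(); op === "+" || op === "-"; op = peekOp()) {
      pos++;
      const rhs = mulDiv();
      v = op === "+" ? v + rhs : v - rhs;
    }
    return v;
  }

  const result = addSub();
  skipWs(); // trailing optional whitespace
  if (pos !== tokens.length) throw new SyntaxError(`unexpected token at ${pos}`);
  return result;
}

console.log(evaluate("100 + (10 * 100) * 20")); // 20100
console.log(evaluate("  1+2*3 "));              // 7
```

Because whitespace is skipped wherever it may occur, "1+2*3" and " 1 + 2 * 3 " evaluate the same way. That is the behaviour the grammar gets from writing %ws:* instead of %ws.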

LiamRiddell commented 4 years ago

@mcrawshaw Awesome, that worked. Thanks! Out of curiosity, what does the EBNF modifier :* do?

mcrawshaw commented 4 years ago

I believe it means zero or more. It's lightly documented here: https://nearley.js.org/docs/grammar#more-syntax-tips-and-tricks. I guess the default, with no modifier, is exactly one.
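For reference, nearley's EBNF modifiers are :* (zero or more), :+ (one or more) and :? (zero or one). A symbol without a modifier must appear exactly once, which is why main -> %ws AS %ws rejects input that starts with a number. With a lexer, %ws:* and %ws:+ take over the roles of the builtin _ and __ rules, which match zero-or-more and one-or-more whitespace characters when there is no lexer. The tokens matched by :* arrive in the postprocessor as a single array, so d[1] in main still refers to AS. The sketch below only shows how many tokens each modifier allows. The helper matchRepeat is hypothetical. It matches greedily, whereas nearley considers every possible count. For simplicity it also returns an array in every case, although nearley gives :? the single item or null.

```javascript
// Hypothetical helper: greedily match tokens of `type` starting at `pos` under an
// EBNF modifier: "" = exactly one, "*" = zero or more, "+" = one or more, "?" = zero or one.
// Returns { items, next } on success, or null if too few tokens matched.
function matchRepeat(tokens, pos, type, modifier = "") {
  const min = modifier === "*" || modifier === "?" ? 0 : 1;
  const max = modifier === "*" || modifier === "+" ? Infinity : 1;
  const items = [];
  let i = pos;
  while (items.length < max && i < tokens.length && tokens[i].type === type) {
    items.push(tokens[i]);
    i++;
  }
  if (items.length < min) return null;
  return { items, next: i };
}

// The start of the token stream from the issue: "100" followed by a space.
const tokens = [{ type: "int", text: "100" }, { type: "ws", text: " " }];

// main -> %ws AS %ws : the leading %ws must match exactly once, so this fails.
console.log(matchRepeat(tokens, 0, "ws"));      // null
// main -> %ws:* AS %ws:* : matching zero ws tokens is acceptable.
console.log(matchRepeat(tokens, 0, "ws", "*")); // { items: [], next: 0 }
```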

LiamRiddell commented 4 years ago

@mcrawshaw - Thank you for the help 😄