kach / nearley

📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.
https://nearley.js.org
MIT License

facing an error while parsing in case of newline token #646

Open Mohammad-y-abbass opened 3 months ago

Mohammad-y-abbass commented 3 months ago

I am creating a small programming language, and I have set up the grammar to handle statements like this:

statements
-> statement %NL {% (data) => { return [data[0]]; } %}
|  statements %NL statement %NL {% (data) => { return [...data[0], data[2]]; } %}

But every time it reaches the newline token, it throws this error:

Error while parsing Error: invalid syntax at line 1 col 19:

1 @name = "mohammad"
                    ^
2 call print(name)

Unexpected input (lexer error). I did not expect any more input. Here is the state of my parse table:

expression → %string ●
var_assign → %var_declaration %identifier _ "=" _ expression ●
statement → var_assign ●

This happens only with the newline token; everything else works fine.

TekuConcept commented 3 months ago

One recommendation I would make is to not end (or begin) your definitions with "whitespace" pattern matches. This way you avoid parser ambiguities.

Change:

statements
-> statement %NL {% d => [d[0]] %}
|  statements %NL statement %NL {% d => [...d[0], d[2]] %}

Into:

statements
-> statement {% d => [d[0]] %}
|  statements %NL statement {% d => [...d[0], d[2]] %}
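To make the difference between the two versions concrete, here is a minimal plain-JavaScript sketch. It does not use nearley; it models each grammar as a regular expression over token types, with "S" for one statement and "N" for one %NL token. The names `original`, `revised`, and `encode` are made up for this illustration.

```javascript
// Plain-JavaScript sketch; this does not call nearley.
// "S" stands for one statement, "N" for one %NL token.

// statements -> statement %NL | statements %NL statement %NL
const original = /^SN(NSN)*$/;

// statements -> statement | statements %NL statement
const revised = /^S(NS)*$/;

// Hypothetical helper: turn a list of token types into the compact string form.
const encode = (tokens) => tokens.map((t) => (t === "NL" ? "N" : "S")).join("");

const twoLines = encode(["statement", "NL", "statement"]);
const twoLinesTrailing = encode(["statement", "NL", "statement", "NL"]);
const blankLineBetween = encode(["statement", "NL", "NL", "statement", "NL"]);

console.log(original.test(twoLines));         // false
console.log(original.test(twoLinesTrailing)); // false
console.log(original.test(blankLineBetween)); // true
console.log(revised.test(twoLines));          // true
console.log(revised.test(twoLinesTrailing));  // false
```

Under this model the original rule only accepts consecutive statements when two %NL tokens sit between them, while the revised rule accepts one %NL between statements and leaves any trailing newline to the parent rule. Whether this accounts for the exact error reported above also depends on the lexer, which is not shown in this thread.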

To handle "whitespace" surrounding single statements, define it in the parent definition. For example:

scope -> begin %NL statements %NL end

Here is a working example from my own grammar:

STATEMENT
    -> SINGLE_STATEMENT {% id %}
    |  MULTI_STATEMENT {% id %}
SINGLE_STATEMENT -> EXPRESSION {% d => [d[0]] %}
MULTI_STATEMENT
    -> "=" OMSC EXPRESSION {% d => [d[2]] %}
    |  "=" OMSC EXPRESSION RMSC MULTI_STATEMENT
        {% d => [d[2],...d[4]] %}
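
As a rough illustration of how the two MULTI_STATEMENT postprocessors build a flat list, the sketch below calls them by hand in plain JavaScript, without nearley. OMSC and RMSC are not defined in this thread, so their match results are stood in for by a `null` placeholder; `multiBase`, `multiRec`, and the sample expressions "a", "b", "c" are names invented for this example.

```javascript
// Postprocessors copied from the two MULTI_STATEMENT alternatives.
const multiBase = (d) => [d[2]];          // "=" OMSC EXPRESSION
const multiRec = (d) => [d[2], ...d[4]];  // "=" OMSC EXPRESSION RMSC MULTI_STATEMENT

// Placeholder for whatever OMSC / RMSC produce (assumption: their value is ignored).
const sep = null;

// Hand-built match data for three chained expressions, innermost rule first,
// mimicking the order in which a right-recursive rule would be reduced.
const last = multiBase(["=", sep, "c"]);
const middle = multiRec(["=", sep, "b", sep, last]);
const first = multiRec(["=", sep, "a", sep, middle]);

console.log(first); // [ 'a', 'b', 'c' ]
```

Because each recursive step spreads the inner result instead of nesting it, the caller receives one flat array of expressions regardless of how many are chained.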