kach / nearley

📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.
https://nearley.js.org
MIT License

An example showing significant whitespace? #643

Open Geordi7 opened 6 months ago

Geordi7 commented 6 months ago

I'm having difficulty creating a parser for a language like Pug. I haven't tried using an external lexer, but I have a sneaking suspicion it is necessary.

Can you provide an example which shows how to do it?

TekuConcept commented 3 months ago

significant whitespace

As in multiple contiguous whitespace characters?

OMS -> [\s]:* # optional multi-line whitespace
RMS -> [\s]:+ # required multi-line whitespace

Geordi7 commented 3 months ago

No, as in scopes delimited by indented sections of text (à la Pug, Python, Haskell, etc.)

TekuConcept commented 3 months ago

as in scopes delimited by indented sections of text (à la Pug, Python, Haskell, etc.)

Ah, so indent / dedent... that calls for a context-aware parsing solution.

Use local state

You could get away with creating and updating a local context in the grammar's postprocessing step, e.g.:

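# createState / updateState are your own helpers (define them in the grammar's
# JavaScript block); each returns the state object for the lines reduced so far.
LINES
    -> LINES RBS LINE {% d => updateState(d) %}  # d[0] is the state built for the earlier lines
    |  LINE           {% d => createState(d) %}  # the first line creates the initial state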

RBS -> OWS LF OMS # required break space
OMS -> [\s]:*     # optional multi-line space
OWS -> [ \t\r]:*  # optional white space
LF -> "\n"

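This technique has real limitations, though: as written, RBS lets OMS swallow the next line's leading whitespace, so the indentation has to be captured somewhere else, and postprocessors should stay free of side effects (keep the state in the returned values), since nearley can run them for partial parses that never make it into a final result. Still, it's one way to go about this without writing your own lexer.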
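One way to avoid stateful postprocessors altogether is to keep the grammar flat and rebuild the nesting at the end. The sketch below assumes (hypothetically) that each LINE postprocessor returns an object like { indent, text }, where indent is the number of leading spaces, and that a final postprocessor hands the whole list to buildTree, a hypothetical helper that is not part of nearley. It keeps a stack of open scopes, the same idea Python's tokenizer uses to produce INDENT/DEDENT.

```javascript
// Hypothetical post-processing helper: folds a flat list of
// { indent, text } lines into nested { text, children } nodes.
function buildTree(lines) {
  const root = { text: null, children: [] };
  // Each stack entry pairs a node with the indentation of its line.
  // The root sits at indent -1 so every real line nests under it.
  const stack = [{ indent: -1, node: root }];

  for (const { indent, text } of lines) {
    // Close scopes until the top of the stack is less indented than
    // this line, i.e. until it is this line's parent.
    while (stack[stack.length - 1].indent >= indent) {
      stack.pop();
    }
    const node = { text, children: [] };
    stack[stack.length - 1].node.children.push(node);
    stack.push({ indent, node });
  }
  return root.children;
}

// Usage: lines shaped the way a LINE postprocessor might return them.
const tree = buildTree([
  { indent: 0, text: "html" },
  { indent: 2, text: "head" },
  { indent: 4, text: "title Hi" },
  { indent: 2, text: "body" },
  { indent: 0, text: "footer" },
]);
console.log(JSON.stringify(tree));
```

Unlike Python, this accepts a dedent to a column that was never opened (the line simply attaches to the nearest less-indented scope); add a check after the while loop if that should be an error.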

Use a custom lexer

This is probably the simpler way of parsing indent / dedent, as your sneaking suspicion hinted. (I haven't tried it myself yet.) I found the following thread on moo's issue tracker about context-aware indent / dedent lexing: https://github.com/no-context/moo/issues/55. The last link there (moo-indentation-lexer) is probably the one you want.

Then, following the nearley docs for using a custom lexer:

@{%
    const moo = require("moo")
    const IndentationLexer = require('moo-indentation-lexer')

    // Create a lexer from rules
    const mooLexer = moo.compile({ ... })
    // Create an indentation-aware lexer using the lexer
    const lexer = new IndentationLexer({ lexer: mooLexer })
%}

# Pass your lexer object using the @lexer option:
@lexer lexer

BLOCK -> HEADING %indent STATEMENTS %dedent
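For the custom-lexer route, it may help to see roughly what an indentation-aware lexer has to do. This is a minimal sketch, not the moo-indentation-lexer implementation: IndentLexer and tokenize are hypothetical names, indentation is assumed to be spaces only, and the whole input is assumed to arrive in a single parser.feed() call. It exposes the methods nearley's docs describe for custom lexers (reset, next, save, formatError, plus has, which compiled grammars use to resolve %token names) and emits indent / dedent tokens whose types match the %indent and %dedent references in the BLOCK rule.

```javascript
// Hypothetical indentation-aware lexer with the method names nearley
// expects from a custom lexer. Assumes spaces-only indentation and that
// the whole input is passed in one feed() call (reset ignores saved info).
class IndentLexer {
  reset(chunk, _info) {
    this.tokens = tokenize(chunk);
    this.pos = 0;
  }
  next() {
    return this.tokens[this.pos++]; // undefined once the input is exhausted
  }
  save() {
    return {}; // nothing to carry over between chunks in this sketch
  }
  formatError(token, message = "Unexpected token") {
    return `${message} ${JSON.stringify(token.value)} at line ${token.line}`;
  }
  has(type) {
    return ["indent", "dedent", "newline", "word"].includes(type);
  }
}

// Turns source text into word/newline tokens plus indent/dedent tokens,
// using a stack of the indentation widths of the currently open scopes.
function tokenize(text) {
  const tokens = [];
  const stack = [0];
  const lines = text.split("\n");
  lines.forEach((raw, i) => {
    const line = i + 1;
    if (raw.trim() === "") return; // blank lines do not open or close scopes
    const width = raw.length - raw.trimStart().length;
    if (width > stack[stack.length - 1]) {
      stack.push(width);
      tokens.push({ type: "indent", value: "", line });
    } else {
      while (width < stack[stack.length - 1]) {
        stack.pop();
        tokens.push({ type: "dedent", value: "", line });
      }
      if (width !== stack[stack.length - 1]) {
        throw new Error(`Inconsistent dedent at line ${line}`);
      }
    }
    for (const word of raw.trim().split(/\s+/)) {
      tokens.push({ type: "word", value: word, line });
    }
    tokens.push({ type: "newline", value: "\n", line });
  });
  // Close any scopes still open at the end of the input.
  while (stack.length > 1) {
    stack.pop();
    tokens.push({ type: "dedent", value: "", line: lines.length });
  }
  return tokens;
}

// Usage: drive the lexer by hand the way nearley would.
const lexer = new IndentLexer();
lexer.reset("div\n  p hello\nfooter\n");
const types = [];
let tok;
while ((tok = lexer.next())) types.push(tok.type);
console.log(types.join(" "));
// prints: word newline indent word word newline dedent word newline
```

In a grammar you would create it in the JavaScript block (const lexer = new IndentLexer()) and pass it with @lexer lexer, as in the snippet above. Real Pug-like input needs richer token rules for each line's contents (for example, a moo lexer run over the text after the indentation) in place of the whitespace-split word tokens used here.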