lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

Indented continuation #1268

Status: Closed (roveo closed this issue 1 year ago)

roveo commented 1 year ago

I'm trying to build a DSL that is indentation-aware, but also has some flexibility in terms of how the code is arranged. Here's an example:

$revenue = sum(Quantity * UnitPrice) ::float @invoice_items
  > This is a description
    that can span multiple lines
  format: currency
  key:
    nested: value

The relevant part of the grammar looks something like this:

metric: "$" ID "=" expr [ "::" ID ] [ "@" ID ] _NL [ _INDENT (description | kv)+ _DEDENT ]

This works, but expressions can get quite long, so I also want to allow the first "header" line of the definition to span multiple lines, as long as the continuation lines are indented.

So I want something like this to also be valid:

$revenue = sum(
    Quantity
    * UnitPrice
  ) ::float
  @invoice_items
  > This is a description
    that can span multiple lines
  format: currency
  key:
    nested: value

This means that the grammar should be whitespace-aware, but ignore whitespace in some contexts. How can I achieve this with Lark?

MegaIng commented 1 year ago

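As long as the whitespace to be ignored occurs only inside parentheses, the default Indenter postlexer can deal with that: list the opening and closing bracket terminals in its OPEN_PAREN_types and CLOSE_PAREN_types attributes, and newline tokens between them are dropped. See the Python example in the Lark repository.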

If there is no simple rule like that, then you probably need to write a fully custom postlexer and/or lexer, and Lark may not be suited to this at all. This kind of context dependency is by definition no longer context-free, which makes CFG parsers such as LALR and Earley unsuitable.