lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

help wanted : is this grammar parsable with Lark ? #345

Closed cpainchaud closed 5 years ago

cpainchaud commented 5 years ago

Hi! I have tested various libraries to make a query/filter parser for a tool I am writing but they were either too complicated or could not handle some of the grammar I had in mind.

Here are some sample queries that I would like to parse:

queries = [
    "name matches test",  # simplest example
    "(name matches test)",  # same as the previous example, but with parentheses
    "description contains tic and name matches toc",  # multiple filters
    'name matches test and (name matches hello or name matches toc) or name matches "hello there"',
    "name matches '(i am a regex)'",  # some filter arguments are quoted because they contain spaces or forbidden characters such as (
    r"name matches '(i am a regex and need to escape this quote \'here\')'",  # raw string keeps the backslashes that escape the quotes
    "(description contains this and (name matches that or name matches 'something else') ) or member.count = 1",  # nested queries
]

Thank you in advance for any help you can provide in implementing this grammar with Lark!

erezsh commented 5 years ago

Yes, it's possible and shouldn't be too difficult.

I would start from the calculator example: https://github.com/lark-parser/lark/blob/master/examples/calc.py

And then change the operators into words and structure it in a way that makes sense for your needs.

Real-Gecko commented 5 years ago

Hi there! I have a similar question: is it possible to turn something like this into a Python dict? Input:

version = 5
availableTag = -Archived?
availableTag = Type\Automated
filterText = 
isRestFilterTagsCollapsed = False
style = Simple
sorting
{
    sortingId = byName
    sortingData = 
    isReversed = False
}

So that it looks like this Python dict:

{
    "version": 5,
    "availableTag": ["-Archived?", "Type\\Automated"],
    "filterText": "",
    "isRestFilterTagsCollapsed": "False",
    "style": "Simple",
    "sorting": {
        "sortingId": "byName",
        "sortingData": "",
        "isReversed": "False"
    }
}

In my case, the main problems are:

erezsh commented 5 years ago

Yes, it's possible, both in Earley and LALR.

See:

Real-Gecko commented 5 years ago

Hmm, I created a grammar like this:

    start : _EOL* (section* | entry*)
    section_header : CNAME _EOL
    section : section_header entry+
    entry : key "=" value? _EOL
    key: CNAME
    value : /[^=\n]+/
    _EOL : " "* ( NEWLINE | /\f/)
    %import common.NEWLINE
    %import common.LETTER
    %import common.DIGIT
    %import common.WS_INLINE
    %import common.CNAME
    %ignore WS_INLINE

And it says:

lark.exceptions.UnexpectedCharacters: No terminal defined for '
' at line 7 col 8

sorting
       ^

Expecting: ['WS_INLINE', 'EQUAL']

What am I doing wrong?

erezsh commented 5 years ago

You didn't define the syntax for

sorting
{
    sortingId = byName
    sortingData = 
    isReversed = False
}

The examples I gave you are just a template. You can't expect them to fit your exact syntax without any changes.

Real-Gecko commented 5 years ago

Actually, I thought the rule section_header : CNAME _EOL would catch sorting; it looks like I was wrong :D