ldevyataykina / ldsl_grammar


Add indentations to conditions in grammar using Lark #1

Status: Open. ldevyataykina opened this issue 6 years ago.

ldevyataykina commented 6 years ago

I've written the following grammar:

`grammar = r"""  ?start: value
            ?value: dict
                  | if
                  | array
                  | STRING
                  | VARIABLE_NAME
                  | operator
                  | action_operator
                  | SIGNED_NUMBER      -> number
                  | "true"             -> true
                  | "false"            -> false
                  | "null"             -> null

            if: "если" condition "тогда:" suite ("в случае" condition ":" suite)* ["иначе:" else]

            condition: variable (action_operator variable)*
            variable: VARIABLE_NAME | SIGNED_NUMBER
            ?simple_stmt: STRING _NEWLINE
            ?stmt: simple_stmt | if
            else: STRING _NEWLINE

            suite: simple_stmt | _NEWLINE _INDENT stmt+ _DEDENT

            array: "[" [value ("," value)*] "]"
            dict: "{" [pair ("," pair)*] "}"
            pair: STRING ":" value

            ?operator: OPERATOR
            OPERATOR: "+:"|":"

            !action_operator: "<"|">"|"="|"=="|">="|"<="|"!="|"in"

            COMMENT: /#[^\n]*/
            _NEWLINE: ( /\r?\n[\t ]*/ | COMMENT )+
            _INDENT: /[\t]+/

            VARIABLE_NAME: /[\wа-яА-Я][\wа-яА-Я0-9_.-]+/
            STRING: /[\wа-яА-Я0-9_.-]+/

            %import common.ESCAPED_STRING
            %import common.SIGNED_NUMBER
            %import common.WS
            %ignore WS

            _DEDENT: "<DEDENT>" """`

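and I try to parse the following input (the DSL keywords are Russian: если = "if", тогда = "then", в случае = "in case", used like elif, иначе = "else"):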

`если x >= 18 тогда:
         success
 в случае 16 <= x < 18:
         если y > 100000 тогда:
                 get_more_info
 иначе: fail`

but parsing fails with: ParseError: Unexpected end of input! Expecting a terminal of: ['_NEWLINE', '_DEDENT', '__ANONSTR_3', '__ANONSTR_3', '__ANONSTR_3', 'STRING', '__ANONSTR_3']

I need the parser to handle the indentation in this input so that I get a parse tree. How can I fix the grammar so that it reads this text? @erezsh, can you help with this?

erezsh commented 6 years ago

How are you instantiating the parser?

ldevyataykina commented 6 years ago

@erezsh I installed it with pip. Would it be better to clone the repository?

erezsh commented 6 years ago

No, I mean: how are you constructing and calling the Lark object?

See these examples of how to parse indentation:
https://github.com/erezsh/lark/blob/master/examples/indented_tree.py
https://github.com/erezsh/lark/blob/master/examples/python_parser.py

ldevyataykina commented 6 years ago

@erezsh I use `json_parser = Lark(grammar, parser='lalr', postlex=PythonIndenter(), start='value')` and it returns an error.

erezsh commented 6 years ago

I can't reproduce the error you're getting, but try always adding a newline at the end of the text:

parser.parse(text + '\n')

If that doesn't work, try to reduce the grammar and input-text into a simpler grammar and text that still produces the same error.

ldevyataykina commented 6 years ago

@erezsh this error appears when I use
`json_parser = Lark(grammar, start='value')`
`tree = json_parser.parse(file)`

But with
`json_parser = Lark(grammar, parser='lalr', postlex=PythonIndenter(), start='value')`
`tree = json_parser.parse(file + '\n')`

it returns UnexpectedToken: Unexpected token Token(STRING, 'x') at line 1, column 5. Expected: dict_keys(['condition', 'VARIABLE_NAME', 'variable', 'SIGNED_NUMBER']) Context: <no context>

erezsh commented 6 years ago

The reason for this error is that your terminal for STRING matches the same text as VARIABLE_NAME.

I assume you meant to put quotes around your string, like:

STRING: /"[\wа-яА-Я0-9_.-]+"/
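A quick way to see the overlap, sketched with plain `re` (Lark compiles terminal regexes with Python's `re` module by default). The first two patterns are copied from the grammar in this issue, the third is the quoted `STRING` suggested above; the `matching_terminals` helper is hypothetical. It also suggests why the single letter `x` came out as `STRING`: `VARIABLE_NAME` requires at least two characters.

```python
import re

# Terminal regexes from the issue's grammar, plus the suggested quoted STRING.
VARIABLE_NAME = r"[\wа-яА-Я][\wа-яА-Я0-9_.-]+"
STRING = r"[\wа-яА-Я0-9_.-]+"
QUOTED_STRING = r'"[\wа-яА-Я0-9_.-]+"'


def matching_terminals(word):
    """Names of the terminals whose regex matches the whole of `word`."""
    patterns = {
        "VARIABLE_NAME": VARIABLE_NAME,
        "STRING": STRING,
        "QUOTED_STRING": QUOTED_STRING,
    }
    return [name for name, pattern in patterns.items() if re.fullmatch(pattern, word)]


print(matching_terminals("возраст"))    # → ['VARIABLE_NAME', 'STRING']
print(matching_terminals("x"))          # → ['STRING']
print(matching_terminals('"возраст"'))  # → ['QUOTED_STRING']
```

With the unquoted patterns, the lexer has two terminals that accept the same identifier; with the quoted one, only text in quotes can become a `STRING`.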
ldevyataykina commented 6 years ago

@erezsh unfortunately, that doesn't help. But solving the indentation problem is a higher priority for me. Can you tell me where the indentation-related error is in my grammar?

erezsh commented 6 years ago

Why did you decide that the problem is with indentation?

erezsh commented 6 years ago

"try to reduce the grammar and input-text into a simpler grammar and text that still produces the same error." This is still my advice.

ldevyataykina commented 6 years ago

@erezsh I have a working grammar, but when I try to add proper indentation to the input code, I get an error: https://github.com/ldevyataykina/ldsl_grammar/blob/master/1_ver.ipynb

erezsh commented 6 years ago

It looks like you have a working grammar with the Earley algorithm, which is a lot more forgiving than LALR. Unfortunately, there is no simple way (that I'm aware of) to make Earley support indentation (since it's not context-free).

If you can make this work, even without indentation:

Lark(grammar, parser='lalr', start='value')

Then it will be much easier for you to add indentation later.

ldevyataykina commented 6 years ago

@erezsh with the argument parser='lalr' it doesn't work: UnexpectedToken: Unexpected token Token(STRING, 'возраст') at line 1, column 5. Expected: dict_keys(['__ANONSTR_6', '$END', '__RSQB', '__RBRACE', '__ANONSTR_5', '__COMMA', '__ANONSTR_7', 'condition', 'VARIABLE']) Context: <no context>. Why does it expect dict_keys?

erezsh commented 6 years ago

It doesn't expect "dict_keys", it expects one of the terminals listed there. I already told you why this happens. "The reason for this error is that your terminal for STRING matches the same text as VARIABLE."