lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

Missing end_line in example Python grammar #472

Closed: robinsierra closed this issue 4 years ago

robinsierra commented 5 years ago

When using the current Python 3 example grammar, Lark doesn't generate end_line information at the end of a block, e.g. for the last statement in the body of a for loop. Example program:

for i in range(0):
    pass
    pass

If you change python_parser.py to

#
# This example demonstrates usage of the included Python grammars
#

import sys
from io import open

from lark import Lark
from lark.visitors import Transformer, v_args
from lark.indenter import Indenter

class PythonIndenter(Indenter):
    NL_type = '_NEWLINE'
    OPEN_PAREN_types = ['LPAR', 'LSQB', 'LBRACE']
    CLOSE_PAREN_types = ['RPAR', 'RSQB', 'RBRACE']
    INDENT_type = '_INDENT'
    DEDENT_type = '_DEDENT'
    tab_len = 8

kwargs = dict(postlex=PythonIndenter(), start='file_input',
              propagate_positions=True)
python_parser3 = Lark.open('python.lark', rel_to=__file__, parser='lalr',
                           **kwargs)

def _read(fn, *args):
    kwargs = {'encoding': 'iso-8859-1'}
    with open(fn, *args, **kwargs) as f:
        return f.read()

@v_args(meta=True)
class _PythonTransformer(Transformer):

    def for_stmt(self, children, meta):
        print(meta.end_line)
        return children

    def pass_stmt(self, children, meta):
        print(meta.end_line)

def parse(text):
    tree = python_parser3.parse(text)
    return _PythonTransformer().transform(tree)

if __name__ == '__main__':
    tree = parse(_read(sys.argv[1]) + '\n')

and pass it the Python file shown above, it prints

3
None
None

instead of the correct line numbers, even though propagate_positions is True.

robinsierra commented 4 years ago

The problem seems to be in this part of the grammar:

?stmt: simple_stmt | compound_stmt
?simple_stmt: small_stmt (";" small_stmt)* [";"] _NEWLINE
?small_stmt: (expr_stmt | del_stmt | pass_stmt | flow_stmt | import_stmt | global_stmt | nonlocal_stmt | assert_stmt)

Printing the end_line information in parse_tree_builder.py at the point where it gets assigned shows that the end_line of the _NEWLINE token is None, but it is still assigned. Changing the loop to

            for c in reversed(children):
                if isinstance(c, Tree) and c.children and not c.meta.empty:
                    res.meta.end_line = c.meta.end_line
                    res.meta.end_column = c.meta.end_column
                    res.meta.end_pos = c.meta.end_pos
                    res.meta.empty = False
                    break
                elif isinstance(c, Token) and c.end_line is not None:  # This is new
                    res.meta.end_line = c.end_line
                    res.meta.end_column = c.end_column
                    res.meta.end_pos = c.pos_in_stream + len(c.value)
                    res.meta.empty = False
                    break

fixes the problem, because tokens whose end_line is None are then skipped.
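
For anyone who wants to try the idea without a Lark checkout, here is a minimal standalone sketch of the workaround described above. The Token, Meta and Tree classes and the propagate_end function are hypothetical stand-ins written for this illustration, not Lark's real classes or its parse_tree_builder code, and the sketch only shows the "skip tokens without an end position" logic. It says nothing about the change that was eventually made in the lexer.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Token:
    """Hypothetical stand-in for a lexer token (not Lark's Token class)."""
    type: str
    value: str
    end_line: Optional[int] = None
    end_column: Optional[int] = None


@dataclass
class Meta:
    """Hypothetical stand-in for the position metadata of a tree node."""
    end_line: Optional[int] = None
    end_column: Optional[int] = None
    empty: bool = True


@dataclass
class Tree:
    """Hypothetical stand-in for a parse-tree node."""
    data: str
    children: List[Union["Tree", Token]] = field(default_factory=list)
    meta: Meta = field(default_factory=Meta)


def propagate_end(node: Tree) -> None:
    """Copy the end position from the last child that actually has one."""
    for child in reversed(node.children):
        if isinstance(child, Tree):
            # Subtrees only count if they carry position information.
            if child.children and not child.meta.empty:
                node.meta.end_line = child.meta.end_line
                node.meta.end_column = child.meta.end_column
                node.meta.empty = False
                break
        elif child.end_line is not None:
            # Tokens without an end position (like the _NEWLINE
            # token in the report) are skipped instead of copied.
            node.meta.end_line = child.end_line
            node.meta.end_column = child.end_column
            node.meta.empty = False
            break


# A `pass` on line 2, followed by a newline token with no position.
stmt = Tree("simple_stmt", [
    Token("PASS", "pass", end_line=2, end_column=9),
    Token("_NEWLINE", "\n"),
])
propagate_end(stmt)
print(stmt.meta.end_line)
```

Without the `is not None` check, the loop would stop at the trailing newline token and copy its None into the node, which matches the behaviour reported above.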

erezsh commented 4 years ago

Thanks for reporting it. The latest commit to master should solve the issue. (the problem was in the lexer all along)

robinsierra commented 4 years ago

It's working for me. Thanks a lot for fixing it!

erezsh commented 4 years ago

You're welcome