
lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

MIT License


Line number with comments and propagate_positions, lalr #211

Closed. heshiming closed this issue 6 years ago.

heshiming commented 6 years ago

When propagate_positions=True, the resulting tree contains line and column numbers. However, they count only 'actual statements', ignoring things like comments and blank lines that are specified via %ignore.

I would like to point the end user to the exact location. For this to work, I need the line numbers to include those ignored lines, not just the actual statements.

With the 'lalr' parser, is this possible in the current implementation?

erezsh commented 6 years ago

If I understand you correctly, you want a way to access the ignored tokens?

That is possible using lexer_callbacks. Here's an example of it in use: https://github.com/geographika/mappyfile/blob/master/mappyfile/parser.py#L31

The ignored tokens will have the correct line and column. Matching these tokens to the correct spot in the tree is possible, but isn't a trivial effort.

heshiming commented 6 years ago

Thanks. But what token should I use in lexer_callbacks?

I'm working with a Python-like grammar. Just like in the example, I have a 'COMMENT' token and a '_NEWLINE' token whose pattern includes 'COMMENT'. It seems that the 'COMMENT' callback is never triggered when I add it to lexer_callbacks. If I try to include '_NEWLINE' instead, I get an exception like the following:

File "/usr/local/lib/python3.6/site-packages/lark/lark.py", line 223, in parse
  return self.parser.parse(text)
File "/usr/local/lib/python3.6/site-packages/lark/parser_frontends.py", line 38, in parse
  return self.parser.parse(token_stream, *[sps] if sps is not NotImplemented else [])
File "/usr/local/lib/python3.6/site-packages/lark/parsers/lalr_parser.py", line 68, in parse
  for i, token in enumerate(stream):
File "/usr/local/lib/python3.6/site-packages/lark/indenter.py", line 32, in process
  if token.type == self.NL_type:
AttributeError: 'NoneType' object has no attribute 'type'

erezsh commented 6 years ago

The lexer_callbacks interface is:

callback(token) -> token

You're getting this error because you're returning None for _NL. It isn't an issue for comments, because unlike newlines, they are ignored by the lexer.

heshiming commented 6 years ago

Now I can get the correct line number from token.line. Just as you said, I don't see a simple way to map it back to the tree. Keeping my own copy of the source without the comments looks like an easier approach. But thank you very much for all your work on this project.

erezsh commented 6 years ago

I'm glad you like Lark!

Yes, it's not trivial to write. But I know it's possible, because I've done it before. That's exactly what the assign_comments function does: https://github.com/geographika/mappyfile/blob/master/mappyfile/parser.py#L79

It's not the simplest piece of code, but it has been tested and works! Perhaps you can adapt it for your purposes.

erezsh commented 6 years ago

Add docs under https://lark-parser.readthedocs.io/en/latest/recipes/