musiKk / plyj

A Java parser written in Python using PLY.

Add line number support to all AST nodes #47

Closed bbbert closed 3 years ago

bbbert commented 8 years ago

Hello!

Thanks for putting this project up. It's really useful to be able to parse Java directly from Python code. This PR came about because I needed the ability to extract the line number of certain AST nodes.

The parser obtains the current line number from its reference to the lexer. However, the function for a production is only called once the whole production has been parsed, by which point the lexer has already reached the end of the construct, and we want the line number at which the construct begins. Also, tokens do not get annotated with their line number, so we can't just grab the line number of the token.

To get around this, I modified some parts of the grammar to replace some tokens with single-token symbols, so that the token's line number can be propagated up to the AST node it belongs to. Ideally, every token would be parsed into its own symbol (e.g. throw_sym : THROW) so that every symbol has a line number associated with it, but I suspect that would have a large impact on performance, so I only did that where it was needed.
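To illustrate the idea outside of plyj, here is a minimal sketch of the single-token-symbol trick on a toy grammar. It is not the code from this PR: the `Throw` node, the `reduced_at_line` field, and the `parse_throws` helper are made up for the example, and only `throw_sym : THROW` mirrors the naming above. It assumes PLY 3.x, and it relies on PLY reducing a single-token rule before it reads the next token from the lexer.

```python
from collections import namedtuple

import ply.lex as lex
import ply.yacc as yacc

# Hypothetical AST node for this sketch (not plyj's model class).
# `line` is where the statement starts, `reduced_at_line` is what the
# lexer reported when the whole statement was finally reduced.
Throw = namedtuple('Throw', ['name', 'line', 'reduced_at_line'])

tokens = ('THROW', 'NAME', 'SEMI')

t_SEMI = r';'
t_ignore = ' \t'


def t_NAME(t):
    r'[A-Za-z_][A-Za-z_0-9]*'
    if t.value == 'throw':
        t.type = 'THROW'
    return t


def t_newline(t):
    r'\n+'
    # PLY does not count lines by itself; the lexer has to do it.
    t.lexer.lineno += len(t.value)


def t_error(t):
    raise SyntaxError('illegal character %r' % t.value[0])


def p_statements(p):
    '''statements : statements statement
                  | statement'''
    p[0] = p[1] + [p[2]] if len(p) == 3 else [p[1]]


def p_statement(p):
    '''statement : throw_sym NAME SEMI'''
    # p[1] is the line recorded by throw_sym. p.lexer.lineno is read only
    # now, after the full statement (and possibly a lookahead token) has
    # been consumed, so it points at or past the end of the statement.
    p[0] = Throw(name=p[2], line=p[1], reduced_at_line=p.lexer.lineno)


def p_throw_sym(p):
    '''throw_sym : THROW'''
    # Single-token symbol: this runs right after THROW is shifted, so the
    # lexer is still on the line of the keyword (assumes PLY 3.x default
    # reductions, i.e. no lookahead is read before this rule fires).
    p[0] = p.lexer.lineno


def p_error(p):
    raise SyntaxError('unexpected token: %r' % (p,))


lexer = lex.lex()
# debug/write_tables are turned off so no parser.out or parsetab.py is written.
parser = yacc.yacc(start='statements', debug=False, write_tables=False)


def parse_throws(source):
    """Parse toy `throw NAME;` statements and return a list of Throw nodes."""
    lexer.lineno = 1  # reset between parses; lexer.input() does not do this
    return parser.parse(source, lexer=lexer)
```

With an input such as `"throw\n  a\n;\n\nthrow b;"`, the first node's `line` should be the line of the `throw` keyword, while `reduced_at_line` lands on a later line. The cost is one extra reduction per wrapped token, which is why the PR only wraps tokens where a node needs the starting line.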

I hope this is useful. Feel free to ping me if there are places where the line number still doesn't make sense, as I might have missed some.