we-like-parsers / pegen

PEG parser generator for Python
https://we-like-parsers.github.io/pegen/
MIT License

Fix empty last line returned by tokenizer #77

Closed shumbo closed 1 year ago

shumbo commented 1 year ago

Fixes #76

This PR fixes an issue where the tokenizer returns the empty string for the last line if the source does not end with a newline.

Python's tokenize module injects a NEWLINE token if the source does not end with a newline. The injected token has the empty string as its line, but Pegen's tokenizer records that empty string as the content of the last line of the source.
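To see the injected token, here is a small sketch that only uses the standard library. The helper name token_lines is made up for illustration and is not part of Pegen. The exact value of the line attribute on the injected NEWLINE may differ between Python versions, so the snippet prints it instead of assuming it.

```python
import io
import tokenize


def token_lines(source):
    """Return (token name, start row, line attribute) for every token in source."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.start[0], tok.line)
        for tok in tokenize.generate_tokens(readline)
    ]


# The source deliberately has no trailing newline.
rows = token_lines("x = 1")
for name, row, line in rows:
    print(f"{name:<10} row={row} line={line!r}")
```

A NEWLINE token shows up in the output even though the source contains no newline character.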

Since the source does not change during tokenization, once self._lines[i] is set for some i, it does not need to be set again by later tokens. I added a check so that self._lines is updated only if the entry for tok.start[0] has not been set yet. With this change, the NEWLINE injected by the tokenize module no longer overwrites the entry in self._lines with the empty string.
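A minimal sketch of the idea, not the actual code from this PR: collect_lines is a hypothetical stand-in for the bookkeeping done on self._lines, and I assume a dict keyed by row number. The token list is built by hand, with the injected NEWLINE modelled as a token whose line is the empty string.

```python
import tokenize
from typing import Dict, Iterable


def collect_lines(tokens: Iterable[tokenize.TokenInfo]) -> Dict[int, str]:
    """Remember the source text of each row, keeping the first value seen."""
    lines: Dict[int, str] = {}
    for tok in tokens:
        row = tok.start[0]
        # Only record a row once, so a later token cannot overwrite it.
        if row not in lines:
            lines[row] = tok.line
    return lines


tokens = [
    tokenize.TokenInfo(tokenize.NAME, "x", (1, 0), (1, 1), "x = 1"),
    tokenize.TokenInfo(tokenize.OP, "=", (1, 2), (1, 3), "x = 1"),
    tokenize.TokenInfo(tokenize.NUMBER, "1", (1, 4), (1, 5), "x = 1"),
    # Hand-built stand-in for the NEWLINE that tokenize injects.
    tokenize.TokenInfo(tokenize.NEWLINE, "", (1, 5), (1, 6), ""),
    tokenize.TokenInfo(tokenize.ENDMARKER, "", (2, 0), (2, 0), ""),
]
lines = collect_lines(tokens)
print(lines[1])
```

Without the membership check, the NEWLINE token would replace the entry for row 1 with the empty string. With it, row 1 keeps the text from the first token on that row.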

I also added a test case for this change.

shumbo commented 1 year ago

@MatthieuDartiailh Updated the changelog in 487ebe07d7a02b4c450806f9217125ac11ddd083