This PR fixes an issue where the tokenizer returns the empty string for the last line if the source does not end with a newline.
Python's `tokenize` module injects a NEWLINE token if the source does not end with one. The injected token has the empty string as its `line`, but Pegen's tokenizer records that empty string as the content of the last line of the source.
Since the source does not change during tokenization, once `self._lines[i]` is set for some `i`, it does not need to be set again by later tokens. I added a check so that `self._lines` is updated only if the entry for `tok.start[0]` is not already set. With this change, the NEWLINE injected by the `tokenize` module no longer overwrites `self._lines` with the empty string.
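For illustration, here is a minimal standalone sketch of the same idea. It is not the actual Pegen code: `collect_lines` is a hypothetical helper that stands in for the part of the tokenizer that fills `self._lines`, and the exact `line` value of the injected NEWLINE token may differ between Python versions.

```python
import io
import tokenize
from typing import Dict


def collect_lines(source: str) -> Dict[int, str]:
    """Map each line number to its text, keeping the first value seen."""
    lines: Dict[int, str] = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        row = tok.start[0]
        # Record a line only the first time a token on that row is seen,
        # so a later token on the same row (such as an injected NEWLINE)
        # cannot overwrite it.
        if row not in lines:
            lines[row] = tok.line
    return lines


# Source without a trailing newline: tokenize injects a NEWLINE token.
lines = collect_lines("a = 1")
print(repr(lines[1]))
```

Because the first token on row 1 carries the real text of the line, `lines[1]` keeps that text regardless of what the injected NEWLINE reports later.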
Fixes #76
I also added a test case for this change.