Did a bit of research; these few lines of code should be enough to nicely show off the tokenizer. Change "simpletest.py" to your own filename and check it out!
from tokenize import tokenize, tok_name

# The tokenizer wants a readline callable over bytes, hence "rb"
pyfile = open("simpletest.py", "rb")
tokens = list(tokenize(pyfile.readline))
pyfile.close()

for token in tokens:
    print(token.start[0], tok_name[token.exact_type], token.string)
Explanation: We open the file as "rb" because the tokenizer wants to read it as bytes. token.start (and token.end) is a (line_number, column) tuple indicating where the token starts and ends. token.exact_type is the integer code for the token, and tok_name maps that integer to a readable name. And token.string gives us the exact source text of the token itself (the full source line is available separately as token.line). :)
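For instance, with a one-line file containing just x = 1, the script prints something roughly like this (my own run-through, so take the exact formatting with a grain of salt):

0 ENCODING utf-8
1 NAME x
1 EQUAL =
1 NUMBER 1
1 NEWLINE
2 ENDMARKER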
EDIT: Forgot to mention, working on this right now
py2by fails on a lot of valid Python indentation cases, including, but not limited to:

No way to express mixed indentation levels, which are valid in Python (see the first sketch below).

Indentation that does not impact nesting level (see the second sketch below).
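Something like the following (these are my own stand-in snippets, not taken from py2by's test cases; both are accepted by CPython):

# Case 1: mixed indentation levels -- sibling blocks may each use a
# different indent width, as long as each block is internally consistent
if True:
    print("this block is indented four spaces")
if True:
  print("this sibling block is indented two spaces")

# Case 2: indentation that does not impact nesting level -- implicit
# line joining inside parentheses ignores indentation entirely
total = (1 +
                2 +
    3)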
I'm considering rewriting py2by to use the Python tokenizer to detect INDENT and DEDENT tokens. That should be more portable too, should Python's indentation rules ever change.
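Roughly the idea (just a sketch of my own; nesting_levels is a name I made up, not anything in py2by yet):

from tokenize import tokenize, INDENT, DEDENT

def nesting_levels(path):
    # Yield (line_number, depth) whenever the nesting depth changes,
    # letting the tokenizer handle all the indentation rules for us.
    depth = 0
    with open(path, "rb") as f:  # bytes again, as above
        for token in tokenize(f.readline):
            if token.exact_type == INDENT:
                depth += 1
                yield (token.start[0], depth)
            elif token.exact_type == DEDENT:
                depth -= 1
                yield (token.start[0], depth)

So something like for line, depth in nesting_levels("simpletest.py"): print(line, depth) would report every point where the nesting level goes up or down, without py2by having to reason about tabs vs. spaces at all.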