gristlabs / asttokens

Annotate Python AST trees with source text and token information
Apache License 2.0

Workaround Python tokenize bug with non-ASCII characters #82

Closed alexmojaki closed 2 years ago

alexmojaki commented 2 years ago

Workaround for https://github.com/python/cpython/issues/68382

I ran into this recently in futurecoder while supporting variable names translated to Tamil, and implemented the core logic in https://github.com/alexmojaki/futurecoder/pull/373/commits/ea8bed30f9eaa795179b5beebe2e026ee53ee0e8#diff-2439d77444f0be435d92d3e5df78ab2f065592d4f66c075a1bc92e1b8bb41954R25-R46. But monkeypatching `tokenize` globally leads to incorrect behaviour on code that really is invalid; for example, it causes friendly-traceback to explain syntax errors incorrectly. Besides, this fix belongs here.
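For readers who want the gist without opening the diff, here is a rough sketch of the idea, not necessarily the exact code in this PR. On affected Python versions, `tokenize` splits an identifier such as `மாறி` into alternating `NAME` and `ERRORTOKEN` pieces, so the workaround glues adjacent pieces back into a single `NAME`. The helper names `merge_split_names` and `_flush` are made up for illustration, and the sketch assumes the source is already known to be valid syntax, since it treats error tokens as parts of identifiers.

```python
import tokenize
from typing import Iterable, Iterator, List

# Token types that may be fragments of one wrongly split identifier.
_MERGEABLE = (tokenize.NAME, tokenize.NUMBER, tokenize.ERRORTOKEN)


def _flush(group: List[tokenize.TokenInfo]) -> List[tokenize.TokenInfo]:
    """Hypothetical helper: collapse a run of adjacent tokens into one NAME.

    The run is only collapsed if it contains an ERRORTOKEN and all of its
    tokens come from the same source line; otherwise it is returned as-is.
    """
    has_error = any(t.type == tokenize.ERRORTOKEN for t in group)
    same_line = len({t.line for t in group}) == 1
    if not (has_error and same_line):
        return group
    return [
        tokenize.TokenInfo(
            type=tokenize.NAME,
            string="".join(t.string for t in group),
            start=group[0].start,
            end=group[-1].end,
            line=group[0].line,
        )
    ]


def merge_split_names(
    tokens: Iterable[tokenize.TokenInfo],
) -> Iterator[tokenize.TokenInfo]:
    """Hypothetical wrapper around the output of tokenize.generate_tokens.

    Assumption: the tokenized source is valid Python, so any ERRORTOKEN
    touching a NAME or NUMBER is really part of an identifier.
    """
    group: List[tokenize.TokenInfo] = []
    for tok in tokens:
        mergeable = tok.type in _MERGEABLE
        # Whitespace between two tokens ends the current run.
        if mergeable and group and group[-1].end != tok.start:
            yield from _flush(group)
            group = []
        if mergeable:
            group.append(tok)
        else:
            yield from _flush(group)
            group = []
            yield tok
    yield from _flush(group)
```

It would be applied locally, e.g. `merge_split_names(tokenize.generate_tokens(io.StringIO(source).readline))`, only to source that has already been parsed successfully. That keeps the global `tokenize` module untouched, so tools that tokenize genuinely invalid code still see the real error tokens.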