gristlabs / asttokens

Annotate Python AST trees with source text and token information
Apache License 2.0

ASTTokens won't parse strings with coding declarations in Python 2 #21

Closed ssbr closed 5 years ago

ssbr commented 6 years ago

Python 2:

>>> asttokens.ASTTokens("# coding: ascii\n1", parse=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/google/home/devinj/.local/lib/python2.7/site-packages/asttokens/asttokens.py", line 50, in __init__
    self._tree = ast.parse(source_text, filename) if parse else tree
  File "/usr/lib/python2.7/ast.py", line 37, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 0
SyntaxError: encoding declaration in Unicode string

In Python 3 it works just fine.

I'd expect failures if the source is not valid UTF-8 (e.g. if it declared coding: latin1 and contained bytes that are invalid as UTF-8), but I had hoped that passing in the raw bytes would otherwise just work.

I think the easiest workaround for end users is to parse the source themselves and pass in the resulting tree:

>>> import ast
>>> import asttokens
>>> s = "# coding: ascii\n1"
>>> asttokens.ASTTokens(s, tree=ast.parse(s))
<asttokens.asttokens.ASTTokens object at 0x7f35d3b76550>

asttokens could also do that internally -- parse the tree before decoding the source, instead of after.
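A minimal sketch of that parse-first ordering, using only the standard library. It assumes Python 3 (tokenize.detect_encoding does not exist in Python 2), and parse_then_decode is a hypothetical helper for illustration, not asttokens' actual code:

```python
import ast
import io
import tokenize


def parse_then_decode(source, filename="<unknown>"):
    """Hypothetical helper: build the AST first, decode the text afterwards."""
    # Parse the source exactly as it was given. When it is bytes, the
    # compiler itself reads and honours any coding declaration.
    tree = ast.parse(source, filename)

    if isinstance(source, bytes):
        # Only now turn the bytes into text for later token work.
        # detect_encoding reads the coding declaration or BOM and
        # falls back to UTF-8 when neither is present.
        encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
        source = source.decode(encoding)

    return tree, source


tree, text = parse_then_decode(b"# coding: ascii\n1")
```

The point of the ordering is that the parser never sees an already-decoded string that still carries a coding declaration, which is what triggers the SyntaxError above in Python 2.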

ssbr commented 6 years ago

I'm not very good at git/GitHub (I use a different VCS at work), so I'm not sure how everything works, but I made a PR with my suggested fix: https://github.com/gristlabs/asttokens/pull/22

alexmojaki commented 5 years ago

This is fixed by #22 now, right?

ssbr commented 5 years ago

Yes, I believe so. (Sorry about that; not used to github issues etc., I don't use github as much at work.)