gristlabs / asttokens

Annotate Python AST trees with source text and token information
Apache License 2.0
172 stars 34 forks source link

asttokens doesn't include the trailing paren in a call like `f({1: (2), 3: 4}, object())` #31

Closed ssbr closed 5 years ago

ssbr commented 5 years ago
>>> def printbad(text):
  astt = asttokens.ASTTokens(text, parse=True)
  n = astt.tree.body[0].value  # the Call node
  print(astt.text[n.first_token.startpos: n.last_token.endpos])
... 

>>> printbad('f({1: (2), 3: 4}, object())')  # bad
f({1: (2), 3: 4}, object()

>>> printbad('f({1: (2), 3: 4}, object)')    # good
f({1: (2), 3: 4}, object)

Something about having a dict literal and parentheses inside the dict literal confuses asttokens, so that it is wrong about the end of the call expression and grabs the wrong parenthesis token.

(I attempted to minimize further, but without much luck.)

ssbr commented 5 years ago

I tried debugging but couldn't follow what was going on, so instead I did the thing you aren't supposed to do and guessed what the problem was. It does make the problem go away, but I'm not actually sure if I'm right.

asttokens is iterating in the order given by ast.iter_child_nodes, which goes field by field and iterates over all members of that field. This means that all keys get yielded, before all values, instead of yielding them in source order. My guess is that this messes up the parenthesis tracking, because it sees 1, and then :(2), between consecutive nodes, and then 3. That's the keys. Then for values, it sees 2, ), 3: between nodes, and then 4. So it thinks that it's seen a closing parenthesis between the 2 and the 3 because it counted that span twice.

By iterating over the children in actual source order, the double counting of the ) is removed.

I haven't verified by actually understanding the code, this was just a guess on a lark, because I remembered that the dict child order was annoying (and it broke my own refactoring tool based on asttokens at one point).

I'll send a PR to fix the described problem, assuming that this is the source of the bug.

ssbr commented 5 years ago

The test case in that PR is exactly the example from this bug, so I'd call this fixed. :D