gristlabs / asttokens

Annotate Python AST trees with source text and token information
Apache License 2.0
172 stars 34 forks source link

Surrounding parentheses in generator expressions #50

Closed alexmojaki closed 4 days ago

alexmojaki commented 4 years ago

In Python < 3.8, get_text doesn't include the surrounding parentheses in generator expressions. For example:

import ast

import asttokens

source = "list(x for x in y)"
atok = asttokens.ASTTokens(source, parse=True)
for node in ast.walk(atok.tree):
  if isinstance(node, ast.GeneratorExp):
    print(atok.get_text(node))

Prints:

x for x in y

In almost all cases (elif is the only other exception I know of), get_text returns valid Python which parses back to an equivalent AST. I think it's fair to say users generally expect this. The tests actually assert it, they only missed this because they eagerly add parentheses to handle multiline expressions before reparsing and comparing.

In Python 3.8 this is 'fixed', so there's currently a difference in behaviour between Python versions.

I think it makes somewhat more sense to always include the parentheses, but it's not obvious, especially as this will usually mean including parentheses that belong to a function call. But if there are users who would have a problem with a change, they already have to deal with that in Python 3.8.

None of this is actually a problem for me personally, I just noticed a difference in outputs across Python versions and investigated. For now it's just theoretically a problem for other people.

Note that I haven't made any behaviour changes in this PR yet, I'm just demonstrating the problem in the tests for now.

dsagal commented 4 years ago

Interesting. Yes, it does seem useful to include surrounding parens for generator expressions, to match Python 38 behavior. But it's a behavior change for older versions.

ssbr commented 4 years ago

Here's a reversed argument: normally, if you replace an expression with, say, a variable, it parses as one would expect. But here, if we replace (x for x in y) with y in that expression, we get listy, which is no good at all. Unless programs are careful with how they verify their substitutions, they will produce incorrect code.

No matter whether asttokens includes parens or does not include parens, somebody will be broken by this and need to add extra parentheses to make their code rewriting work as intended. Either parens need to be added when replacing genexps with non-genexps, or parens need to be added when copying genexps to a new non-call location.

Given that new parens often need to be added, like when replacing x with x+y in x*2, this kind of failure is not so strange.

(FWIW I handle all of this by verifying that syntax trees line up: if I am replacing $x with y in list($x), I verify that list(y) has the same shape as the pattern: a list call with one parameter. And I add parens until it has the same shape, or else abort with an exception. So this would not break me in a meaningful way, since my program would try listy, notice that it doesn't look right, and then try list(y). That's almost a coincidence, I had meant for the parentheses to be added for things like the operator precedence example. The change in 3.8 and this PR might break other people who don't have this kind of logic. Presumably it requires bumping the major version number again.)

To be clear, IMO including the parens improves asttokens: the 3.8 behavior should match the 3.7 behavior, and there's two ways to go about that. I probably prefer the other one, but maybe #11 points towards doing it this way.

alexmojaki commented 4 days ago

I think this is no longer relevant since it refers to a difference before and after 3.8, and before 3.8 is no longer supported.