python / cpython

The Python programming language
https://www.python.org
Other
63.14k stars 30.23k forks source link

yield expression inside generator expression does nothing #54753

Closed dd5ce2ff-95a1-4e28-ace2-6d841dd59913 closed 6 years ago

dd5ce2ff-95a1-4e28-ace2-6d841dd59913 commented 13 years ago
BPO 10544
Nosy @gvanrossum, @birkenfeld, @rhettinger, @ncoghlan, @abalkin, @benjaminp, @glyph, @serhiy-storchaka, @1st1, @ajdavis, @DimitrisJim
PRs
  • python/cpython#4564
  • python/cpython#4579
  • python/cpython#4676
  • Files
  • yield-in-comprehensions.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['interpreter-core', 'type-bug', '3.7'] title = 'yield expression inside generator expression does nothing' updated_at = user = 'https://bugs.python.org/InyeolLee' ``` bugs.python.org fields: ```python activity = actor = 'serhiy.storchaka' assignee = 'none' closed = True closed_date = closer = 'serhiy.storchaka' components = ['Interpreter Core'] creation = creator = 'Inyeol.Lee' dependencies = [] files = ['47296'] hgrepos = [] issue_num = 10544 keywords = ['patch'] message_count = 66.0 messages = ['122475', '122512', '122645', '122658', '122705', '122759', '122760', '122762', '122765', '123641', '123649', '241410', '286273', '286274', '286280', '286281', '286282', '286283', '286407', '286408', '286409', '306784', '306802', '306804', '306805', '306806', '306807', '306808', '306810', '306812', '306829', '306831', '306832', '306833', '306834', '306835', '306836', '306837', '306838', '306840', '306841', '306842', '306843', '306844', '306845', '306846', '306847', '306939', '306940', '306963', '306964', '306983', '306995', '307013', '307033', '307035', '307039', '307051', '307067', '307091', '307095', '307359', '307361', '307426', '307449', '311593'] nosy_count = 15.0 nosy_names = ['gvanrossum', 'georg.brandl', 'rhettinger', 'ncoghlan', 'belopolsky', 'benjamin.peterson', 'erickt', 'glyph', 'Inyeol.Lee', 'serhiy.storchaka', 'yselivanov', 'esc24', 'danielsh', 'emptysquare', 'Jim Fasarakis-Hilliard'] pr_nums = ['4564', '4579', '4676'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue10544' versions = ['Python 3.7'] ```

    dd5ce2ff-95a1-4e28-ace2-6d841dd59913 commented 13 years ago

    Simple coroutine with for loop works:

    >>> def pack_a():
            while True:
                L = []
                for i in range(2):
                    L.append((yield))
                print(L)
    
    >>> pa = pack_a()
    >>> next(pa)
    >>> pa.send(1)
    >>> pa.send(2)
    [1, 2]
    >>>

    If using list comprehension (generator expression), it fails:

    >>> def pack_b():
            while True:
                L = [(yield) for i in range(2)]
                print(L)
    
    >>> pb = pack_b()
    <endless loop here>

    I understand what's going on here - generator expression is converted to nested function and there's no way to either stop the execution inside nested function (since it's not started yet!) or send() a value to its yield expression. Still I think this behavior is a bug and needs fixed.

    rhettinger commented 13 years ago

    Hmm, what an interesting and unexpected side-effect of the efforts to hide the loop induction variable.

    birkenfeld commented 13 years ago

    While the behavior is confusing, I don't think yield inside comprehensions should be disallowed. Rather, the fact that comprehensions have their own scope should be stated clearer.

    benjaminp commented 13 years ago

    I think I can probably fix it, but it's debatable whether it should be done, since it'd make list comps more of "quasi" functions.

    rhettinger commented 13 years ago

    This discussion should probably be moved to python-dev. With tools like Twisted's inlineDefer or the Monocle package, there is a growing need to be able to use yield in complex expressions. Yet, that goes against the trend toward making lists comps more like genexps and less like sugar for a simple for-loop accumulator.

    Guido, do you have any thoughts on the subject? Mark it a documentation issue or try out Benjamin's fix?

    gvanrossum commented 13 years ago

    I think it is definitely wrong the way it works in 3.x. (Especially since it works as expected in 2.x.)

    I agree with Inyeol's preference of fixes: (1) make it work properly for listcomps as well as genexps, (2) if that's not possible, forbid yield in a genexp or listcomp.

    Note that even though yield in a genexp could be considered as having a well-defined meaning, that meaning is not useful and I would consider it as merely a coincidence of the specification, not an intentional effect. So I would be fine changing its meaning. (My assumption is that since it is not useful there is -- almost -- no code depending on that meaning.)

    gvanrossum commented 13 years ago

    PS. Wasn't there a similar issue with something inside a genexp that raises StopIteration? Did we ever solve that?

    abalkin commented 13 years ago

    Isn't this the same issue as bpo-3267?

    gvanrossum commented 13 years ago

    Yes it is, but I was never asked about it back then.

    terryjreedy commented 13 years ago

    bpo-3267 did not expose endless loop possibility and was closed as won't fix. Rather than reopen that and close this and move nosy list back, I added to nosy list here.

    birkenfeld commented 13 years ago

    FWIW, the "endless loop possibility" is not of issue here, and is simply an artifact of the specific generator function the OP uses.

    ilevkivskyi commented 9 years ago

    I would like to add that since the introduction of asyncio module that heavily uses "yield from" syntax, binding of yield inside comprehensions/generator expressions could lead to unexpected results/confusing behavior. See for example this question on SO: http://stackoverflow.com/questions/29334054/why-am-i-getting-different-results-when-using-a-list-comprehension-with-coroutin

    b62e5afe-fdc6-42de-985a-faeb74e5c5a6 commented 7 years ago

    Is the fact that 'await' produces a syntax error in this context the same bug or a new one?

    ilevkivskyi commented 7 years ago

    Is the fact that 'await' produces a syntax error in this context the same bug or a new one?

    What kind of SyntaxError? await outside an async function is prohibited, bare await is also prohibited.

    b62e5afe-fdc6-42de-985a-faeb74e5c5a6 commented 7 years ago
    >>> async def foo():
    ...     bar = [await x for x in range(10)]
      File "<input>", line 2
    SyntaxError: 'await' expressions in comprehensions are not supported
    ilevkivskyi commented 7 years ago

    Python 3.5 does not support this, you should use Python 3.6 (plus await x will fail when you will run the coroutine, since you cannot await on int).

    b62e5afe-fdc6-42de-985a-faeb74e5c5a6 commented 7 years ago

    OK, cool. So, long term, there will be a way to do this (suspend within a generator expression). Thanks for the pointer.

    b62e5afe-fdc6-42de-985a-faeb74e5c5a6 commented 7 years ago

    (As far as awaiting on int, yes, I know how await works, I was focusing on the syntax.)

    7aa6e20b-8983-474f-b2ae-de7eff1caa04 commented 7 years ago

    Just to add my comment to this 7-years-old never-resolved issue: in PyPy 3.5, which behaves like Python 3.x in this respect, I made the following constructions give a warning.

        def wrong_listcomp():
            return [(yield 42) for i in j]
        def wrong_gencomp():
            return ((yield 42) for i in j)
        def wrong_dictcomp():
            return {(yield 42):2 for i in j}
        def wrong_setcomp():
            return {(yield 42) for i in j}

    SyntaxWarning: 'yield' inside a list or generator comprehension behaves unexpectedly (http://bugs.python.org/issue10544)

    The motivation is that none of the constructions above gives the "expected" result. In more details:

    For completeness, I think there is no problem with "await" instead of "yield" in Python 3.6.

    How about fixing CPython to raise SyntaxWarning or even SyntaxError?

    ilevkivskyi commented 7 years ago

    How about fixing CPython to raise SyntaxWarning or even SyntaxError?

    I think it is better to just fix the issue, i.e. make comprehensions be equivalent to for-loops even if they contain yield. (In particular this will lead to [(yield i) for i in range(5)] be SyntaxError outside function).

    The example of await shows that it is possible without leaking the loop variable into enclosing scope.

    7aa6e20b-8983-474f-b2ae-de7eff1caa04 commented 7 years ago

    Let's see if the discussion goes anywhere or if this issue remains in limbo for the next 7 years. In the meantime, if I may humbly make a suggestion: whether the final decision is to give SyntaxError or change the semantics, one or a few intermediate versions with a SyntaxWarning might be a good idea.

    ncoghlan commented 6 years ago

    "Just fix the issue" is easier said than done for the same reason that comprehensions were implemented the way they are now: lambda expressions still have to work.

    That is, we need to maintain the invariant that:

    [x for x in iterable]
    {x for x in iterable}
    (k:v for k, v in iterable)
    (x for x in iterable)

    give the same results (respectively) as:

    [(lambda: x)() for x in iterable]
    {(lambda: x)() for x in iterable}
    ((lambda: k)():(lambda: v)() for k, v in iterable)
    ((lambda: x)() for x in iterable)

    Once you work through the implications of "We need the loop variable to visible to lexically nested scopes, but invisible in the containing scope", you're going to end up with something that looks enough like a nested function that the easiest to implement and explain option is to have it *be* a nested function.

    I'd be fine with a resolution that forbade yield expressions directly inside implicit scopes, though.

    ncoghlan commented 6 years ago

    Also see https://bugs.python.org/issue1660500 for the original Python 3.0 change to hide the iteration variable.

    While the test suite already covers some interesting scoping edge cases as result of that initial patch, I think one we're currently missing is the nested comprehension case:

        >>> [[x for x in range(1, i)] for i in range(2, 5)]
        [[1], [1, 2], [1, 2, 3]]
    serhiy-storchaka commented 6 years ago

    I see nothing special in [[x for x in range(1, i)] for i in range(2, 5)]. This should be equivalent to:

    __result = []; __i = None
    try:
        for __i in range(2, 5):
            __result2 = []; __x = None
            try:
                for __x in range(1, __i)
                    __result2.append(__x)
                __result.append(__result2)
            finally:
                del __result2, __x
    finally:
        del __i
    ncoghlan commented 6 years ago

    Another slight variant to that test case to make sure the inner comprehension actually generates a closure reference in the current implementation:

        >>> [[x+y for x in range(2) for y in range(1, i)] for i in range(2, 5)]
        [[1, 2], [1, 2, 2, 3], [1, 2, 3, 2, 3, 4]]

    These are the kind of challenges that drove of us towards the current implementation. While the status quo definitely has its downsides, those downsides at least all flow directly from "There's an implicit nested function there that you're not expecting".

    Yury took all this into account when designing the interaction between await and comprehensions (which is why that's in a better state), but for yield we inherited the existing behaviour of any other nested function.

    serhiy-storchaka commented 6 years ago

    This is straightforward.

    __result = []; __i = None
    try:
        for __i in range(2, 5):
            __result2 = []; __x = __y = None
            try:
                for __x in range(2):
                    for __y in range(1, __i):
                        __result2.append(__x + __y)
                __result.append(__result2)
            finally:
                del __result2, __x, __y
    finally:
        del __i
    ncoghlan commented 6 years ago

    Sure, you can technically do that, but you'll have to rewrite most of the symtable pass in the compiler, since you won't be able to re-use the current recursive descent algorithms (which are designed to handle function scopes). That ran afoul of "If the implementation is hard to explain, it's a bad idea."

    Where that gets particularly nasty is when you nest comprehensions with the *same* iteration variable name:

        >>> [[i for i in range(2)] for i in range(2)]
        [[0, 1], [0, 1]]

    Now your name generator needs to be clever enough to account for how many levels deep it is. You also run into the lexical scoping problem anyway once a lambda expression shows up:

        >>> [[(lambda i=i: i)() for i in range(2)] for i in range(2)]
        [[0, 1], [0, 1]]
        >>> [[(lambda: i)() for i in range(2)] for i in range(2)]
        [[0, 1], [0, 1]]

    As noted earlier, not impossible to resolve, but for from the "simple" option that folks seem to believe it is (and which I thought it would be before I actually tried to implement it while respecting Python's lexical scoping rules).

    ilevkivskyi commented 6 years ago

    Nick,

    Yury took all this into account when designing the interaction between await and comprehensions (which is why that's in a better state), but for yield we inherited the existing behaviour of any other nested function.

    Actually the fact that await in comprehensions already works right is one of my main arguments that we can also fix yield. So I am totally with Serhiy here.

    serhiy-storchaka commented 6 years ago

    Consider this as a sort of optimization. A function inlining. In general case we can't inline a function because it can be rebinded at any time, but in this particular case we call a just created function that doesn't have any references and live only on the stack a, and we have all information for generating an inlined code. This will allow to get rid of the overhead for creating and calling a one-time function.

    The naming problem is the hardest problem. We should assign unique names for internal variables, like '.0.x'. We already do this for the single parameter of a one-time function. Or we could just use a stack (this requires more work but has additional benefits).

    If you want to think about comprehensions as about calling inner functions, you can do this. And you can think about comprehensions as about loops which don't leak inner variables. The behavior should be absolutely identical until you will use "yield" which would turn your list producing function into a generator. If make "yield" turning into a generator not an implicit inner function, but an explicit enclosing function, a comprehension could be represented via an implicit inner function. Currently there is no way to express this in Python syntax, but before introducing "yield" there was no way to express its semantic.

    serhiy-storchaka commented 6 years ago

    I concur with Ivan. It would be nice to me to disallow "yield" in comprehensions and generator expressions, but the fact that "await" in comprehensions already works right, and this behavior is intentional and useful, make me thinking that "yield" should work in comprehensions.

    gvanrossum commented 6 years ago

    Remind me what happens when you use await in a generator expression that survives the async function's scope?

    async def f(xs):
        return (await x for x in xs)
    ilevkivskyi commented 6 years ago

    Remind me what happens when you use await in a generator expression that survives the async function's scope?

    Awaiting on f([1, 2]) will result in an async generator (even though yield never appears here). Yury explained why this happens in https://bugs.python.org/issue32113

    ilevkivskyi commented 6 years ago

    (Of course there should be not [1, 2] in the argument, but a list of some awaitables, otherwise there will be an error later.)

    1st1 commented 6 years ago

    Yury explained why this happens in https://bugs.python.org/issue32113

    It happens because '(x for x in xs)' creates a synchronous generator. So when there's an 'await' in it, it creates an asynchronous generator.

    1st1 commented 6 years ago

    It's also important to note, that in 3.7, this is legal:

      def foo():
        return (await x for x in xs)

    Note that 'foo' is a regular function, not 'async def'.

    ilevkivskyi commented 6 years ago

    ... but [await x for x in xs] is still valid _only_ inside async def.

    1st1 commented 6 years ago

    ... but [await x for x in xs] is still valid _only_ inside async def.

    Yes, because it's computed right where it is defined.

    a = [x for x in xs]  # `a` is a list object
    a = (x for x in xs)  # `a` is a generator

    Do you understand the difference?

    ilevkivskyi commented 6 years ago

    Well, after all I am thinking maybe it is indeed makes sense to ban yield inside both sync/async and both comprehensions/generator expressions. Since we already have a smörgåsbord of intuitive behaviors.

    This, _together_ with extensive clarification to the docs and error messages can fix the problem.

    gvanrossum commented 6 years ago

    I honestly think we went too far here. Asynchronous generators are still provisional. Though the PEP is silent about this, the acceptance notice is clear: https://mail.python.org/pipermail/python-dev/2016-September/146267.html. I propose to rip them out.

    1st1 commented 6 years ago

    Well, after all I am thinking maybe it is indeed makes sense to ban yield inside both sync/async and both comprehensions/generator expressions.

    I agree, as I can't imagine a real use case for

       a = ((yield a) for a in as)

    which is really equivalent to

       def a():
          for a in as:
             yield (yield a)
    ilevkivskyi commented 6 years ago

    Do you understand the difference?

    Yury, sorry, but what is your problem? Have I said something about that this is bad. Your tone is clearly insulting, and this is not the first time. Maybe it makes sense to have some respect?

    1st1 commented 6 years ago

    > Do you understand the difference?

    Yury, sorry, but what is your problem? Have I said something about that this is bad. Your tone is clearly insulting, and this is not the first time. Maybe it makes sense to have some respect?

    Sorry, I didn't mean to insult anybody. I asked an honest question with an intent to clarify if there's some misunderstanding of the topic that I'm partially responsible for in CPython.

    ilevkivskyi commented 6 years ago

    Yury OK, sorry then this is a misunderstanding from my side.

    1st1 commented 6 years ago

    Yury OK, sorry then this is a misunderstanding from my side.

    NP. Again, sorry if I sounded that way to you.

    ilevkivskyi commented 6 years ago

    Guido, I am not sure about the complete removal, this is probably to radical. What I think we are missing more detailed docs that would be clear about the corner cases. (I already mentioned this in https://bugs.python.org/issue32113)

    gvanrossum commented 6 years ago

    I think we all need to calm down a bit. How about not posting about this topic for 24 hours.

    ilevkivskyi commented 6 years ago

    How about not posting about this topic for 24 hours.

    OK, makes sense :-)

    ncoghlan commented 6 years ago

    Given the direction of the python-dev thread, should we split this question into two issues?

    Issue 1: a yield expression inside a comprehension changes the type of the expression result (returning a generator-iterator instead of the expected container type)

    Issue 2: a yield expression inside a generator expression interacts weirdly with the genexp's implicit yield expression

    I ask, as it seems to me that issue 1 can be addressed by wrapping the affected cases in an implicit 'yield from' expression, which will both fix the return type of the expression and turn the outer function into a generator (if it isn't one already). (I'm going to put together a proof-of-concept for that idea this weekend)

    By contrast, the interaction between generator expressions and explicit yield expressions seems intrinsically confusing, so I'm not sure we can do any better than declaring it a syntax error to try to combine them.

    ncoghlan commented 6 years ago

    I realised that even without modifying the compiler first, I could illustrate the proposed yield from based resolution for the comprehension case by way of explicit yield from clauses:

    def get_gen_result(gen, inputs):
        try:
            yield_value = next(gen)
            for send_value in inputs:
                print(f"Received: {yield_value}; Sending: {send_value}")
                yield_value = gen.send(send_value)
        except StopIteration as exc:
            return exc.value
        raise RuntimeError("Failed to exhaust generator")
    
    def example():
        comp1 = yield from [str((yield x)) for x in ('1st', '2nd')]
        comp2 = yield from [int((yield x)) for x in ('3rd', '4th')]
        return comp1, comp2
    
    >>> result = get_gen_result(example(), range(4))
    Received: 1st; Sending: 0
    Received: 2nd; Sending: 1
    Received: 3rd; Sending: 2
    Received: 4th; Sending: 3
    >>> result
    (['0', '1'], [2, 3])

    So if we decided to make yield-in-a-comprehension imply the use of yield from, we'd only need:

    gvanrossum commented 6 years ago

    No to both. See python-dev.