python / cpython

The Python programming language
https://www.python.org
Other
62.56k stars 30.02k forks source link

Using `return value` in a generator function skips the returned value on for-loop iteration #79937

Closed 3b1aabf8-9f12-4083-9274-c0c3522730bb closed 5 years ago

3b1aabf8-9f12-4083-9274-c0c3522730bb commented 5 years ago
BPO 35756
Nosy @terryjreedy, @stevendaprano
Files
  • ex1.py: Example reproduction - 1
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['invalid', 'type-bug', '3.7'] title = 'Using `return value` in a generator function skips the returned value on for-loop iteration' updated_at = user = 'https://bugs.python.org/BryanKoch' ``` bugs.python.org fields: ```python activity = actor = 'steven.daprano' assignee = 'none' closed = True closed_date = closer = 'terry.reedy' components = [] creation = creator = 'Bryan Koch' dependencies = [] files = ['48062'] hgrepos = [] issue_num = 35756 keywords = [] message_count = 13.0 messages = ['333802', '333809', '333817', '333824', '333838', '333918', '334017', '334020', '334034', '334181', '334190', '334191', '334192'] nosy_count = 4.0 nosy_names = ['terry.reedy', 'steven.daprano', 'Bryan Koch', 'greg.ewing'] pr_nums = [] priority = 'normal' resolution = 'not a bug' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue35756' versions = ['Python 3.7'] ```

    3b1aabf8-9f12-4083-9274-c0c3522730bb commented 5 years ago

    Using the new "return value is semantically equivalent to raise StopIteration(value)" syntax created in PEP-380 (https://legacy.python.org/dev/peps/pep-0380/#formal-semantics) causes the returned value to be skipped by standard methods of iteration.

    The PEP reads as if returning a value via StopIteration was meant to signal that the generator was finished and that StopIteration.value was the final value. If StopIteration.value is meant to represent the final value, then the built-in for-loop should not skip it and the current implementation in 3.3, 3.4, 3.5, and 3.6 should be considered an oversight of the PEP and a bug (I don't have a version of 3.7 or 3.8 to test newer versions).

    Reproduction code is attached with comments/annotations.

    stevendaprano commented 5 years ago

    You say:

    The PEP reads as if returning a value via StopIteration was meant to signal that the generator was finished and that StopIteration.value was the final value.

    To me, the PEP is clear that return expr is equivalent to raise StopIteration(expr) which is not used as an iteration value. Can you point to the passage in the PEP that suggests something different to you?

    3b1aabf8-9f12-4083-9274-c0c3522730bb commented 5 years ago

    I understood the PEP to include return expr in the iteration values as per the first bullet of the proposal.

    Any values that the iterator yields are passed directly to the caller.

    This bullet doesn't have any additional formatting on the word "yields" so I consider it as not directly referring to the "yield" keyword.

    With the current implementation, I have to concern myself if a generator function was created with the intention of being called using last_ret = yield from function(); yield last_ret or as for ret in function(): yield ret. The first also yields the return value but would also yield an additional None if a return was not the terminal cause; the second will miss the last value if the generator uses return.

    Essentially, allowing return expr in generator functions without invoking the generator using yield from generator will lose the last value.

    I support either of the below resolutions:

    stevendaprano commented 5 years ago

    I understood the PEP to include return expr in the iteration values as per the first bullet of the proposal.

    Any values that the iterator yields are passed directly to the caller.

    This bullet doesn't have any additional formatting on the word "yields" so I consider it as not directly referring to the "yield" keyword.

    I read it as "yields", not "yields or returns". Lack of formatting is irrelevant -- we shouldn't expect every use of a word with a technical meaning to necessarily be formatted specifically.

    Read the Proposal section:

    The following new expression syntax will be allowed in the body 
    of a generator 
    [...]
    FURTHERMORE, when the iterator is another generator, the 
    subgenerator is allowed to execute a return statement with a 
    value, AND THAT VALUE BECOMES THE VALUE OF THE YIELD FROM 
    EXPRESSION. [emphasis added]

    https://legacy.python.org/dev/peps/pep-0380/#id11

    It does not say "that value is yielded and then becomes the value of the yiueld from expression".

    To me, it is clear that the intention here is that the return value is not yielded.

    The Abstract also says:

    Additionally, the subgenerator is allowed to return with a
    value, and the value IS MADE AVAILABLE to the delegating
    generator. [emphasis added]

    The use of "is made available" suggests that the return value is treated differently from a yield. Values yielded from the subgenerator are automatically yielded from the calling generator, without any additional effort. The value returned is merely *made available*, for the calling generator to do with whatever it wishes.

    And if there is still any doubt, there is specification of the behaviour of "result = yield from expression" which makes it clear that the return value carried by the StopIteration exception is not yielded, but used as the value of the expression (i.e. assigned to result).

    The motivation of yield from returning a value is to allow a side- channel independent of the iterable values. It isn't intended as a "one last yield and then bail out". I don't think that your interpretation can be justified by reading the PEP.

    Essentially, allowing return expr in generator functions without invoking the generator using yield from generator will lose the last value.

    No, because the return value is not intended to be used as one of the iteration values. Quoting one of Greg Ewing's examples:

    I hope it is also clear why returning values via yield,
    or having 'yield from' return any of the yielded values,
    would be the wrong thing to do. The send/yield channel and
    the return channel are being used for completely different
    purposes, and conflating them would be disastrous.

    http://www.cosc.canterbury.ac.nz/greg.ewing/python/yield-from/yf_current/Examples/Parser/parser.txt

    That example is indirectly linked to from the PEP.

    8e2f6344-9dcd-49a2-8932-7e12e7b33d8d commented 5 years ago

    There is no bug here; the current implementation is working as intended.

    The word "yields" in the quoted section of the PEP indeed refers to the "yield" keyword and nothing else. Possibly that could be clarified, but I believe it's already clear enough when read in the context of the rest of the PEP.

    3b1aabf8-9f12-4083-9274-c0c3522730bb commented 5 years ago

    Thank you both for the clarifications. I agree these's no bug in yield from however is there a way to reference the return value when a generator with a return is invoked using for val in gen i.e. when the generator is invoked without delegation?

    I could write my own wrapper around using next to work around this but it might be an oversight of the new grammar (new being relative) that the return value is only available when invoked from the yield from syntax.

    Essentially I have code that looks like for value in generator: do thing with value yield value where I need to do something before yielding the value. It would be awesome if invoking a generator above would throw a SyntaxError iff it contained a return and it wasn't invoked through yield from.

    The below isn't valid Python and I'm not sure that it should be but it's what I need to do.

    ` return_value = for value in generator: do thing with value yield value

    if return_value:
      do something with return_value
    `
    terryjreedy commented 5 years ago

    No bug here. You can discuss possible change on python-ideas, but I strongly suggest that you write your next wrapper and move on.

    8e2f6344-9dcd-49a2-8932-7e12e7b33d8d commented 5 years ago

    bryan.koch wrote:

    It would be awesome if invoking a generator above would throw a SyntaxError iff it contained a return and it wasn't invoked through yield from.

    Why do you care about that so much? There's nothing to stop you from ignoring the return value of an ordinary function. Why should generator functions be different?

    stevendaprano commented 5 years ago

    On Fri, Jan 18, 2019 at 12:31:51AM +0000, bryan.koch wrote:

    Thank you both for the clarifications. I agree these's no bug in yield from however is there a way to reference the return value when a generator with a return is invoked using for val in gen i.e. when the generator is invoked without delegation?

    I don't believe so, because the for loop machinery catches and consumes the StopIteration.

    Any change to that behaviour would be new functionality that would probably need to go through Python-Ideas first.

    [...]

    Essentially I have code that looks like for value in generator: do thing with value yield value where I need to do something before yielding the value. It would be awesome if invoking a generator above would throw a SyntaxError iff it contained a return and it wasn't invoked through yield from.

    How is the interpreter supposed to know? (I assume you mean for the SytnaxError to be generated at compile-time.) Without doing a whole-program analysis, there is no way for the interpreter to compile a generator:

        def gen():
            yield 1
            return 1

    and know that no other piece of code in some other module will never call it via a for loop.

    The below isn't valid Python and I'm not sure that it should be but it's what I need to do.

    ` return_value = for value in generator: do thing with value yield value

    if return_value: do something with return_value `

    Let me be concrete here. You have a generator which produces a sequence of values [spam, eggs, cheese, aardvark] and you need to treat the final value, aardvark, differently from the rest:

    do thing with spam, eggs, cheese
    do a different thing with aardvark (if aardvark is a True value)

    Am I correct so far?

    Consequently you writing this as:

    def gen():
        yield spam
        yield eggs
        yield cheese
        return aardvark

    Correct?

    That's an interesting use-case, but I don't think there is any obvious way to solve that right now. Starting in Python 3.8, I think you should be able to write:

    for x in (final := (yield from gen())):
        do something with x  # spam, eggs, cheese
    if final:
        do something different with final  # aardvark
    3b1aabf8-9f12-4083-9274-c0c3522730bb commented 5 years ago

    steven your generator example is exactly what I wanted to do; looks like I'm upgrading to Python 3.8 for the new assignment syntax.

    I was actually expecting the SyntaxError to be raised at runtime which would be a pretty large behavior change (definitely required to go through python-ideas) but I think my use case is covered by 3.8 and just upgrading is simpler to do.

    Some details of the implementation that stirred this is that I'm streaming output from a hierarchy of generated modules and I get what is essentially (final value, EOF) as the last result so I need to yield the final value but for external reasons I need to perform the clean-up of native resources before yielding.

    Let's consider this as closed since what I need is supported in 3.8. Thank you for your help!

    stevendaprano commented 5 years ago

    steven your generator example is exactly what I wanted to do; looks like I'm upgrading to Python 3.8 for the new assignment syntax. Sorry to have mislead you, but I don't think it will do what I thought. After giving it some more thought, I decided to test it (at least as much of it as possible). There's no local assignment here but you can see that the behaviour is not what I had assumed:

    py> def inner():
    ...     yield from (1, 2)
    ...     return -1
    ...
    py> def outer():
    ...     for x in (yield from inner()):
    ...             yield 100+x
    ...
    py> for x in outer():
    ...     print(x)
    ...
    1
    2
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in outer
    TypeError: 'int' object is not iterable

    In hindsight, this behaviour is logical. But I think it means that there is no way to do what you want using a for-loop.

    I was actually expecting the SyntaxError to be raised at runtime which would be a pretty large behavior change

    Not *entirely* unprecedented though, as you can get runtime SyntaxErrors from calling compile(), eval() or exec(). But I think some other class of exception would be better, since the problem isn't actually a syntax error.

    3b1aabf8-9f12-4083-9274-c0c3522730bb commented 5 years ago

    Thanks for testing that. I'm off to write an ugly next() wrapper then.

    stevendaprano commented 5 years ago

    I'm off to write an ugly next() wrapper then.

    Wouldn't it be simpler to re-design the generators to yield the final result instead of returning it? To process the final item differently from the rest, you just need something like this:

    last = next(it)
    for x in it:
        process(last)
        last = x
    special_handling(last)