python / cpython

The Python programming language
https://www.python.org
Other
63.32k stars 30.31k forks source link

itertools.starmap can fail to construct a proper iterator #117833

Closed takuom closed 6 months ago

takuom commented 6 months ago

Bug report

Bug description:

Contrary to what's claimed in the documentation for itertools.starmap, the starmap object m below is not properly an iterator in the sense specified in the documentation. Specifically, m.__next__ keeps returning a value after it raises StopIteration.

from itertools import repeat, starmap

class F:
    def __call__(self):
        if self._a:
            self._a = False
            raise StopIteration

    def __init__(self, /, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._a = True

# Callable object of type `F`
f = F()

# Problematic object
m = starmap(f, repeat([]))

# Exhaust `m`.
[*m]

print(next(m)) # "None"

Compare m with i below.

[*(i := iter(F(), object()))]
next(i) # StopIteration

I understand that starmap(f, repeat([])) should really be an iterator equivalent to iter(f, object()) for every f which makes the latter a valid iterator.

A conceptually better expression by the way for an iterator equivalent to these would be map(f). The documentation actually says that map requires a second argument, so this requirement is technically not a bug of map, but I don't see any advantage of it, and think that it's against natural expectation when map can accept any other number than zero of similar arguments. Note that starmap(f, repeat(seq)) is equivalent to map(f, *map(repeat, seq)) when the sequence seq is non-empty. map(f, *map(repeat, [])) can of course be written simply as map(f). While the expression starmap(f, repeat([])) tells to pass no argument to f each time, the expression map(f) simply doesn't tell to pass any argument.

CPython versions tested on:

3.11

Operating systems tested on:

macOS

pochmann3 commented 6 months ago

Same with just map and a simple function:

def f(stop):
    if stop:
        raise StopIteration
    return 'surprise!'

m = map(f, [True, False])
print([*m])     # []
print(next(m))  # 'surprise!'

Attempt This Online!

I think map not catching a StopIteration has been discussed a few times and I recently saw some code deliberately relying on it and have done that myself as well once or twice.

pochmann3 commented 6 months ago

Previous issues about map:

takuom commented 6 months ago

Thank you @pochmann3 for your comments.

OK, map has the same problem!

Do you mean that some people have relied on some map objects being false iterators?

pochmann3 commented 6 months ago

I don't mean relying on this "continue-after-stopping" but relying on map not catching a StopIteration raised by the function and the StopIteration instead being used by an outside consumer to stop iteration. Even the Python docs once used that:

pochmann3 commented 6 months ago

false iterators

Not false but maybe broken (depending on whether letting a raised exception fall through counts as "raising"):

Once an iterator’s __next__() method raises StopIteration, it must continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken.

takuom commented 6 months ago

I understood. Thank you for the clarification. (Right. I meant "broken". I'm glad that you understood it.)

I just want every properly constructed map (or starmap) object to be an unbroken iterator, but if map is ever going to "catch StopIteration" for that purpose, I wonder if [*map(f, iterable)] could be equivalent to the non-iterator (i.e., sequence) version of map(f, iterable) in old Python. I guess it's supposed to be. I don't know much about the old map, but I know that, at the moment, there is a difference of [*map(f, iterable)] from [f(arg) for arg in iterable].

It's not a use of non-catching of StopIteration, but currently, any iterator i is equivalent to map(next, itertools.repeat(i)) as well as (slightly more efficiently) map(i.__next__) with the extension suggested in my first post. This might be showing some intuition for that behaviour of map.

takuom commented 6 months ago

I know that, at the moment, there is a difference of [*map(f, iterable)] from [f(arg) for arg in iterable].

Actually, I noticed that (f(arg) for arg in iterable) was like map(f, iterable) before PEP 479. Now (next(iterator) for _ in range(l)) is not (necessarily) equivalent to itertools.islice(iterator, l)...

terryjreedy commented 6 months ago

@rhettinger

takuom commented 6 months ago

I was being confused here:

Now (next(iterator) for _ in range(l)) is not (necessarily) equivalent to itertools.islice(iterator, l)...

These are not even close from the beginning. "iterator" in the generator expression is just a name, and the resulting generator won't remember any particular value of it.

Similarly, there was only loose analogy between a generator expression and a map object before PEP 479.

pochmann3 commented 6 months ago

A conceptually better expression by the way for an iterator equivalent to these would be map(f).

I've btw posted that in ideas now.

pochmann3 commented 6 months ago

I think map not catching a StopIteration has been discussed a few times and I recently saw some code deliberately relying on it

I think it was the roundrobin itertools recipe where rhettinger did that recently:

def roundrobin(*iterables):
    "Visit input iterables in a cycle until each is exhausted."
    # roundrobin('ABC', 'D', 'EF') → A D E B F C
    # Algorithm credited to George Sakkis
    iterators = map(iter, iterables)
    for num_active in range(len(iterables), 0, -1):
        iterators = cycle(islice(iterators, num_active))
        yield from map(next, iterators)
takuom commented 6 months ago

Why by the way is it a good thing that an iterator which stops once never iterates again? I thought it may be fine if the documentation just says that a map/starmap object is a possibly broken iterator. The behaviour can be forced easily when necessary, e.g.,

def repair_iter(i):
    return (yield from i)
takuom commented 6 months ago

Why by the way is it a good thing that an iterator which stops once never iterates again?

One thing I note is that a broken iterator which resumes iterating after stopping may lead to a complex behaviour (which may or may not be convenient) when it is one of a few iterables passed as arguments to map, assuming that the resulting map object will resume iterating after stopping.

I'm also not sure how to evaluate the following behaviour of the current map. I think it would be better being documented at least.

i = iter("Consumed Expected? Look!".split())
j = ()

# A `map` object which *is* a proper iterator
m = map(print, i, j)

# Exhaust `m`.
[*m] # Nothing printed

# Exhaust it again.
[*m]

print(next(i)) # "Look!", not "Expected?"
takuom commented 6 months ago

Actually, the behaviour is that of zip. (I don't think of it as a problem at this point. It might be healthy.)

i = iter("Consumed Expected? Look!".split())
j = ()

# A `zip` object which might seem odd
m = zip(i, j)

# Exhaust `m`.
[*m]

# Exhaust it again.
[*m]

print(next(i)) # "Look!", not "Expected?"

If this should be reported as a separate (perhaps documentation?) issue, then I would be glad to create a new issue here.

takuom commented 6 months ago

[I have deleted a post here which reported that zip can create a broken iterator when it's passed a broken iterator.]

pochmann3 commented 6 months ago

I'm also not sure how to evaluate the following behaviour of the current map.

Interesting. I'd expect an exhausted map iterator to do nothing when iterated again (other than to raise a StopIteration).

takuom commented 6 months ago

I can understand the expectation. But, again, the behaviour can be muted e.g., with repair_iter above when it can cause a real problem. So the issue might really be about the documentation. I'm not sure...

def repair_iter(i):
    return (yield from i)
rhettinger commented 6 months ago

FWIW, I don't agree that there is a problem here. It seems like reading more into the words than they actually meant. The starmap() code is obligated to stop and stay stopped if the input iterator stops, but it is under no obligation to check for StopIteration from the function call.

To me, the OP's code seems like a cute parlor trick. In the real world, no actual problem has ever been observed (map and starmap have functioned well for nearly two decades). Changing the logic is more likely to break someone's existing code than to fix an actual problem.

I'll leave this open for a while for more commentary but expect to close it unless something more compelling is posted.

Also the docs look fine to me. The relevant sentences start with, "When no more data are available a StopIteration exception is raised instead." In the OPs example, more data IS available."

pochmann3 commented 6 months ago

The starmap() code is obligated to stop and stay stopped if the input iterator stops

Do you mean if the input iterator stops and stays stopped? Because if the input iterator continues after stopping, then so does starmap. It doesn't stay stopped then. (I assume with "stopping" we mean StopIteration coming from a __next__ call.)

Example:

from itertools import starmap

def f(x):
    if x:
        return x,
    raise StopIteration

continues_after_stopping = map(f, [1, 2, 0, 3, 4, 5, 6, 0, 0, 7, 8, 9])

s = starmap(int, continues_after_stopping)

for _ in range(6):
    print(list(s))

Output (Attempt This Online!):

[1, 2]
[3, 4, 5, 6]
[]
[7, 8, 9]
[]
[]

Here starmap stopped after yielding 1 and 2, but then continued and stopped multiple times, even after once not yielding anything.

rhettinger commented 6 months ago

Because if the input iterator continues after stopping, then so does starmap

That is fine. Map and starmap are not obliged to correct the behavior of a malformed input.

To say otherwise would be to disallow commonly used pass-through code:

    class ListWrapper:

        def __init__(self, seq):
            self.seq = seq

        def __iter__(self):
            return iter(self.seq)

ISTM that this thread is working really hard to invent a problem that doesn't exist in practice. As a starting point, note that map/starmap have functioned well for almost two decades. ISTM that there isn't an actual problem here beyond an overly strict reading of a glossary entry.

pochmann3 commented 6 months ago

That is fine. Map and starmap are not obliged to correct the behavior of a malformed input.

Good to hear that from a core dev. That's been my stance as well (in general, not just for map/starmap).

takuom commented 6 months ago

Thank you @rhettinger for your comments.

Also the docs look fine to me. The relevant sentences start with, "When no more data are available a StopIteration exception is raised instead." In the OPs example, more data IS available.

So an expression like [*m] does not necessarily exhaust an iterator m since more data may be available after that. While that seems reasonable, this part of the documentation may then be saying too much:

Once an iterator’s __next__() method raises StopIteration, it must continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken.

(It says this so clearly, and whether it actually means that or not is not so clear to me...)

Or perhaps it should say that "iterator" elsewhere in the documentation normally means a possibly "broken" one, since I don't think that the reader is likely to expect that otherwise.

tim-one commented 6 months ago

I think a difference may be that "raises StopIteration" is meant to be "actively raises" in some sense that may be tricky to make precise; and not meant to cover cases where an iterator merely passively passes on a StopIteration raised by something the iterator invokes.

But I may be wrong.

takuom commented 6 months ago

That sounds interesting. It would actually [edit: some ] useful if that can be made precise and that gets into the docs. For a map object's __next__, I think you are suggesting that StopIteration resulting from a function call is "passive", and that resulting from an input iterator's __next__ is "active". I don't have a good idea on this, but hope that some people do.

takuom commented 6 months ago

Oh, I should have said "that resulting from StopIteration actively raised by an input iterator's __next__"...

For a map object's __next__, I think you are suggesting that StopIteration resulting from a function call is "passive", and that resulting from an input iterator's __next__ is "active".

A passively raised StopIteration from an input iterator's __next__ should be considered passive from the map object's __next__ as well.

takuom commented 6 months ago

Actually, there won't be any difficulty if we have two classes of StopIteration exceptions: ActiveStopIteration and PassiveStopIteration...

rhettinger commented 6 months ago

Actually, there won't be any difficulty if we have two classes of StopIteration exceptions: ActiveStopIteration and PassiveStopIteration...

Sorry, I think we've exhausted the useful discussion here and I'm marking this as closed. There doesn't appear to be any practical reason to change what we have now.

terryjreedy commented 6 months ago

The various entries in the docs are meant to be read and interpreted together as appropriate. StopIteration is a special exception and its entry must be included when discussing its use:

Raised by built-in function next() and an iterator's __next__() method to signal that there are no further items produced by the iterator. ...

This statement is intended to be exhaustive. Any other use is non-canonical. The entry says later that since 3.7, StopIteration in generators and coroutines is converted to a RuntimeError. The results of other non-canonical uses are left undefined; but such are 'off-label' and puts one in consenting-adults territory. In particular, functions are not supposed to raise StopIteration. If they do, one gets what one gets. As Raymond said, starmap may legitimately assume that the function argument does not raise StopIteration. And this is true for any generic function/callable argument.

takuom commented 6 months ago

Thank you @terryjreedy for giving us more hints for understanding the issue better.

Please note map(next, iterators) in the roundrobin example above. In practice, I think it's most likely next which raises StopIteration for a map object's __next__. So I think the issue is on canonical uses of StopIteration resulting in non-canonical uses.

An example of a similar result with next:

iterators = map(iter, [(), (99, )])
m = map(next, iterators)
[*m]
print(next(m)) # 99

Thus, StopIteration used by next results in m.__next__ using it in a non-canonical way. Can we really differentiate between canonical and non-canonical uses?

map/starmap may violate [edit: when the programmer doesn't] what the intendedly exhaustive statement says, which seems extraordinary. If the docs refrain from mentioning it, then I'm afraid that they can be confusing to some users.

takuom commented 6 months ago

On the other hand, I think that mentioning the fact just lightly would actually be more than a good enough job of docs. I expect that to end up helping many users in the long run. Just a small bit of additional thoughts.

takuom commented 4 months ago

I'm also not sure how to evaluate the following behaviour of the current map.

Interesting. I'd expect an exhausted map iterator to do nothing when iterated again (other than to raise a StopIteration).

Actually, the glossary says that an exhausted iterator must "just" raise StopIteration.

When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its next() method just raise StopIteration again.

Therefore, there's another thing not covered by the docs' intendedly exhaustive statement. I think that the docs would be more helpful just by getting open about this as well.