Closed takuom closed 6 months ago
Same with just map
and a simple function:
def f(stop):
if stop:
raise StopIteration
return 'surprise!'
m = map(f, [True, False])
print([*m]) # []
print(next(m)) # 'surprise!'
I think map
not catching a StopIteration
has been discussed a few times and I recently saw some code deliberately relying on it and have done that myself as well once or twice.
Previous issues about map
:
Thank you @pochmann3 for your comments.
OK, map
has the same problem!
Do you mean that some people have relied on some map
objects being false iterators?
I don't mean relying on this "continue-after-stopping" but relying on map
not catching a StopIteration
raised by the function and the StopIteration
instead being used by an outside consumer to stop iteration. Even the Python docs once used that:
I understood. Thank you for the clarification. (Right. I meant "broken". I'm glad that you understood it.)
I just want every properly constructed map
(or starmap
) object to be an unbroken iterator, but if map
is ever going to "catch StopIteration
" for that purpose, I wonder if [*map(f, iterable)]
could be equivalent to the non-iterator (i.e., sequence) version of map(f, iterable)
in old Python. I guess it's supposed to be. I don't know much about the old map
, but I know that, at the moment, there is a difference of [*map(f, iterable)]
from [f(arg) for arg in iterable]
.
It's not a use of non-catching of StopIteration
, but currently, any iterator i
is equivalent to map(next, itertools.repeat(i))
as well as (slightly more efficiently) map(i.__next__)
with the extension suggested in my first post. This might be showing some intuition for that behaviour of map
.
I know that, at the moment, there is a difference of
[*map(f, iterable)]
from[f(arg) for arg in iterable]
.
Actually, I noticed that (f(arg) for arg in iterable)
was like map(f, iterable)
before PEP 479. Now (next(iterator) for _ in range(l))
is not (necessarily) equivalent to itertools.islice(iterator, l)
...
@rhettinger
I was being confused here:
Now
(next(iterator) for _ in range(l))
is not (necessarily) equivalent toitertools.islice(iterator, l)
...
These are not even close from the beginning. "iterator
" in the generator expression is just a name, and the resulting generator won't remember any particular value of it.
Similarly, there was only loose analogy between a generator expression and a map
object before PEP 479.
A conceptually better expression by the way for an iterator equivalent to these would be
map(f)
.
I've btw posted that in ideas now.
I think
map
not catching aStopIteration
has been discussed a few times and I recently saw some code deliberately relying on it
I think it was the roundrobin
itertools recipe where rhettinger did that recently:
def roundrobin(*iterables):
"Visit input iterables in a cycle until each is exhausted."
# roundrobin('ABC', 'D', 'EF') → A D E B F C
# Algorithm credited to George Sakkis
iterators = map(iter, iterables)
for num_active in range(len(iterables), 0, -1):
iterators = cycle(islice(iterators, num_active))
yield from map(next, iterators)
Why by the way is it a good thing that an iterator which stops once never iterates again? I thought it may be fine if the documentation just says that a map
/starmap
object is a possibly broken iterator. The behaviour can be forced easily when necessary, e.g.,
def repair_iter(i):
return (yield from i)
Why by the way is it a good thing that an iterator which stops once never iterates again?
One thing I note is that a broken iterator which resumes iterating after stopping may lead to a complex behaviour (which may or may not be convenient) when it is one of a few iterables passed as arguments to map
, assuming that the resulting map
object will resume iterating after stopping.
I'm also not sure how to evaluate the following behaviour of the current map
. I think it would be better being documented at least.
i = iter("Consumed Expected? Look!".split())
j = ()
# A `map` object which *is* a proper iterator
m = map(print, i, j)
# Exhaust `m`.
[*m] # Nothing printed
# Exhaust it again.
[*m]
print(next(i)) # "Look!", not "Expected?"
Actually, the behaviour is that of zip
. (I don't think of it as a problem at this point. It might be healthy.)
i = iter("Consumed Expected? Look!".split())
j = ()
# A `zip` object which might seem odd
m = zip(i, j)
# Exhaust `m`.
[*m]
# Exhaust it again.
[*m]
print(next(i)) # "Look!", not "Expected?"
If this should be reported as a separate (perhaps documentation?) issue, then I would be glad to create a new issue here.
[I have deleted a post here which reported that zip
can create a broken iterator when it's passed a broken iterator.]
I'm also not sure how to evaluate the following behaviour of the current
map
.
Interesting. I'd expect an exhausted map
iterator to do nothing when iterated again (other than to raise a StopIteration
).
I can understand the expectation. But, again, the behaviour can be muted e.g., with repair_iter
above when it can cause a real problem. So the issue might really be about the documentation. I'm not sure...
def repair_iter(i): return (yield from i)
FWIW, I don't agree that there is a problem here. It seems like reading more into the words than they actually meant. The starmap() code is obligated to stop and stay stopped if the input iterator stops, but it is under no obligation to check for StopIteration from the function call.
To me, the OP's code seems like a cute parlor trick. In the real world, no actual problem has ever been observed (map and starmap have functioned well for nearly two decades). Changing the logic is more likely to break someone's existing code than to fix an actual problem.
I'll leave this open for a while for more commentary but expect to close it unless something more compelling is posted.
Also the docs look fine to me. The relevant sentences start with, "When no more data are available a StopIteration exception is raised instead." In the OPs example, more data IS available."
The starmap() code is obligated to stop and stay stopped if the input iterator stops
Do you mean if the input iterator stops and stays stopped? Because if the input iterator continues after stopping, then so does starmap
. It doesn't stay stopped then. (I assume with "stopping" we mean StopIteration
coming from a __next__
call.)
Example:
from itertools import starmap
def f(x):
if x:
return x,
raise StopIteration
continues_after_stopping = map(f, [1, 2, 0, 3, 4, 5, 6, 0, 0, 7, 8, 9])
s = starmap(int, continues_after_stopping)
for _ in range(6):
print(list(s))
Output (Attempt This Online!):
[1, 2]
[3, 4, 5, 6]
[]
[7, 8, 9]
[]
[]
Here starmap
stopped after yielding 1 and 2, but then continued and stopped multiple times, even after once not yielding anything.
Because if the input iterator continues after stopping, then so does starmap
That is fine. Map and starmap are not obliged to correct the behavior of a malformed input.
To say otherwise would be to disallow commonly used pass-through code:
class ListWrapper:
def __init__(self, seq):
self.seq = seq
def __iter__(self):
return iter(self.seq)
ISTM that this thread is working really hard to invent a problem that doesn't exist in practice. As a starting point, note that map/starmap have functioned well for almost two decades. ISTM that there isn't an actual problem here beyond an overly strict reading of a glossary entry.
That is fine. Map and starmap are not obliged to correct the behavior of a malformed input.
Good to hear that from a core dev. That's been my stance as well (in general, not just for map
/starmap
).
Thank you @rhettinger for your comments.
Also the docs look fine to me. The relevant sentences start with, "When no more data are available a StopIteration exception is raised instead." In the OPs example, more data IS available.
So an expression like [*m]
does not necessarily exhaust an iterator m
since more data may be available after that. While that seems reasonable, this part of the documentation may then be saying too much:
Once an iterator’s
__next__()
method raisesStopIteration
, it must continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken.
(It says this so clearly, and whether it actually means that or not is not so clear to me...)
Or perhaps it should say that "iterator" elsewhere in the documentation normally means a possibly "broken" one, since I don't think that the reader is likely to expect that otherwise.
I think a difference may be that "raises StopIteration
" is meant to be "actively raises" in some sense that may be tricky to make precise; and not meant to cover cases where an iterator merely passively passes on a StopIteration
raised by something the iterator invokes.
But I may be wrong.
That sounds interesting. It would actually [edit: some ] useful if that can be made precise and that gets into the docs. For a map
object's __next__
, I think you are suggesting that StopIteration
resulting from a function call is "passive", and that resulting from an input iterator's __next__
is "active". I don't have a good idea on this, but hope that some people do.
Oh, I should have said "that resulting from StopIteration
actively raised by an input iterator's __next__
"...
For a
map
object's__next__
, I think you are suggesting thatStopIteration
resulting from a function call is "passive", and that resulting from an input iterator's__next__
is "active".
A passively raised StopIteration
from an input iterator's __next__
should be considered passive from the map
object's __next__
as well.
Actually, there won't be any difficulty if we have two classes of StopIteration
exceptions: ActiveStopIteration
and PassiveStopIteration
...
Actually, there won't be any difficulty if we have two classes of StopIteration exceptions: ActiveStopIteration and PassiveStopIteration...
Sorry, I think we've exhausted the useful discussion here and I'm marking this as closed. There doesn't appear to be any practical reason to change what we have now.
The various entries in the docs are meant to be read and interpreted together as appropriate. StopIteration is a special exception and its entry must be included when discussing its use:
Raised by built-in function next() and an iterator's
__next__()
method to signal that there are no further items produced by the iterator. ...
This statement is intended to be exhaustive. Any other use is non-canonical. The entry says later that since 3.7, StopIteration in generators and coroutines is converted to a RuntimeError. The results of other non-canonical uses are left undefined; but such are 'off-label' and puts one in consenting-adults territory. In particular, functions are not supposed to raise StopIteration. If they do, one gets what one gets. As Raymond said, starmap may legitimately assume that the function argument does not raise StopIteration. And this is true for any generic function/callable argument.
Thank you @terryjreedy for giving us more hints for understanding the issue better.
Please note map(next, iterators)
in the roundrobin example above. In practice, I think it's most likely next
which raises StopIteration
for a map
object's __next__
. So I think the issue is on canonical uses of StopIteration
resulting in non-canonical uses.
An example of a similar result with next
:
iterators = map(iter, [(), (99, )])
m = map(next, iterators)
[*m]
print(next(m)) # 99
Thus, StopIteration
used by next
results in m.__next__
using it in a non-canonical way. Can we really differentiate between canonical and non-canonical uses?
map
/starmap
may violate [edit: when the programmer doesn't] what the intendedly exhaustive statement says, which seems extraordinary. If the docs refrain from mentioning it, then I'm afraid that they can be confusing to some users.
On the other hand, I think that mentioning the fact just lightly would actually be more than a good enough job of docs. I expect that to end up helping many users in the long run. Just a small bit of additional thoughts.
I'm also not sure how to evaluate the following behaviour of the current
map
.Interesting. I'd expect an exhausted
map
iterator to do nothing when iterated again (other than to raise aStopIteration
).
Actually, the glossary says that an exhausted iterator must "just" raise StopIteration
.
When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its next() method just raise StopIteration again.
Therefore, there's another thing not covered by the docs' intendedly exhaustive statement. I think that the docs would be more helpful just by getting open about this as well.
Bug report
Bug description:
Contrary to what's claimed in the documentation for
itertools.starmap
, thestarmap
objectm
below is not properly an iterator in the sense specified in the documentation. Specifically,m.__next__
keeps returning a value after it raisesStopIteration
.Compare
m
withi
below.I understand that
starmap(f, repeat([]))
should really be an iterator equivalent toiter(f, object())
for everyf
which makes the latter a valid iterator.A conceptually better expression by the way for an iterator equivalent to these would be
map(f)
. The documentation actually says thatmap
requires a second argument, so this requirement is technically not a bug ofmap
, but I don't see any advantage of it, and think that it's against natural expectation whenmap
can accept any other number than zero of similar arguments. Note thatstarmap(f, repeat(seq))
is equivalent tomap(f, *map(repeat, seq))
when the sequenceseq
is non-empty.map(f, *map(repeat, []))
can of course be written simply asmap(f)
. While the expressionstarmap(f, repeat([]))
tells to pass no argument tof
each time, the expressionmap(f)
simply doesn't tell to pass any argument.CPython versions tested on:
3.11
Operating systems tested on:
macOS