Closed dd6e2082-79e8-4cb2-8eba-19d2a01462e8 closed 7 years ago
The documentation given for itertools.zip_longest contains a "roughly equivalent" pure-python implementation of the function that is intended to help the user understand what zip_longest does on a functional level.
However, the given implementation is very complicated to read for newcomers and experienced Python programmers alike, as it uses a custom-defined exception for control flow handling, a nested function, a condition that always is true if any arguments are passed ("while iterators"), as well as two other non-trivial functions from itertools (chain and repeat).
For future reference, this is the currently given implementation:
def zip_longest(*args, **kwds):
# zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
fillvalue = kwds.get('fillvalue')
iterators = [iter(it) for it in args]
while True:
exhausted = 0
values = []
for it in iterators:
try:
values.append(next(it))
except StopIteration:
values.append(fillvalue)
exhausted += 1
if exhausted < len(args):
yield tuple(values)
else:
break
This is way more complex than necessary to teach the concept of zip_longest. With this issue, I will submit a pull request with a new example implementation that seems to be the same level of "roughly equivalent" but is much easier to read, since it only uses two loops and now complicated flow
def zip_longest(*args, **kwds):
# zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
fillvalue = kwds.get('fillvalue')
iterators = [iter(it) for it in args]
while True:
exhausted = 0
values = []
for it in iterators:
try:
values.append(next(it))
except StopIteration:
values.append(fillvalue)
exhausted += 1
if exhausted < len(args):
yield tuple(values)
else:
break
Looking at the C code of the actual implementation, I don't see that any one of the two implementations is obviously "more equivalent". I'm unsure about performance -- I haven't tried them on that but I don't think that's the point of this learning implementation.
I ran all tests from Lib/test/test_itertools.py against both the old and the new implementation. The new implementation fails at 3 tests, while the old implementation failed at four. Two of the remaining failures are related to TypeErrors not being thrown on invalid input, one of them is related to pickling the resulting object. I believe all three of them are fine to ignore in this sample, as it is not relevant to the documentation purpose.
Therefore, I believe the documentation should be changed like suggested. I'd be happy for any feedback or further ideas to improve its readability!
I just noticed that in my post I accidentally pasted MY implementation twice instead of the old one, sorry for that. Here's the old one for quick comparison:
class ZipExhausted(Exception):
pass
def zip_longest(*args, **kwds):
# zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
fillvalue = kwds.get('fillvalue')
counter = len(args) - 1
def sentinel():
nonlocal counter
if not counter:
raise ZipExhausted
counter -= 1
yield fillvalue
fillers = repeat(fillvalue)
iterators = [chain(it, sentinel(), fillers) for it in args]
try:
while iterators:
yield tuple(map(next, iterators))
except ZipExhausted:
pass
Thanks for wanting to improve the documentation.
Raymond will address this definitively, but unless I'm mistaken part of the purpose of the examples is to show how the various itertools can be used. If that is true, then in the context of the overall itertools documentation I think the current example has more teaching value than your suggested revision.
Well, I could think of a way to still use repeat() here that also is pretty clean except for the fact that it fails if all inputs to zip_longest are repeat() iterators themselves (which would here lead to an empty iterator while it would otherwise lead to an infinite one):
def zip_longest(*args, **kwds):
# zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
fillvalue = kwds.get('fillvalue')
iterators = [iter(it) for it in args]
while True:
values = []
for i, it in enumerate(iterators):
try:
values.append(next(it))
except StopIteration:
values.append(fillvalue)
iterators[i] = repeat(fillvalue)
if all(isinstance(it, repeat) for it in iterators):
break
else:
yield tuple(values)
Keeping chain() in use here just for the sake of using it is not worth it, I believe.
New changeset 3147b0422cbeb98065666ccf95ab6845ac800fd4 by Raymond Hettinger in branch 'master': bpo-31270: Modification of Pr 3200 (bpo-3427) https://github.com/python/cpython/commit/3147b0422cbeb98065666ccf95ab6845ac800fd4
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at =
created_at =
labels = ['type-feature', '3.7', 'docs']
title = 'Simplify documentation of itertools.zip_longest'
updated_at =
user = 'https://github.com/raphaelm'
```
bugs.python.org fields:
```python
activity =
actor = 'rhettinger'
assignee = 'rhettinger'
closed = True
closed_date =
closer = 'rhettinger'
components = ['Documentation']
creation =
creator = 'rami'
dependencies = []
files = []
hgrepos = []
issue_num = 31270
keywords = []
message_count = 5.0
messages = ['300788', '300790', '300799', '300830', '301635']
nosy_count = 4.0
nosy_names = ['rhettinger', 'r.david.murray', 'docs@python', 'rami']
pr_nums = ['3200', '3427']
priority = 'normal'
resolution = 'fixed'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue31270'
versions = ['Python 3.6', 'Python 3.7']
```