Open d23c70bd-a0ee-4d72-b476-0a282f132652 opened 14 years ago
The Python tutorial offers some dangerous advice about adding iterator behavior to a class: http://docs.python.org/tutorial/classes.html#iterators “By now you have probably noticed that most container objects can be looped over using a for statement: … Having seen the mechanics behind the iterator protocol, it is easy to add iterator behavior to your classes. Define a __iter() method which returns an object with a next() method. If the class defines next(), then __iter() can just return self:”
This is reasonable advice for writing an iterator class, but terrible advice for writing a container class, because it encourages you to associate a single iterator with the container, which breaks nested iteration and leads to hard-to-find bugs. (One of those bugs recently made its way into the code handout for a problem set in MIT’s introductory CS course, 6.00.)
A container class’s __iter__() should return a generator or an instance of a separate iterator class, not self. The tutorial should make this clearer.
I think there is not much ploblem. Please notice this sentence in the previous paragraph in the toturial :
"Behind the scenes, the for statement calls iter() on the container object. The function returns an iterator object that defines the method next() which accesses elements in the container one at a time."
I think it's clear enough that the container object is different from the iterator object, what exactly has the iterator behavior is the iterator object, not the container object. And this sentence you mentioned:
"If the class defines next(), then __iter__() can just return self:"
is to teach you define a iterator, not a container. So by reading it carefully, you won't define something that is both a container and a iterator, will you?
If you really define a next() on a class which you want it be a container, and then define a __iter__() on it and return self, then your container is exactly a generator, not a container. And a generator can certainly iterate only once.
Hello
I thought about two bits in your message: “I think it's clear enough” and “So by reading it carefully”. They show that the doc is a bit ambiguous here, and while I agree that people should take the time to read and understand, I think that documentation can always be made clearer. The doc about iterables could tell to return an instance of another class or use a generator, and then a separate paragraph would explain about using next to define iterators, explaining that iterators may or may not be iterable.
Does that sound good?
Regards
It might be worth mentioning in the tutorial the specific problem the OP points out: that if an object returns itself in __iter, then that object can have only one iteration state. And that therefor only an object that is itself a unique iterator (as opposed to a collection wanting to allow iteration) should return itself as __iter.
Also note that in the example in the tutorial it says "easy to add the behavior", which *in context* is wrong; pedagogically the better thing to do would be to say "it is easy to write a class that implements an iterator". As Éric said, splitting the text so that returning an iterator from a collection and constructing an iterator class are distinct topics would greatly improve the exposition.
I'm an experienced Python coder, and I've made this mistake myself. Often it doesn't break because the code only iterates the object once, but it would be better to have the tutorial encourage doing this correctly by providing good examples and explanations.
I think the docs are fine as-is. At that point in the tutorial, we're trying to teach what the iterator protocol is.
The considerations of what makes a good design can be discussed elsewhere. It is not a one-liner. For example, files are self iterators but lists are not -- there are good reasons for either choice. Explaining those design choices is not a good way to *introduce* the iterator protocol.
Recommend closing this report.
As an experienced Python programmer I am obviously aware that the tutorial is trying to teach how to make an iterator class, not how to make a container class.
But the tutorial doesn’t make that *clear*. It should be much more explicit about what it is explaining to avoid confusing those concepts in the minds of beginners. (Or even the staff of the MIT introductory CS course.)
One way to fix this confusion would be to explain what one should do for both container classes and iterator classes:
""" Having seen the mechanics behind the iterator protocol, it is easy to add iterator behavior to your container classes. Define a :meth:`__iter` method which returns an object of a separate iterator class. The iterator class should have a :meth:`next` method and an :meth:`__iter` method (the :meth:`__iter__` method of the iterator class should just return ``self``)::
class ReverseList(object):
"Container that lets you iterate over the items backwards"
def __init__(self, data):
self.data = data
def __iter__(self):
return ReverseIterator(self.data)
class ReverseIterator(object):
"Iterator for looping over a sequence backwards"
def __init__(self, data):
self.data = data
self.index = len(data)
def __iter__(self):
return self
def next(self):
if self.index == 0:
raise StopIteration
self.index = self.index - 1
return self.data[self.index]
>>> for char in ReverseIterator('spam'):
... print char
...
m
a
p
s
>>> for char in ReverseList([1,2,3]):
... print char
...
3
2
1
"""
The two example classes are used exactly the same way, and don't show the differences between them. How about a simple change like:
Replace: """If the class defines :meth:`next`, then :meth:`__iter` can just return ``self``""" with """It may be convenient to define :meth:`next` in the class, and have :meth:`__iter` just return ``self``; In this case though, the object can be iterated only once."""
I don’t think that small change is good enough, if it is still the case that the only provided example is the dangerous one.
It would be easy to clarify the differences between the classes:
>>> rl = test.ReverseList('spam')
>>> [c for c in rl]
['m', 'a', 'p', 's']
>>> [c for c in rl]
['m', 'a', 'p', 's']
>>> ri = iter(rl)
>>> ri
<test.ReverseIterator object at 0x7fa5cbeaec50>
>>> [c for c in ri]
['m', 'a', 'p', 's']
>>> [c for c in ri]
[]
At least equally useful would be the mention that __iter__ can be a generator, eliminating the need for intermediate objects:
>>> class C:
... def __iter__(self):
... yield 1
... yield 2
...
>>> list(C())
[1, 2]
Why do you call this "dangerous"? because the object can be iterated only once? But reversed() does the same thing. And all iterators objects must also implement the __iter__ method that return themselves, otherwise they cannot be used in a for loop. Better start with such an object.
Antoine: That’s true.
Amaury: See my original bug description (“This is reasonable advice for writing an iterator class, but terrible advice for writing a container class…”), and my other comments.
There is nothing wrong with explaining how to write an iterator, but the explanation needs to make clear that this is _not_ how you write a container. Currently the section opens with a misleading motivation (“By now you have probably noticed that most container objects can be looped over using a for statement”), but it does actually not explain how to write a container at all. So I proposed some language and an example to fix that.
My proposal for this documentation point is to get rid off the word "most" and to replace it with "built-in". Since it will remove the misleading idea that this paragraph can explain how you can write a containers.
Usui, this is a tutorial intended for beginners. Even if the change from “most” to “built-in” were a relevant one (and I don’t see how it is), beginners cannot possibly be expected to parse that level of meaning out of a single word.
The difference between iterators and containers deserves at least a couple of sentences and preferably an example that includes both, as I proposed in http://bugs.python.org/issue8376#msg102966. Do you disapprove of that proposal?
I don't disapprove the proposal on the need to have an explanation on the difference between a container and an iterator.
But I was unable to found in the documentation a proper definition of a container.
A new proposal is to add at the end of the paragraph on the iterator those point: """ As you should have noticed this Reverse let you iterate over the data only once::
>>> rev = Reverse('spam')
>>> for char in rev:
... print(char)
...
m
a
p
s
>>> for char in rev:
... print(char)
...
>>>
So now let's complete our example to have a container that allow you to iterate over his items backwards::
class ReverseList(object):
"Container that lets you iterate over the items backwards"
def __init__(self, data):
self.data = data
def __iter__(self):
return ReverseIterator(self.data)
class ReverseIterator(object):
"Iterator for looping over a sequence backwards"
def __init__(self, data):
self.data = data
self.index = len(data)
def __iter__(self):
return self
def next(self):
if self.index == 0:
raise StopIteration
self.index = self.index - 1
return self.data[self.index]
>>> rev = ReverseList([1,2,3])
>>> for char in rev:
... print (char)
...
3
2
1
>>> for char in rev:
... print (char)
...
3
2
1
"""
I just had a colleague get confused about the container returning self, and he was referring to the iterator protocol docs at https://docs.python.org/3.6/library/stdtypes.html#iterator.\_\_iter__. If you don't read that section with your thinking cap firmly in place it is easy to get confused about what "the iterator itself" means there. I think it would be beneficial to add a sentence about the iterator state needing to be independent for each iterator returned by the *container's* __iter__. A similar sentence in the tutorial might be all that is needed as well.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = None closed_at = None created_at =
labels = ['type-bug', '3.7', 'docs']
title = 'Tutorial offers dangerous advice about iterators: \xe2\x80\x9c__iter__() can just return self\xe2\x80\x9d'
updated_at =
user = 'https://github.com/andersk'
```
bugs.python.org fields:
```python
activity =
actor = 'r.david.murray'
assignee = 'docs@python'
closed = False
closed_date = None
closer = None
components = ['Documentation']
creation =
creator = 'andersk'
dependencies = []
files = ['45088']
hgrepos = []
issue_num = 8376
keywords = ['patch']
message_count = 16.0
messages = ['102918', '102951', '102955', '102961', '102963', '102966', '111155', '111196', '111197', '111198', '111201', '278638', '278640', '278641', '278805', '298280']
nosy_count = 10.0
nosy_names = ['georg.brandl', 'rhettinger', 'terry.reedy', 'amaury.forgeotdarc', 'pitrou', 'eric.araujo', 'r.david.murray', 'ysj.ray', 'andersk', 'Usui']
pr_nums = []
priority = 'low'
resolution = None
stage = None
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue8376'
versions = ['Python 3.6', 'Python 3.7']
```