python / cpython

The Python programming language
https://www.python.org
Other
63.03k stars 30.18k forks source link

Tutorial offers dangerous advice about iterators: “__iter__() can just return self” #52623

Open d23c70bd-a0ee-4d72-b476-0a282f132652 opened 14 years ago

d23c70bd-a0ee-4d72-b476-0a282f132652 commented 14 years ago
BPO 8376
Nosy @birkenfeld, @rhettinger, @terryjreedy, @amauryfa, @pitrou, @merwok, @bitdancer, @andersk, @Usui22750
Files
  • issue8376.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['type-bug', '3.7', 'docs'] title = 'Tutorial offers dangerous advice about iterators: \xe2\x80\x9c__iter__() can just return self\xe2\x80\x9d' updated_at = user = 'https://github.com/andersk' ``` bugs.python.org fields: ```python activity = actor = 'r.david.murray' assignee = 'docs@python' closed = False closed_date = None closer = None components = ['Documentation'] creation = creator = 'andersk' dependencies = [] files = ['45088'] hgrepos = [] issue_num = 8376 keywords = ['patch'] message_count = 16.0 messages = ['102918', '102951', '102955', '102961', '102963', '102966', '111155', '111196', '111197', '111198', '111201', '278638', '278640', '278641', '278805', '298280'] nosy_count = 10.0 nosy_names = ['georg.brandl', 'rhettinger', 'terry.reedy', 'amaury.forgeotdarc', 'pitrou', 'eric.araujo', 'r.david.murray', 'ysj.ray', 'andersk', 'Usui'] pr_nums = [] priority = 'low' resolution = None stage = None status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue8376' versions = ['Python 3.6', 'Python 3.7'] ```

    d23c70bd-a0ee-4d72-b476-0a282f132652 commented 14 years ago

    The Python tutorial offers some dangerous advice about adding iterator behavior to a class: http://docs.python.org/tutorial/classes.html#iterators “By now you have probably noticed that most container objects can be looped over using a for statement: … Having seen the mechanics behind the iterator protocol, it is easy to add iterator behavior to your classes. Define a __iter() method which returns an object with a next() method. If the class defines next(), then __iter() can just return self:”

    This is reasonable advice for writing an iterator class, but terrible advice for writing a container class, because it encourages you to associate a single iterator with the container, which breaks nested iteration and leads to hard-to-find bugs. (One of those bugs recently made its way into the code handout for a problem set in MIT’s introductory CS course, 6.00.)

    A container class’s __iter__() should return a generator or an instance of a separate iterator class, not self. The tutorial should make this clearer.

    09aa2b3e-3f5a-4c34-93c0-4a96363de691 commented 14 years ago

    I think there is not much ploblem. Please notice this sentence in the previous paragraph in the toturial :

    "Behind the scenes, the for statement calls iter() on the container object. The function returns an iterator object that defines the method next() which accesses elements in the container one at a time."

    I think it's clear enough that the container object is different from the iterator object, what exactly has the iterator behavior is the iterator object, not the container object. And this sentence you mentioned:

    "If the class defines next(), then __iter__() can just return self:"

    is to teach you define a iterator, not a container. So by reading it carefully, you won't define something that is both a container and a iterator, will you?

    If you really define a next() on a class which you want it be a container, and then define a __iter__() on it and return self, then your container is exactly a generator, not a container. And a generator can certainly iterate only once.

    merwok commented 14 years ago

    Hello

    I thought about two bits in your message: “I think it's clear enough” and “So by reading it carefully”. They show that the doc is a bit ambiguous here, and while I agree that people should take the time to read and understand, I think that documentation can always be made clearer. The doc about iterables could tell to return an instance of another class or use a generator, and then a separate paragraph would explain about using next to define iterators, explaining that iterators may or may not be iterable.

    Does that sound good?

    Regards

    bitdancer commented 14 years ago

    It might be worth mentioning in the tutorial the specific problem the OP points out: that if an object returns itself in __iter, then that object can have only one iteration state. And that therefor only an object that is itself a unique iterator (as opposed to a collection wanting to allow iteration) should return itself as __iter.

    Also note that in the example in the tutorial it says "easy to add the behavior", which *in context* is wrong; pedagogically the better thing to do would be to say "it is easy to write a class that implements an iterator". As Éric said, splitting the text so that returning an iterator from a collection and constructing an iterator class are distinct topics would greatly improve the exposition.

    I'm an experienced Python coder, and I've made this mistake myself. Often it doesn't break because the code only iterates the object once, but it would be better to have the tutorial encourage doing this correctly by providing good examples and explanations.

    rhettinger commented 14 years ago

    I think the docs are fine as-is. At that point in the tutorial, we're trying to teach what the iterator protocol is.

    The considerations of what makes a good design can be discussed elsewhere. It is not a one-liner. For example, files are self iterators but lists are not -- there are good reasons for either choice. Explaining those design choices is not a good way to *introduce* the iterator protocol.

    Recommend closing this report.

    d23c70bd-a0ee-4d72-b476-0a282f132652 commented 14 years ago

    As an experienced Python programmer I am obviously aware that the tutorial is trying to teach how to make an iterator class, not how to make a container class.

    But the tutorial doesn’t make that *clear*. It should be much more explicit about what it is explaining to avoid confusing those concepts in the minds of beginners. (Or even the staff of the MIT introductory CS course.)

    One way to fix this confusion would be to explain what one should do for both container classes and iterator classes:

    """ Having seen the mechanics behind the iterator protocol, it is easy to add iterator behavior to your container classes. Define a :meth:`__iter` method which returns an object of a separate iterator class. The iterator class should have a :meth:`next` method and an :meth:`__iter` method (the :meth:`__iter__` method of the iterator class should just return ``self``)::

       class ReverseList(object):
           "Container that lets you iterate over the items backwards"
           def __init__(self, data):
               self.data = data
           def __iter__(self):
               return ReverseIterator(self.data)
    
       class ReverseIterator(object):
           "Iterator for looping over a sequence backwards"
           def __init__(self, data):
               self.data = data
               self.index = len(data)
           def __iter__(self):
               return self
           def next(self):
               if self.index == 0:
                   raise StopIteration
               self.index = self.index - 1
               return self.data[self.index]
       >>> for char in ReverseIterator('spam'):
       ...     print char
       ...
       m
       a
       p
       s
       >>> for char in ReverseList([1,2,3]):
       ...     print char
       ...
       3
       2
       1
    """
    amauryfa commented 14 years ago

    The two example classes are used exactly the same way, and don't show the differences between them. How about a simple change like:

    Replace: """If the class defines :meth:`next`, then :meth:`__iter` can just return ``self``""" with """It may be convenient to define :meth:`next` in the class, and have :meth:`__iter` just return ``self``; In this case though, the object can be iterated only once."""

    d23c70bd-a0ee-4d72-b476-0a282f132652 commented 14 years ago

    I don’t think that small change is good enough, if it is still the case that the only provided example is the dangerous one.

    It would be easy to clarify the differences between the classes:

    >>> rl = test.ReverseList('spam')
    >>> [c for c in rl]
    ['m', 'a', 'p', 's']
    >>> [c for c in rl]
    ['m', 'a', 'p', 's']
    >>> ri = iter(rl)
    >>> ri
    <test.ReverseIterator object at 0x7fa5cbeaec50>
    >>> [c for c in ri]
    ['m', 'a', 'p', 's']
    >>> [c for c in ri]
    []
    pitrou commented 14 years ago

    At least equally useful would be the mention that __iter__ can be a generator, eliminating the need for intermediate objects:

    >>> class C:
    ...     def __iter__(self):
    ...         yield 1
    ...         yield 2
    ... 
    >>> list(C())
    [1, 2]
    amauryfa commented 14 years ago

    Why do you call this "dangerous"? because the object can be iterated only once? But reversed() does the same thing. And all iterators objects must also implement the __iter__ method that return themselves, otherwise they cannot be used in a for loop. Better start with such an object.

    d23c70bd-a0ee-4d72-b476-0a282f132652 commented 14 years ago

    Antoine: That’s true.

    Amaury: See my original bug description (“This is reasonable advice for writing an iterator class, but terrible advice for writing a container class…”), and my other comments.

    There is nothing wrong with explaining how to write an iterator, but the explanation needs to make clear that this is _not_ how you write a container. Currently the section opens with a misleading motivation (“By now you have probably noticed that most container objects can be looped over using a for statement”), but it does actually not explain how to write a container at all. So I proposed some language and an example to fix that.

    30cc03b7-deb1-41a9-91b1-453176bce693 commented 8 years ago

    My proposal for this documentation point is to get rid off the word "most" and to replace it with "built-in". Since it will remove the misleading idea that this paragraph can explain how you can write a containers.

    d23c70bd-a0ee-4d72-b476-0a282f132652 commented 8 years ago

    Usui, this is a tutorial intended for beginners. Even if the change from “most” to “built-in” were a relevant one (and I don’t see how it is), beginners cannot possibly be expected to parse that level of meaning out of a single word.

    The difference between iterators and containers deserves at least a couple of sentences and preferably an example that includes both, as I proposed in http://bugs.python.org/issue8376#msg102966. Do you disapprove of that proposal?

    30cc03b7-deb1-41a9-91b1-453176bce693 commented 8 years ago

    I don't disapprove the proposal on the need to have an explanation on the difference between a container and an iterator.

    But I was unable to found in the documentation a proper definition of a container.

    30cc03b7-deb1-41a9-91b1-453176bce693 commented 8 years ago

    A new proposal is to add at the end of the paragraph on the iterator those point: """ As you should have noticed this Reverse let you iterate over the data only once::

       >>> rev = Reverse('spam')
       >>> for char in rev:
       ...     print(char)
       ...
       m
       a
       p
       s
       >>> for char in rev:
       ...     print(char)
       ...
       >>>

    So now let's complete our example to have a container that allow you to iterate over his items backwards::

      class ReverseList(object):
           "Container that lets you iterate over the items backwards"
           def __init__(self, data):
               self.data = data
           def __iter__(self):
               return ReverseIterator(self.data)
    
       class ReverseIterator(object):
           "Iterator for looping over a sequence backwards"
           def __init__(self, data):
               self.data = data
               self.index = len(data)
           def __iter__(self):
               return self
           def next(self):
               if self.index == 0:
                   raise StopIteration
               self.index = self.index - 1
               return self.data[self.index]
       >>> rev = ReverseList([1,2,3])
       >>> for char in rev:
       ...     print (char)
       ...
       3
       2
       1
       >>> for char in rev:
       ...     print (char)
       ...
       3
       2
       1
    """
    bitdancer commented 7 years ago

    I just had a colleague get confused about the container returning self, and he was referring to the iterator protocol docs at https://docs.python.org/3.6/library/stdtypes.html#iterator.\_\_iter__. If you don't read that section with your thinking cap firmly in place it is easy to get confused about what "the iterator itself" means there. I think it would be beneficial to add a sentence about the iterator state needing to be independent for each iterator returned by the *container's* __iter__. A similar sentence in the tutorial might be all that is needed as well.