python / cpython


re-usable generators / generator expressions should return iterables #50223

Closed 24b92cc0-684b-470b-a9f0-ccf505b0f114 closed 15 years ago

24b92cc0-684b-470b-a9f0-ccf505b0f114 commented 15 years ago
BPO 5973
Nosy @bitdancer
Files
  • reusable_generators.py

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    ```python
    assignee = None
    closed_at =
    created_at =
    labels = ['type-feature']
    title = 're-usable generators / generator expressions should return iterables'
    updated_at =
    user = 'https://bugs.python.org/svenrahmann'
    ```

    bugs.python.org fields:

    ```python
    activity =
    actor = 'Jae'
    assignee = 'none'
    closed = True
    closed_date =
    closer = 'r.david.murray'
    components = []
    creation =
    creator = 'svenrahmann'
    dependencies = []
    files = ['13936']
    hgrepos = []
    issue_num = 5973
    keywords = []
    message_count = 3.0
    messages = ['87473', '87503', '89898']
    nosy_count = 3.0
    nosy_names = ['r.david.murray', 'svenrahmann', 'Jae']
    pr_nums = []
    priority = 'normal'
    resolution = 'rejected'
    stage = None
    status = 'closed'
    superseder = None
    type = 'enhancement'
    url = 'https://bugs.python.org/issue5973'
    versions = []
    ```

    24b92cc0-684b-470b-a9f0-ccf505b0f114 commented 15 years ago

    The syntax of generator expressions suggests that they can be used much like lists (at least when iterated over). However, as was pointed out to me, the resulting generators are iterators and can be iterated over only once. This is inconvenient when a function expects an iterable argument but needs to iterate over it more than once.
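    A minimal Python 3 illustration of the one-shot behaviour described above (the names squares and squares_list are illustrative only): a generator is its own iterator, so once it is exhausted it stays exhausted, whereas a list hands out a fresh iterator every time.

```python
# A generator expression produces a one-shot iterator.
squares = (n * n for n in range(4))

print(iter(squares) is squares)  # True: a generator is its own iterator
print(list(squares))             # [0, 1, 4, 9]
print(list(squares))             # []: already exhausted

# A list, by contrast, creates a new iterator on every iter() call.
squares_list = [n * n for n in range(4)]
print(iter(squares_list) is squares_list)      # False
print(list(squares_list), list(squares_list))  # [0, 1, 4, 9] [0, 1, 4, 9]
```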

    Consider the following function (see also the attached file reusable_generators.py for a complete example):

    def secondmax(iterable):
        """return the second largest value in iterable"""
        m = max(iterable)
        return max(i for i in iterable if i<m)

    It works fine when passed a list or another iterable container, but consider the following situation: we have a huge matrix A (a list of lists) and want to pass one of its columns to the function.

    Using a list works fine, but requires copying the column's values and needs additional memory:

    col2_list = [a[2] for a in A]  # new list created from column 2

    There is no reason why we shouldn't be able to create an iterable object that returns the values from the column one by one:

    col2_gen = (a[2] for a in A)

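    The problem is that secondmax(col2_gen) does not work (try the attached file): col2_gen can be iterated over only once, so the first max() call exhausts it, and the second max() call receives an empty sequence and raises ValueError.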
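    A small, self-contained reproduction of the failure, assuming a tiny 3x3 matrix stands in for the "huge" A (the concrete values are made up for illustration). The same secondmax() succeeds on a list but fails on a generator expression over the same column:

```python
def secondmax(iterable):
    """return the second largest value in iterable"""
    m = max(iterable)
    return max(i for i in iterable if i < m)

# Hypothetical small matrix standing in for the huge A from the issue.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(secondmax([a[2] for a in A]))  # 6: a list can be iterated twice

col2_gen = (a[2] for a in A)
try:
    secondmax(col2_gen)
except ValueError as exc:
    # The first max() consumed col2_gen, so the second max() saw no items.
    print("ValueError:", exc)
```

    The exact wording of the ValueError message differs between Python versions, so only the exception type is worth relying on.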

    I can imagine many situations where I need or want to iterate over such a "view" object several times; I don't see a reason why it shouldn't be possible or why it would be unwanted.

    We can do the following, but it is not elegant: wrap the generator expression in a closure and a class.

    class ReusableGenerator():
        def __init__(self,g): self.g = g
        def __iter__(self):   return self.g()
    
    col2_re = ReusableGenerator(lambda: (a[2] for a in A)) # I want this!

    This works, but it is not a generator object (e.g., it doesn't have a next method). We also need the lambda detour for this to work.
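    One way to avoid the lambda detour, sketched here as an assumption rather than anything proposed in the thread, is to write __iter__ itself as a generator method: every iter() call then starts a brand-new generator. The class name ColumnView is hypothetical. Like ReusableGenerator, the result is a re-iterable "view", not a generator object, so it still has no next method.

```python
class ColumnView:
    """Re-iterable, read-only view of one column of a list of lists.

    Hypothetical helper: no values are copied; each call to __iter__
    starts a fresh generator over the underlying rows.
    """

    def __init__(self, rows, index):
        self.rows = rows
        self.index = index

    def __iter__(self):
        # Because this method contains `yield`, every iter(view) call
        # returns a new generator object.
        for row in self.rows:
            yield row[self.index]


def secondmax(iterable):
    """return the second largest value in iterable"""
    m = max(iterable)
    return max(i for i in iterable if i < m)


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
col2_view = ColumnView(A, 2)
print(secondmax(col2_view))  # 6
print(list(col2_view))       # [3, 6, 9]
```

    Because the view reads A lazily, later changes to A are visible on the next iteration; that is the usual trade-off of a view compared with a copied list.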

    Note that in some situations, the "problem" I describe does not occur or can easily be circumvented. For example, instead of writing

    col2 = (a[2] for a in A)
    for x in col2: foo(x)
    for x in col2: foo(x)  # doesn't work: col2 is already exhausted, so this loop does nothing

    we could just repeat the generator expression (and create a new iterator whenever we need it):

    for x in (a[2] for a in A): foo(x)
    for x in (a[2] for a in A): foo(x) # works fine

    But exactly this is impossible when I want to pass the generator expression or generator function to another function (such as secondmax()), because that function receives a single iterator and has no way to recreate it.
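    A different route, not suggested anywhere in this thread and offered only as an alternative sketch, is to make the consuming function iterate just once. The hypothetical secondmax_onepass() below tracks the maximum and the largest value strictly below it in a single pass, so it accepts a generator directly. It keeps the original semantics: it raises ValueError when no value is smaller than the maximum.

```python
def secondmax_onepass(iterable):
    """Return the largest value strictly below the maximum, in one pass.

    Hypothetical alternative to secondmax(): it consumes the input only
    once, so it also works on one-shot iterators such as generators.
    """
    missing = object()  # sentinel: "no value seen yet"
    first = second = missing
    for x in iterable:
        if first is missing or x > first:
            # New maximum; the previous maximum becomes the runner-up.
            second = first
            first = x
        elif x < first and (second is missing or x > second):
            second = x
    if second is missing:
        raise ValueError("no value smaller than the maximum")
    return second


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(secondmax_onepass(a[2] for a in A))  # 6
```

    This trades generality for efficiency: it fixes one function, whereas the proposal in this issue would let any multi-pass function accept generator expressions.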

    I believe this contradicts the Python philosophy that functions can be passed around just like other objects.

    My proposal is probably unrealistic, but I would like to see generator functions and generator expressions changed so that they return iterables rather than iterators; then the problem described here would not occur, and wrapper classes would be unnecessary.

    In Java, that distinction is very clear; in Python, less so, I think (which is good, because iterators are a pain to use in Java).
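    For comparison, Python does draw the same line at the protocol level: an iterable provides __iter__, while an iterator additionally provides __next__ and returns itself from __iter__. A quick check with the Python 3 abstract base classes in collections.abc (the variable names are illustrative) shows that a list is only an iterable, whereas a generator is an iterator:

```python
from collections.abc import Iterable, Iterator

values = [1, 2, 3]
gen = (v for v in values)

# A list is an iterable: iter() gives a new iterator each time.
print(isinstance(values, Iterable), isinstance(values, Iterator))  # True False

# A generator is an iterator: iter() returns the generator itself.
print(isinstance(gen, Iterable), isinstance(gen, Iterator))  # True True
print(iter(gen) is gen)  # True
```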

    Admittedly, I have no idea why generator functions and expressions are implemented as they are; there are probably lots of good reasons, and it may not be possible to change this any time soon or at all. However, I think the change would make Python a more consistent language.

    bitdancer commented 15 years ago

    You might be interested to read about this package:

    http://www.fiber-space.de/generator_tools/doc/generator_tools.html

    For anything to happen in this area, you'd need to get some consensus on python-ideas first. If you do that, you can open a new ticket referencing the python-ideas thread (or even reopen this one if that seems appropriate).

    b0034086-bf0d-4b1e-948b-ef36127fe080 commented 15 years ago

    I second this feature request, and will try to get consensus in python-ideas.

    Meanwhile, here's a sample workaround.

    >>> def gen2iterable(genfunc):
    ...     def wrapper(*args, **kwargs):
    ...         class _iterable(object):
    ...             def __iter__(self):
    ...                 return genfunc(*args, **kwargs)
    ...         return _iterable()
    ...     return wrapper
    ... 
    >>> 
    >>> @gen2iterable
    ... def foo():
    ...   for i in range(10):
    ...     yield i
    ... 
    >>> a = foo()
    >>> 
    >>> max(a)
    9
    >>> max(a)
    9
    >>> def secondmax(iterable):
    ...     """return the second largest value in iterable"""
    ...     m = max(iterable)
    ...     return max(i for i in iterable if i<m)
    ... 
    >>> secondmax(a)
    8
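    A possible Python 3 restatement of the gen2iterable workaround above, offered as a sketch under a few assumptions: the helper class (named _ReIterable here, a hypothetical name) is defined once at module level instead of on every call, and functools.wraps copies the generator function's name and docstring onto the wrapper. The n parameter on foo is added only to show argument forwarding.

```python
import functools


class _ReIterable:
    """Iterable whose every iteration re-invokes a generator function."""

    def __init__(self, genfunc, args, kwargs):
        self._genfunc = genfunc
        self._args = args
        self._kwargs = kwargs

    def __iter__(self):
        # A fresh generator per iteration, so the object can be reused.
        return self._genfunc(*self._args, **self._kwargs)


def gen2iterable(genfunc):
    """Decorator: make a generator function return a re-iterable object."""

    @functools.wraps(genfunc)
    def wrapper(*args, **kwargs):
        return _ReIterable(genfunc, args, kwargs)

    return wrapper


@gen2iterable
def foo(n=10):
    yield from range(n)


a = foo()
print(max(a), max(a))  # 9 9
print(foo.__name__)    # foo
```

    The arguments are stored by reference, so if one of them is itself a one-shot iterator, the second iteration will still see it exhausted; the wrapper only makes the generator function re-runnable, not its inputs.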