python / cpython

The Python programming language
https://www.python.org

Library function to run an iterator to the end #122803

Closed mrolle45 closed 2 months ago

mrolle45 commented 2 months ago

Feature or enhancement

Proposal:

Built-in functions to run an iterator, or apply a function to an iterator (like map()). In case the iterator might be infinite, provide an extra keyword to limit the number of times it is called.

Of course, I could write my own code to do this. However, having it in the library has two big advantages:

  1. The programmer doesn't have to figure out how to accomplish it.
  2. As a built-in function, it would be much faster -- no repeated calls to the user code in the interpreter.

As an example, I want to print each of the elements of a list or other iterable. I would write

any(map(print, iterable))

This means knowing that print() always returns None, so calling any() calls print() for each element. It doesn't have a loop in the interpreter, as all the functions are built in. But it's messy and non-intuitive. It would be better to just write runmap(print, iterable).

If I wanted to map some other function, which might return either a true or a false value, I would have to write something like: all(filter(lambda _: False, map(somefunc, iterable))). This is even less intuitive, and slower because the interpreter calls the lambda once per element.
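A minimal sketch of why the any(map(...)) idiom depends on the mapped function returning a false value. The names `record`, `truthy`, `seen`, and `calls` are hypothetical and used only for illustration.

```python
seen = []
calls = []

def record(x):
    """Hypothetical side-effecting function; returns None, like print()."""
    seen.append(x)

def truthy(x):
    """Hypothetical function that returns a true value."""
    calls.append(x)
    return True

# Every result is None (falsy), so any() keeps pulling from map()
# and record() runs once per element.
exhausted = any(map(record, [1, 2, 3]))

# The first result is true, so any() short-circuits after one call.
stopped_early = any(map(truthy, [1, 2, 3]))

# A plain loop runs the function for every element regardless of
# what it returns.
looped = []
for x in [1, 2, 3]:
    looped.append(truthy(x))
```

The explicit loop is the only one of the three forms whose behavior does not depend on the function's return value.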

Details

Add functions to itertools like the following:

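def run(iterable, max: int = None) -> None:
    # Consume the iterable, discarding every item.
    if max is None:
        for _ in iterable: pass
    else:
        for i, _ in enumerate(iterable, 1):
            if i >= max: break # stop after (max) iterations.

def runmap(iterable, func, max: int = None) -> None:
    # Call func on each item; run() returns None, so loop over the iterable directly.
    for i, obj in enumerate(iterable, 1):
        func(obj)
        if max is not None and i >= max: break # stop after (max) calls.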

Of course, this would be written in C as part of the built-in itertools module. You may think of better names for these functions.

Has this already been discussed elsewhere?

Don't know

ZeroIntensity commented 2 months ago

This certainly needs discussion on Discourse, but from what I've seen, there has been a lot of hesitation to add things to itertools recently.

Regarding these points:

  • The programmer doesn't have to figure out how to accomplish it.
  • As a built-in function, it would be much faster -- no repeated calls to the user code in the interpreter.

These arguments generally won't make it past Discourse either, because they apply to nearly all of the (non-syntax-related) proposals there.

brianschubert commented 2 months ago

Related discussion for a similar set of itertools additions: https://discuss.python.org/t/add-itertools-functions-for-ugly-idioms/56950

As mentioned in that thread, your run function already exists as more_itertools.consume.

Before opening a new thread on Discourse, I’d recommend going through some of the previous itertools proposals to get a sense for what sort of concerns have been expressed in the past (and make sure your proposal addresses some of the key ones).
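For reference, a sketch of `consume` following the recipe in the itertools documentation. That more_itertools.consume behaves the same way is an assumption here, not something verified in this thread.

```python
from collections import deque
from itertools import islice

def consume(iterator, n=None):
    """Advance the iterator n steps ahead. If n is None, consume it entirely."""
    if n is None:
        # A zero-length deque pulls every item and stores none of them.
        deque(iterator, maxlen=0)
    else:
        # An empty slice starting at position n advances the iterator by n items.
        next(islice(iterator, n, n), None)

it = iter(range(10))
consume(it, 3)           # skip 0, 1, 2
first_after = next(it)   # the next item is 3
consume(it)              # discard everything that remains
```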

ZeroIntensity commented 2 months ago

I'm gonna ping @terryjreedy, as he had strong opinions about the previous itertools issue.

sobolevn commented 2 months ago

Why won't this work?

while True:
    try:
        item = next(some_iterator)
    except StopIteration:
        break
    some_function(item)  # whatever else you need

It takes 6 lines of code; it's very flexible, simple, and clear, and it works in any Python version.

terryjreedy commented 2 months ago

My itertools opinions are based on the design of Python in general and itertools in particular. The intention of itertools is described in the opening paragraphs of its doc, in particular, "The module standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination." The collection has already grown to 21, which is already a lot to consider. We leave combinations of itertools to more_itertools, which is free to be a miscellaneous grab bag of anything and everything, with whatever policy it likes on additions, removals, compatibility, and release schedule. I presume it includes all the recipes in the itertools doc and a lot more.

For this issue, notice "in combination". itertools.islice can be used, among other things, to cap the number of items taken from any iterator.

>>> import itertools as it
>>> list(it.islice(it.count(), 10))  # Limit infinite iteration.
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(it.islice(range(3), 10))  # Accept less than limit.
[0, 1, 2]

The OP's run combines two separate ideas: 1) limit iteration by passing max; 2) consume and ignore all remaining items. The first goal can already be achieved with islice. No need to pass max around. The second is only applicable when iteration has side effects, which can include making next calls elsewhere fail. This should be rare, as iterating only for side effects is not an intended usage of the machinery. If needed anyway, for _ in someiter: pass, the loop at the core of the proposed run, is trivial.

>>> r = iter(range(3))
>>> for _ in r: pass
... 
>>> next(r)
Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    next(r)
StopIteration

If side-effects are not needed, one can just discard (or ignore) the iterator. This takes no time.
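A minimal sketch of the "in combination" point: islice plus an ordinary for loop covers the limited-map case without a new library function. The helper name `apply_limited` and its parameters are hypothetical, chosen only for this illustration.

```python
from itertools import count, islice

def apply_limited(func, iterable, limit=None):
    """Hypothetical helper: call func on at most `limit` items of iterable.

    islice(iterable, None) imposes no limit, so limit=None runs to the end.
    """
    for obj in islice(iterable, limit):
        func(obj)

# Infinite iterator, capped at five calls.
collected = []
apply_limited(collected.append, count(), 5)

# Finite iterator, no limit.
everything = []
apply_limited(everything.append, range(3))
```

The caller still has to decide whether the iterator might be infinite and pass a limit accordingly.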

Note that using run properly still requires the programmer to consider the possibility that an unknown iterator might be infinite. I'm closing because I do not see this issue going anywhere.

graingert commented 2 months ago

It's pretty common to use collections.deque(iterable, 0) for this
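A short sketch of that idiom, with islice added for the bounded case. The generator `noisy` and the list `log` are hypothetical and exist only to make the side effects visible.

```python
from collections import deque
from itertools import islice

log = []

def noisy(n):
    """Hypothetical generator whose side effect is appending to log."""
    for i in range(n):
        log.append(i)
        yield i

# A deque with maxlen 0 pulls every item and keeps none.
gen = noisy(4)
deque(gen, 0)
after_full_run = list(log)

# Wrapping the iterable in islice bounds the number of items consumed.
log.clear()
deque(islice(noisy(100), 3), 0)
after_limited_run = list(log)
```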