python / cpython

The Python programming language
https://www.python.org
Other
63.15k stars 30.24k forks source link

[doc] map() documentation ambiguous about consumption order #90014

Closed 7b8fe85a-0250-471b-b86c-93b3289a47c3 closed 2 years ago

7b8fe85a-0250-471b-b86c-93b3289a47c3 commented 2 years ago
BPO 45856
Nosy @rhettinger, @ezio-melotti, @merwok, @willingc, @JulienPalard, @Thibauth

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = None closed_at = created_at = labels = ['3.7', '3.8', '3.9', '3.10', '3.11', 'type-feature', 'docs'] title = '[doc] map() documentation ambiguous about consumption order' updated_at = user = 'https://github.com/Thibauth' ``` bugs.python.org fields: ```python activity = actor = 'thibaut' assignee = 'docs@python' closed = True closed_date = closer = 'rhettinger' components = ['Documentation'] creation = creator = 'thibaut' dependencies = [] files = [] hgrepos = [] issue_num = 45856 keywords = [] message_count = 3.0 messages = ['406693', '406697', '406699'] nosy_count = 7.0 nosy_names = ['rhettinger', 'ezio.melotti', 'eric.araujo', 'docs@python', 'willingc', 'mdk', 'thibaut'] pr_nums = [] priority = 'normal' resolution = 'rejected' stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue45856' versions = ['Python 3.6', 'Python 3.7', 'Python 3.8', 'Python 3.9', 'Python 3.10', 'Python 3.11'] ```

7b8fe85a-0250-471b-b86c-93b3289a47c3 commented 2 years ago

In cases where multiple iterables are passed to the built-in function map(), the documentation is ambiguous about the order in which they are consumed [1]. Although the order of evaluation of function arguments is documented to be left-to-right in general [2], this does not necessarily imply that the __next__() functions of the underlying iterators are called from left to right *before* passing the returned elements to the function being mapped. This is particularly relevant when the same iterator is passed multiple times, or when there are side effects to consuming the iterables.

I suggest adding the sentence “The iterables are consumed in left-to-right order at each iteration.”, similar to how it is done for the function zip() [3]. Furthermore, I think providing the following (roughly) equivalent implementation in pure Python might be illuminating:

    def map(function, *iterables):
        iterators = tuple(iter(it) for it in iterables)
        while True:
            try:
                args = [next(it) for it in iterators]
            except StopIteration:
                break
            yield func(*args)

Finally, the following example could be added. “This makes it possible to apply a function to consecutive groups of elements from the same iterator by passing it multiple times to map:

    from itertools import count

    ctr = count()
    # map(func, ctr, ctr) -> func(0, 1), func(2, 3), ...

”

I am happy to submit a pull request once we reach a consensus on the formulation.

[1] https://docs.python.org/3/library/functions.html#map [2] https://docs.python.org/3/reference/expressions.html#evaluation-order [3] https://docs.python.org/3/library/functions.html#zip

rhettinger commented 2 years ago

I don't think this suggestion is helpful or necessary. The map() docs have been around for a long time and this hasn't proven to be a point of confusion.

The itertools docs already have a recipe demonstrating the technique of passing the same iterator multiple times with izip_longest(). That is a case where the technique is useful. In the context of map() however this technique is rarely, if ever, used.

A last thought is that we do put in rough pure python equivalents when they help understand the function. In this case though, the pure python code provided is likely only intelligible to someone who already understands map().

Thank you for the suggestion, but we should pass on this one.

7b8fe85a-0250-471b-b86c-93b3289a47c3 commented 2 years ago

this hasn't proven to be a point of confusion

Absence of evidence is not evidence of absence… The word "confusion" is probably a bit strong, but I recently had to write code relying on this behavior and found myself checking the documentation to make sure the code would behave as I expected. Not being able to find it explained in the documentation, I felt uncomfortable relying on an implicit behavior. After all, doesn't the PEP-20 state that "Explicit is better than implicit"?

In the context of map() however this technique is rarely, if ever, used.

I think use cases are more common than what might appear at first glance. For example, a file could contain information about a list of entities, each entity being described over two consecutive lines of the file (this is admittedly a bad format for a file, but such is the reality of real-world data…). Then, given a function parse constructing an internal representation of the entity from two lines, the list of entities can elegantly be constructed using map(parse, file, file). The equivalent construction with zip would be something like starmap(parse, zip(file, file)) which is unnecessary convoluted.

the pure python code provided is likely only intelligible to someone who already understands map()

The pure Python code expresses map in terms of other language constructs, so it is not clear to me why one would need to already understand map() to understand the provided code. This is similar to a dictionary definition, where a concept is explained in terms of other concepts. This is also similar to how related functions (like starmap) are explained in the itertools module.

Overall, I am curious about the rationale behind not making the documentation more explicit when possible.