python / cpython

The Python programming language
https://www.python.org
Other
61.97k stars 29.8k forks source link

Add a default argument to min & max #62311

Closed 8a0c69e7-f5ea-4ea3-8880-4903ed9740d7 closed 11 years ago

8a0c69e7-f5ea-4ea3-8880-4903ed9740d7 commented 11 years ago
BPO 18111
Nosy @gvanrossum, @Yhg1s, @rhettinger, @ncoghlan, @nedbat, @merwok, @dhellmann, @bitdancer, @skrah, @Julian, @phmc
Files
  • minmaxdefault.patch
  • minmax.patch
  • minmax.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at = created_at = labels = ['interpreter-core', 'type-feature'] title = 'Add a default argument to min & max' updated_at = user = 'https://github.com/Julian' ``` bugs.python.org fields: ```python activity = actor = 'eric.araujo' assignee = 'rhettinger' closed = True closed_date = closer = 'rhettinger' components = ['Interpreter Core'] creation = creator = 'Julian' dependencies = [] files = ['30437', '30479', '30510'] hgrepos = [] issue_num = 18111 keywords = ['patch'] message_count = 35.0 messages = ['190426', '190432', '190464', '190465', '190466', '190469', '190470', '190477', '190481', '190482', '190484', '190514', '190515', '190538', '190542', '190545', '190548', '190569', '190571', '190575', '190577', '190579', '190580', '190582', '190584', '190585', '190586', '190595', '190703', '190839', '191239', '191836', '192003', '192006', '211158'] nosy_count = 13.0 nosy_names = ['gvanrossum', 'twouters', 'rhettinger', 'ncoghlan', 'nedbat', 'eric.araujo', 'doughellmann', 'r.david.murray', 'skrah', 'dabeaz', 'Julian', 'python-dev', 'pconnell'] pr_nums = [] priority = 'normal' resolution = 'fixed' stage = 'patch review' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue18111' versions = ['Python 3.4'] ```

    8a0c69e7-f5ea-4ea3-8880-4903ed9740d7 commented 11 years ago

    This has come up a number of times I'm aware, but no one has ever written a patch for it as far as a quick look uncovered.

    So here's one (written by Thomas Wouters but with permission to submit). Submitting without tests and docs, but those are incoming when I get a moment.

    The justification here is mostly related to being able to call min/max on an iterable of unknown length when there's a sensible default (which is usually going to be None, but that's not the default for backwards compat obviously).

    bitdancer commented 11 years ago

    bpo-7153 and the discussions linked therefrom is presumably relevant here.

    Do you have a concrete use case?

    8a0c69e7-f5ea-4ea3-8880-4903ed9740d7 commented 11 years ago

    Thanks for finding that, I thought there was an issue that came out of that p-i thread but couldn't find it.

    I'd like to be more concrete, but "calling max on an iterable" seems concrete enough to me. If you'd like to know more though, personally I've wanted this at least twice in the past 4 or 5 months. Currently, I have code that looks like:

        def best_match(stuff):
            first = next(stuff, None)
            if first is None:
                return
            return max(itertools.chain([first], stuff))

    which finds the best matching (error it happens to be) in the given stuff. A few months ago I had a similar need in a different application.

    The issues in that thread from 2009 revolved around a bunch of confusing and not so related things. And I definitely agree that start is a really bad name for this. But I don't find default to be at all confusing, and in fact this has come up in #python a few times and each time there hasn't really been a problem explaining to someone what default would do (or how it would interact with key for that matter, although if a precedent is desired, the default in argparse just returns the default, it doesn't call type or anything on it).

    bitdancer commented 11 years ago

    So you aren't really asking for a default, you are asking for a version of max/min that returns a sentinel instead of raising an error on an empty list. You could just write utility function that has a try/except in it. I'm not sure it is worth complicating min and max for this use case. (I'm not sure it isn't, either.) I wonder if we have any other functions in python that have an argument that turns a ValueError into a sentinel-return instead?

    8a0c69e7-f5ea-4ea3-8880-4903ed9740d7 commented 11 years ago

    Yes that's a good description. I'm not sure the type of exception is the thing to look at as much as the behavior, I.e. I think next() is a good precedent.

    And it's not really pleasant to do this with a try/except. Of course everything's possible, but to catch the ValueError sanely requires checking the text of the exception so that the except isn't too broad.

    bitdancer commented 11 years ago

    I don't think there's any other way to get a ValueError out of min/max, but I haven't actually looked at the code to check. Of course, if we want people to rely on that, we'd need to document it.

    'next's default is used to return a sentinel when the list is exhausted, not when it would otherwise raise a ValueError. It is a somewhat related case, but is not exactly parallel. The sentinel for next indicates the end of an ongoing process. The proposed argument to min/max would *replace* an error with a sentinel value.

    8a0c69e7-f5ea-4ea3-8880-4903ed9740d7 commented 11 years ago

    It's not exactly the same of course, but calling next on a thing that might be empty would be somewhat close, and also is replacing an exception with a sentinel (an exception that is much easier to differentiate).

    You can always get a ValueError out of min/max, they're going to be ultimately calling __lt__ on stuff which can do what it wants, but that's admittedly quite unlikely.

    It's not really that readable though on the other hand.

    rhettinger commented 11 years ago

    I'm -1 on expanding this API further. It already is pushing the limits with the dual signature and with the key-function.

    Many languages have min/max functions. AFAICT, none of them have an API with a default argument. This suggests that this isn't an essential capability.

    bitdancer commented 11 years ago

    That's a good point about the __lt__. It occurred to me as well just before I read your post :).

    Raymond, do any other languages have an iterator protocol as a core language feature? It's the fact that it is in Python, and that it is not simple to LBYL when dealing with iterators, that brings this issue up for min and max.

    bitdancer commented 11 years ago

    Oh, and I don't think Haskell counts, since you'd expect them to stick strictly to the mathematical definition, with no consideration of practicality :)

    Note that I'm not saying I'm +1 on adding this (I haven't decided), I'm just trying to clarify the argument.

    5531d0d8-2a9c-46ba-8b8b-ef76132a492c commented 11 years ago

    I'd use foldl() in functional languages, where the default is part of foldl() and not of max().

    Translated to Python, I'm thinking of:

    it = iter([328, 28, 2989, 22])
    functools.reduce(max, it, next(it, None))
    2989

    I agree with Raymond that a default arg in max() looks out of place.

    rhettinger commented 11 years ago

    Thanks Stephan. I'm going to close this one. The case for adding it is too weak and isn't worth making the API more complex.

    If someone wants a default with an iterable of arbitrary size including zero, there are already a number of ways to do it (using itertools.chain for example or just catching the exception).

    8a0c69e7-f5ea-4ea3-8880-4903ed9740d7 commented 11 years ago

    I don't really care to push this much harder, but I'll just repeat that I've already made an argument against catching the exception. Calling this making the API too complex also seems quite silly to me. It's a thing that someone looking for would find and someone who wasn't wouldn't.

    Yhg1s commented 11 years ago

    For the record, Raymond, I think you're wrong about this. Itertools isn't always a solution to every problem, and it makes for a very awkward way around a silly limitation in min() and max(). Their API is already awkward -- because they already take a keyword argument as well as *args or an iterable -- and this does not make it worse in any way. It's trivial to add this, it's trivial to explain -- return a specific value instead of raising a particular exception -- and it's wasteful, complex, fragile or unreadable (except if you have itertools on the mind, I guess) to do the same thing in another way.

    ncoghlan commented 11 years ago

    +1 for adding this. It's simple to implement, simple to explain and the alternatives for dealing with the empty iterable case (or even the fact it may need to be handled at all) are definitely not obvious.

    The relationship to next() is straightforward: the supplied value is effectively used as the default value for the first next call when iterating and then ignored thereafter.

    nedbat commented 11 years ago

    I find the workarounds mentioned here to be baroque and confusing. The concept of a default value to return in the case of an empty iterator is straightforward. I'm +1 on adding this as well.

    dhellmann commented 11 years ago

    +1 on adding this

    I found today via @dabeaz's cookbook that iter() has a sentinel-detection use case. Having one in min/max seems *far* more obviously useful. It's also consistent with quite a few methods on builtin types where we provide a way to deal with unknown data safely by having a default instead of catching exceptions directly.

    ncoghlan commented 11 years ago

    As stated, I don't agree with the closure of this one. min/max deserve a more elegant mechanism for dealing with the empty iterable edge case.

    0d6a79a7-4e74-41c2-8eed-8d04ea475e57 commented 11 years ago

    I could have used this feature myself somewhat recently. It was in some code involving document matching where zero or more possible candidates were assigned a score and I was trying to find the max score. The fact that an empty list was a possibility complicated everything because I had to add extra checks for it. max(scores, default=0) would have been a lot simpler.

    rhettinger commented 11 years ago

    I still think complicating the API isn't worth it. Of late, we've gotten in the habit of a complexity to even the simplest of things.

    In the case of sequences, we already have a reasonable solution:

        low = min(seq) if seq else default

    In the rarer case of non-sequence iterables, catching a Value error is the traditional solution -- not pretty, but not difficult either.

    In the past, Guido has rejected that notion of "add a default value to every function than can raise an exception". For example, someone wanted to add a get() method to lists so they could avoid catching an IndexError, something like s.get(index, default). IIRC, his motivation was that "avoiding API clutter" was more important than supporting an uncommon use case that already had viable solutions using plain Python.

    My own principle is that it is okay to add an initial extra feature to function, but not multiple extra features. This becomes more important when the function already has API issues.

    The min() / max() functions started-out with an API problem because of their dual signature: min(s) versus min(a, b, c). That creates an ambiguity in that case of min(*t) where the result depends on the length of the input. IMO, that issue would be exacerbated by the addition of a default argument (i.e. it would tend to hide the error from test cases):

        for t in [(10, 20, 30), (10, 20), (10,), ()]:
            print min(*t, default=100)

    Also, I don't think we should forget the lessons learned about adding unnecessary arguments to functions. For example, we let someone add start and end arguments to str.startswith() and str.endswith() because "it seemed more parallel with str.find()" and because "someone wanted it once a piece of code somewhere". The downside only became apparent later when there was a legitimate real use case for another feature request: searching multiple prefixes and suffixes. Because we had wasted the positional arguments, we ended-up with the awkward str.startswith(str-or-tuple [,start [,end]]) signature. That is why we have to write, filename.endswith(('.py', '.pyc')). The double parens are part of the cost of API bloat.

    Lastly, when it comes to common-place functions like min() and max(), we can assess needs easily by looking at what other languages have done. Looking at Excel, SQL, Java, Haskell, C# etc, I don't see a precedent for this feature request.

    Grepping for min/max in my own code and third-party libraries, I don't see any cases where the code had a need for this feature. If the need arises, it certainly doesn't come of very often.

    If you guys all think this is an good feature request, then by all means, go ahead and add it. My recommendation is to show design restraint and refuse the temptation to grow this already awkward API when you don't have to.

    8a0c69e7-f5ea-4ea3-8880-4903ed9740d7 commented 11 years ago

    Raymond, I respect that in your opinion this seems to be overcomplexity, but you haven't addressed any of the arguments made, nor responded to any of the arguments against this being added complexity.

    I really don't understand the parallels you're making to str.*with, but as for other languages, as David pointed out already, you are looking at things in a vacuum. This is needed because min and max are already silly. In languages like Ruby and Clojure, which were the quickest I had handy, of course you don't need this, because calling min and max *by default* returns None. I'd bet Python 2's silly type comparison history had something to do with the return value not defaulting to None, but what's done is done. I don't like hard-fast rules, but I don't think APIs should ever be penalized for their own mistakes. We should make sane things possible in pleasant ways.

    If it's OK then (turning back to the patch), unless anyone has something additional to add I'm going to carve up some tests.

    ncoghlan commented 11 years ago

    To me, the Python-specific difference that makes this useful for us but not for others is *precisely* the fact that the simple idiom:

        x = min(seq) if seq else default

    is broken for iterators that don't provide __len or __bool, while the even simpler:

        x = min(seq)

    is broken for the empty iterable.

    However, I think we should explicitly disallow the combination of multiple positional arguments *and the new default argument. If you don't know the length of the input iterable, you should *not be using the multiple argument form.

    8a0c69e7-f5ea-4ea3-8880-4903ed9740d7 commented 11 years ago

    Personally I don't care either way, I basically never use the multiple positional arg form, but what are we trying to prevent exactly? It's bad code, but it (would) do what the person was expecting. Am I not getting the point that's being made about that case?

    rhettinger commented 11 years ago

    Guido, this is your language. What would you like to do?

    The OP wants a default argument on min() and max() so he won't have to use an "except ValueError" for non-sequence iterables that are potentially empty.

    At first, I thought the functions were already as complex as we would want to get, but several proponents have emerged, so I'm stepping aside.

    The proposed patch would allow: max(iterable, key=somefunc, default=sentinel) and would return sentinel_value when len(list(iterable))==0.

    It would not allow: max(*iterable, key=somefunc, default=sentinel_value) where s is an empty iterable.

    gvanrossum commented 11 years ago

    +1

    ncoghlan commented 11 years ago

    Just one final design philosophy comment from me (I know it isn't needed since Guido already ack'ed the suggestion):

    As far as the lessons learned from the historical startswith() case go, avoiding taking up the positional slots with optional flags and configuration parameters is one of the big reasons we added keyword only arguments (with the other being readability at the call site).

    I agree we need to be cautious with API complexity, I just think in this case clean handling of empty iterators is a net win (so long as we rule out the conceptually broken case of combining the new parameter with multiple positional arguments).

    rhettinger commented 11 years ago

    Julian, when your tests are ready, I'll be happy to review and apply the patch.

    0d6a79a7-4e74-41c2-8eed-8d04ea475e57 commented 11 years ago

    To me, the fact that m = max(s) if s else default doesn't work with iterators alone makes this worthy of consideration.

    I would also note that min/max are the only reduction functions that don't have the ability to work with a possibly empty sequence. For example:

        >>> sum([])
        0
        >>> any([])
        False
        >>> all([])
        True
        >>> functools.reduce(lambda x,y: x+y, [], 0)
        0
        >>> math.fsum([])
        0.0
        >>>
    8a0c69e7-f5ea-4ea3-8880-4903ed9740d7 commented 11 years ago

    Thanks for the offer Raymond.

    Here's a patch with some tests and doc updates, incorporating Nick's suggestion (but without a note in the docs, I didn't feel it's notable).

    Please let me know if anything needs tidying.

    8a0c69e7-f5ea-4ea3-8880-4903ed9740d7 commented 11 years ago

    New patchset addressing review. Not sure why upload.py isn't working for me (I assume it should?)

    bitdancer commented 11 years ago

    I find it interesting that I just came across a case where I want this functionality, involving a generator derived from a possibly-empty sql table. I'm using Stefan's functional expression for now :)

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 11 years ago

    New changeset 76196691b5d0 by Raymond Hettinger in branch 'default': bpo-18111: Add a default argument to min() and max() http://hg.python.org/cpython/rev/76196691b5d0

    bitdancer commented 11 years ago

    Julian, could you please submit a contributor agreement? (http://www.python.org/psf/contrib)

    1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 11 years ago

    New changeset 34ff27b431d0 by R David Murray in branch 'default': bpo-18111: Add What's New entry for max/min default. http://hg.python.org/cpython/rev/34ff27b431d0

    merwok commented 10 years ago

    Note that the docs were changed, but versionchanged directives are missing.