Closed f24ef518-0963-4e81-90b1-85bb1f5e2b60 closed 15 years ago
This is a patch for the Python 3.1 build checked out from http://svn.python.org/projects/python/branches/py3k
The current behavior of itertools.combinations(iterable,r) and itertools.permutations(iterable,r) is to throw a ValueError if iterable yields fewer than r items. This changes these functions so the generator instead yields 0 items.
As for my argument for acceptance, while the original behavior is not a bug insofar as its implementation was deliberate, I'd argue this is undesirable on grounds of mathematical consistency and functionality.
In mathematics the "choose" function is defined as "(n choose r)=0" for r>n, so itertools.combinations' behavior is inconsistent with its obvious combinatorial analogy. (Similar for permutations and the combinatorial "P" function.)
For functionality I'll cite my own case as anecdote. In writing regression tests I wanted to ensure that a group of items obeyed a certain "mutually consistent" property between all triples. (Something akin to the triangle inequality.) So:
all(triangle_respected(*triple) for triple in itertools.combinations(group, 3))
If len(group)\<=2, that's fine, since with no triangles there is no inconsistency, and all(())==True. However, the code failed with a ValueError when groups were that small. Working around this was fairly awkward, since I wanted to continue to fail noisily if my triangle_respected function threw a ValueError, and I wasn't sure at all that it wouldn't if my items were
The patch modifies Modules/itertoolsmodule.c slightly, changing combinationsobject_new and permutationsobject_new. (Deleting the error throwing code, and have one line changes in both structures to handle the n>r case. One could handle this special case more efficiently than I do by not bothering to allocate certain structures in this case, but I didn't want to tempt fate.) The patch also changes the appropriate tests in Lib/test/test_itertools.py .
Obviously, this would break code that depends upon Python throwing ValueError in the event of the combination or permutation sequence being empty. However, I hope given that combinations and permutations are a relative novelty that their behavior in this corner case is not yet etched in stone.
Sorry if this belongs in a PEP -- it seems quite minor, but I don't quite have a feel what the threshold is.
I agree that the proposed behaviour seems more correct: the collection of all subsets of size 4 of range(3) is perfectly valid and well-defined; it just happens to be empty. I've also encountered this in practice (in code that was enumerating partitions of a list into a fixed number of pieces), but in my case it was easier to work around.
I guess the counterargument is that the current ValueError might catch bugs early; I don't know how often this is true in practice.
In any case, this is Raymond's call.
Will spend a while mulling this over and taking it under advisement.
Initially, I am disinclined for several reasons.
For many use cases, r>n is an error condition that should not pass silently.
While it's possible to make definitions of comb/perm that define the r>n case as having an empty result, many definitions do not. See the wikipedia article on permutations:
""" In general the number of permutations is denoted by P(n, r), nPr, or sometimes P_n^r, where:
* n is the number of elements available for selection, and
* r is the number of elements to be selected (0 ≤ r ≤ n).
For the case where r = n it has just been shown that P(n, r) = n!. The general case is given by the formula:
P(n, r) = n! / (n-r)!.
"""
That discussion is typical and I think it important the number of items returned matches the formula (which fails when r>n because (n-r)! would call for a negative factorial).
Also see the wikipedia article on combinations (the "choose" function) which also expresses a factorial formula that fails for r>n. No mention is made for special handling for r>n.
For cases where you need a more inclusive definition that assigns a zero length result to cases where r>n, the workaround is easy. Moreover, in such cases, there is some advantage to being explicit about those cases being handled. In the example provided by the OP, the explicit workaround is:
all(triangle_respected(*triple) for triple in itertools.combinations(group, 3) if len(group)>=3)
or if you factor-out the unvarying constant expression in the inner-loop:
len(group)>=3 or all(triangle_respected(*triple) for triple in itertools.combinations(group, 3))
For other cases, it may be preferable to write your own wrapper:
def myperm(pool, r=None):
'custom version of perm returning an empty iterator when r>n'
if r is not None and r > len(pool):
return iter([])
return itertools.permutations(pool, r)
I like this because it is explicit about its intended behavior but doesn't slow-down the common case where some work actually gets done.
Don't want to break any existing code that relies on the ValueError being thrown. It's too late to change this for 2.6 and 3.0 and possibly for 2.7 and 3.1 without having a deprecation period. While this hasn't been out long, it will have been once 2.7 is released. Code that relies on the ValueError may be hard to spot because it is implicitly relying on the ValueError aborting the program for invalid input (invalid in the sense of how the result is being used).
I personally find the r>n case to be weird and would have a hard time explaining it to statistics students who are counting permutations and relating the result back to the factorial formulas.
On the flip side, I do like Mark's thought that the r>n case being empty doesn't muck-up the notion of combinations as returning all "r-length subsequences of elements from the input iterable." Also, I can see a parallel in the string processing world where substring searches using __contains__ simply return False instead of raising an exception when the substring is longer that the string being searched.
Those are my initial thoughts. Will ponder it a bit more this week.
Quick notes so I don't forget:
The two main pure python equivalents in the docs would need to be updated with the new special cases (currently they raise IndexError for r>n). The other two pure python equivalents would automatically handle the proposed new cases.
The Mathematica definition of permutations does not discuss or allow for the r>n case, http://mathworld.wolfram.com/Permutation.html . One of the other tools that actually lists permutations and takes a second argument (most do not) is in Mathematica, http://reference.wolfram.com/mathematica/ref/Permutations.html . It is unclear from their docs whether the r>n case is supported as a zero-length output or as an error condition.
Hi Raymond, thanks for your well reasoned and thorough reply. Just a couple small thoughts... I'll borrow your numbers if you don't mind.
The closest I can think of right now might be key/index errors on dicts and sequences... but that's a user explicitly asking for exactly one element, in which case non-existence would be bad.
In a larger sense, if a result computed by a function is sensible and well defined in itself, it doesn't seem even possible for a function to anticipate what answers might be considered error conditions by calling code. Potentially any return value for any function might be an error for some user code. So... the question then becomes, is the empty result a sensible and well defined answer. And here we differ again I see. :D
OK on perms. If you don't mind, I'm going to talk about combinations since that was my real motivation.
Also see the wikipedia article on combinations (the "choose" function) which also expresses a factorial formula that fails for r>n. No mention is made for special handling for r>n.
Hmmm. Are you looking at the definition of "choose function" under Wikipedia? http://en.wikipedia.org/wiki/Choose_function Notice the material after the "and," after the first part of the definition, in case you missed it in your first reading. So, this is the def I was taught through high school and undergrad, perhaps the same for Mark judging from his reply. You were taught it was undefined? It does sort of complicate some other defs that rely on binomial coefficients though, I'd think, but I guess you could make those work with little effort. It's not as though subtly different but incompatible definitions of basic terms are that uncommon in math... never knew the binomial coef was in that category though. You teach stats I take it? I suppose you'd know then... never heard of it, but that doesn't mean much. Oh well. (You can perhaps feel my pain at seeing the current behavior, given my understanding.)
Saying the fix is more explicit suggests there's something implicit about the concise solution. If the choose function is defined for those values, then itertools.combinations is just unnecessarily forcing code to treat something as a special case even when it's not. So, I'd be careful to draw a distinction between "more verbose" and "more explicit."
A third and extraneous comment, if you do wind up changing the docs anyway, perhaps a shorter recursive solution would be easier to understand? Maybe it's just me since I'm sort of a functional type of guy, but I found the equivalent code a little hard to parse.
This is the most important issue. Is the r>n case typically an error in the user's program or thought process. If so, then the utility of a ValueError can be weighed against the utility of other cases where you want combs/perms to automatically handle the r>n cases with an empty output. These use cases are what I'm mulling over this week.
I see your wikipedia link. The one I was looking at was: http://en.wikipedia.org/wiki/Combination
I wasn't clear on the issue of explictness. The potential problem that it isn't immediately obvious to me what comb/perm does when r>n, so if I see code that isn't explictly handling that case, I have to lookup what comb/perm does do to make sure the code works.
In the math module, it is a virtue that math.sqrt(-1) raises a ValueError. In the cmath module, it is a virtue that it does not. The latter case is distinguished because the programmer has explicitly requested a routine that can handle complex numbers -- that is a good clue that the surrounding code was designed to handle the complex result.
Not too worried about this one. Essentially the thought is that code that wasn't designed with the r>n case in mind probably benefits from having a ValueError raised when that condition is encountered.
This one bugs me a bit. It is nice to have all the factorial
formulas just work and not have a piecewise definition.
BTW, we don't really have a difference of opinion. My mind is open. I aspire to pick the option that is the best for most users including students and novice users. The trick in language design is to anticipate use cases and to understand that people have differing world views (i.e. it is a perfectly reasonable point-of-view that taking n things r at a time makes no sense when r is greater than n).
In choosing, there is some bias toward sticking the API as it was released. Changing horses in midstream always causes a headache for some user somewhere.
Am still thinking this one through and will let you know in a week or so. I also want to continue to research into how this is handled in other libraries and other languages.
Another thought before I forget: The piecewise definition of the choose function or for binomial coefficients suggests that supporting the r>n case should be accompanied by supporting the r\<0 case.
- This one bugs me a bit. It is nice to have all the factorial formulas just work and not have a piecewise definition.
IIRC, in the 'Concrete Mathematics' book, Knuth and co use something like
(n choose k) = (n-to-the-k-falling)/k!
to get around this: this definition works for all k >= 0 and all integers (or even real numbers) n. Here x-to-the-k-falling means x (x-1) ... * (x-k+1). See:
http://mathworld.wolfram.com/FallingFactorial.html
Another thought before I forget: The piecewise definition of the choose function or for binomial coefficients suggests that supporting the r>n case should be accompanied by supporting the r\<0 case.
I'd say not: in the context of sets of combinations/permutations, (rather than just defining the nCr or nPr symbols) r has a definite meaning as a cardinality: "enumerate all subsets of iter of cardinality r" and so it makes sense to me to restrict to the case where r is nonnegative.
(If we were talking about the integer-valued function nCr, then I'd agree.)
My own guess would be that having combinations(range(n), r) give the empty iterator for r > n is the right thing in the vast majority of situations, even (especially?) where the programmer hasn't anticipated the possibility of r > n. I have yet to see a case where it's useful to raise ValueError.
I have access to the Magma algebra package at work; I'll check tomorrow or Friday to see what that does.
Mathematica returns an empty list.
In[1]:= Permutations[{1,2},{1}]
Out[1]= {{1}, {2}}
In[2]:= Permutations[{1,2},{4}]
Out[2]= {}
In[3]:=
David, thanks for the data point. What does it return for
In[1]:= Permutations[{a, b, c}, {-1}]
Mathematica indicates for the user to define it later. An error.
In[3]:= Permutations[{1,2},{-2}]
Permutations::nninfseq: Position 2 of Permutations[{1, 2}, {-2}] must be All, Infinity, a non-negative integer, or a List whose first element (required) is a non-negative integer, second element (optional) is a non-negative integer or Infinity, and third element (optional) is a nonzero integer.
Out[4]= Permutations[{1, 2}, {-2}]
Results from Magma, Sage and Gap:
Magma provides functions "Subsets(S, k)" and "Permutations(S, k)". From the documentation:
Subsets(S, k) : The set of subsets of the set S of size k. If k is larger than the cardinality of S then the result will be empty.
Permutations(S, k) : The set of permutations (stored as sequences) of each of the subsets of the set S of cardinality k.
GAP has the same behaviour: even if you've never encountered GAP before, it's fairly Pythonesque, so the following extract from a GAP 4 REPL means exactly what you think it does:
gap> Combinations([1,2,3,4], 3); [ [ 1, 2, 3 ], [ 1, 2, 4 ], [ 1, 3, 4 ], [ 2, 3, 4 ] ] gap> Combinations([1,2,3,4], 5); [ ]
Permutations work the same way (but the function is called Arrangements):
gap> Arrangements([1,2,3,4], 3); [ [ 1, 2, 3 ], [ 1, 2, 4 ], [ 1, 3, 2 ], [ 1, 3, 4 ], [ 1, 4, 2 ], [ 1, 4, 3 ], [ 2, 1, 3 ], [ 2, 1, 4 ], [ 2, 3, 1 ], [ 2, 3, 4 ], [ 2, 4, 1 ], [ 2, 4, 3 ], [ 3, 1, 2 ], [ 3, 1, 4 ], [ 3, 2, 1 ], [ 3, 2, 4 ], [ 3, 4, 1 ], [ 3, 4, 2 ], [ 4, 1, 2 ], [ 4, 1, 3 ], [ 4, 2, 1 ], [ 4, 2, 3 ], [ 4, 3, 1 ], [ 4, 3, 2 ] ] gap> Arrangements([1,2,3,4], 5); [ ]
Combinations([1,2,3,4], -1) gives an error. Interestingly, Arrangements([1,2,3,4], -1) returns the empty list.
GAP also has a NrCombinations function returning the number of combinations:
gap> NrCombinations([1,2,3,4], 5); 0 gap> NrCombinations([1,2,3,4], -1); 0
My Sage installation currently seems to be broken, but from the documentation it looks as though it just steals the functions from GAP.
Got Sage working again. It also returns an empty list for r > n. For r negative, Combinations returns an empty list and Permutation gives an error.
sage: Combinations(range(4), 6) Combinations of [0, 1, 2, 3] of length 6 sage: Combinations(range(4), 6).list() [] sage: Permutations(range(4), 6).list() [] sage: Combinations(range(4), -1).list() [] sage: Permutations(range(4), -1).list() ---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
[... traceback snipped ...]
Data point: Microsoft Excel's PERMUT() and COMBIN() return #NUM! for r>n or r\<0.
I try to "not know" excel. Does it have any other means to represent an empty set?
Excel is returning the count, not the set itself. So, it could have chosen to return zero.
The HP32SII also has nCr and nPr functions that return INVALID DATA for r>n or r\<0.
The Combinatorics section of CRC Standard Mathematical Formulae (30th edition) defines nCr and nPr only for 0 \<= r \<= n. The same is true in Harris and Stocker's Handbook of Mathematics and Computational science.
Summary of functions/definitions expected to return a number:
r>n r\<0 Source
--------- ---------- ------------------------------
error error MS Excel PERMUT() and COMBIN()
error error HP32SII nPr and nCr
undefined undefined CRC Combinatoric definitions
undefined undefined Handbook of Mathematics and Computational Science
undefined undefined Wolfram:
http://mathworld.wolfram.com/Permutation.html
zero zero http://en.wikipedia.org/wiki/Choose_function
undefined undefined http://en.wikipedia.org/wiki/Combination
undefined undefined http://en.wikipedia.org/wiki/Permutation
zero undefined Knuth's choose function in Concrete Mathematics
zero zero GAP's nrCombinations()
Summary for tools that return the individual subsets:
r>n r\<0 Source --------- ---------- ------------------------------ emptylist error Mathematica emptylist unknown Magma emptylist error GAP Combinations emptylist emptylist GAP Arrangements emptylist emptylist Sage Combinations emptylist error Sage Permutations
David, I know the OPs position and Mark's opinion. What is your recommendation for the r>n case and the r\<0 case?
Attached an updated patch which expands the tests and docs. Handles the r>n case with an empty result. Leaves the r\<0 case as a ValueError.
Am thinking that if we do this, it should be treated as a bug and posted to 2.6 and 3.0.
I had thought highly of the "mull it over for a week" plan. After a week we'd decide to follow Stephen Wolfram's lead, which seems to be the current patch. I haven't yet used the python permutations iterator, although I used to have a script that solved word JUMBLE puzzles with a mathematica | spell pipeline. Now I look up words using a sorted anagram dictionary.
Thanks David. Looks like we're all in agreement. Turns out, I did a weeks worth of mulling today ;-) Accepting the proposal with the following rationale:
Weighing against it:
Fixed in r68394 Will forward port in a day or two.
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at =
created_at =
labels = ['tests', 'type-bug', 'library']
title = 'Patch of itertools.{combinations,permutations} for empty combinations'
updated_at =
user = 'https://bugs.python.org/TFinley'
```
bugs.python.org fields:
```python
activity =
actor = 'rhettinger'
assignee = 'rhettinger'
closed = True
closed_date =
closer = 'rhettinger'
components = ['Library (Lib)', 'Tests']
creation =
creator = 'TFinley'
dependencies = []
files = ['12563', '12647']
hgrepos = []
issue_num = 4816
keywords = ['patch']
message_count = 22.0
messages = ['78943', '78948', '79111', '79116', '79308', '79366', '79367', '79369', '79370', '79371', '79372', '79373', '79374', '79379', '79380', '79381', '79382', '79383', '79386', '79393', '79394', '79399']
nosy_count = 4.0
nosy_names = ['rhettinger', 'mark.dickinson', 'LambertDW', 'TFinley']
pr_nums = []
priority = 'normal'
resolution = 'fixed'
stage = None
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue4816'
versions = ['Python 2.6', 'Python 3.0', 'Python 3.1', 'Python 2.7']
```