Closed 06547701-06a2-4e4a-b672-16a33475101e closed 10 years ago
I suggest implementing Counter.__lt__
which will be a partial order, similarly to set.__lt__
. That is, one counter will be considered smaller-or-equal to another if for any item in the first counter, the second counter has an equal or bigger amount of that item.
On Mon, Sep 29, 2014 at 07:33:21PM +0000, Ram Rachum wrote:
I suggest implementing
Counter.__lt__
which will be a partial order, similarly toset.__lt__
.
Since Counter is described as a multiset, this sounds reasonable to me.
What should be result of following operations?
Counter({'a': 0}) \< Counter({}) Counter({}) \< Counter({'a': 0}) Counter({'a': -1}) \< Counter({}) Counter({}) \< Counter({'a': -1})
I suggest they be ignored like in elements
.
(I mean, the non-positive values should be ignored.)
There is other definition of the \<= operator for sets: "A \<= B" is equivalent to "len(A - B) == 0". Extending to Counter this can mean "len(A.subtract(B).elements()) == 0".
What would be the result of
Counter({'a':1, 'b':2}) \< Counter({'a':2, 'b':1})
?
False, like with sets.
Thanks, Ethan. I had a feeling that this wasn't well defined but I couldn't come up with an example :)
David, there's nothing here that isn't well defined. It's simply a partial order, not a total order. We have the same for sets.
set's don't have values, and you are wanting to implement the partial ordering based on the values. (side-note: how does partial-ordering work for sets?)
That is, one counter will be considered smaller-or-equal to another if for any item in the first counter, the second counter has an equal or bigger amount of that item.
According to your definition, my example should have returned True, which is clearly nonsensical.
Even if you changed the definition to:
For every item in the first counter, that item's value is less than the corresponding item in the second counter.
You have situations like:
Counter({'a':1, 'b':1}) \< Counter({'a':2})
I just don't think there is one interpretation that is going to be correct most of the time.
Ethan, I don't understand what the problem is. I also don't understand your side note question "how does partial-ordering work for sets?" I'm not sure what you're asking.
> That is, one counter will be considered smaller-or-equal to another if for any > item in the first counter, the second counter has an equal or bigger amount of > that item.
According to your definition, my example should have returned True, which is clearly nonsensical.
In your example the first counter had b
twice, while the second counter had it only once. So the result is False
. This comes pretty directly from my definition, I'm not sure where the confusion is.
Regarding your new example: Since Counter
works like a defaultdict
, the second counter has a zero quantity of b
. So the answer is again False
.
To put it another way: In Python sets, a <= b
iff b
has everything that a
has, and possibly more. I'm proposing the exact same definition for counters. True, the implementation might be different because sets are not dicts and don't have .values()
, but that is just the implementation, which is not the important thing here.
That is certainly a better definition. If you carefully reread your original post you will see that you wrote something different ('any' instead of 'every'). Also, your OP uses '__lt__' but talks about 'less-than-or-equal -- somewhat confusing. ;)
You're right, sorry. I meant the mathematical "for any" which means "for every": https://en.wikipedia.org/wiki/List_of_mathematical_symbols (See "for any;")
The request sounds ok to me. As in the corner case of negative values, it seems that Counter currently doesn't elide zero values, e.g.:
>>> Counter({'a': 0}) == Counter({})
False
Therefore, we should have:
>>> Counter({'a': -1}) < Counter({'a': 0})
True
but:
>>> Counter({'a': -1}) < Counter({})
False
(I must admit, the current semantics of Counter look a bit ad hoc and not terribly consistent... halfway between a regular mapping and a multiset)
Shall I write a patch?
I think the final call is Raymond's (whether to accept the patch), but go ahead. Worst case scenario is you'll have your own thoroughly tested PartiallyOrderedCounter you can use in your own code. :)
If that's the case I'd prefer Raymond to first say whether this feature is generally welcome before spending my time on making a patch.
Is there some particular problem you're trying to solve, which this would make easier?
Without a use case, I'm -1.
I needed it for an internal calculation in a combinatorics package I'm writing. Do you want me to take the time to explain the calculation? It's quite complex.
No need to explain it. It sounds like it's not generally useful, so I'm still -1.
Even if you don't find it useful, Eric, it doesn't take up the method space. You can very easily ignore it, there's no cognitive burden.
If it is so specialized as to only be needed in complex combinatorial calculations, does it belong in the general-purpose part of the language? After all, we have the math and cmath modules for more specialized arithmetic operations.
Curiousity question: What happens if you try to sort a list of partially ordered Counters?
If it is so specialized as to only be needed in complex combinatorial calculations
How do you know it is "only needed in complex combinatorial calculations"?
What happens if you try to sort a list of partially ordered Counters?
Try it with partially ordered sets.
I don't see why it's so hard to imagine how this will be used. Say I have a counter signifying how many of each product I have in my warehouse, and I have another counter saying how many of each product a client wants. I may use counter2 <= counter1
to check whether there are enough products in stock. Sounds straightforward to me.
--> s1 = set([1]) --> s2 = set([1, 2]) --> s3 = set([1, 2, 3]) --> s4 = set([2]) --> s5 = set([2, 3]) --> s6 = set([3])
--> l = [s1, s2, s3, s4, s5, s6] --> sorted(l) [{1}, {2}, {1, 2}, {3}, {2, 3}, {1, 2, 3}]
--> s1 \< s4 False
--> s4 \< s2 True
--> s1 \< s2 True
--> s4 \< s6 False
--> l = [s1, s4] --> sorted(l) [{1}, {2}]
--> sorted(l) [{1}, {2}]
Looks horribly messy to me. In the last example we can see that neither s1 nor s4 are smaller, yet s1 is consistently put first.
On the other hand, I suppose it's okay for Counter to have this behavior since it's already in sets.
Ethan said:
If it is so specialized as to only be needed in complex combinatorial calculations, does it belong in the general-purpose part of the language?
It's a multi-set, a general purpose and fairly fundamental data type.
https://en.wikipedia.org/wiki/Set_%28abstract_data_type%29#Multiset
And later:
Curiousity question: What happens if you try to sort a list of partially ordered Counters?
The same thing that happens when you sort a list of any partially ordered objects, such as sets:
py> sorted([{1, 2, 3, 4}, {2, 4}, {1, 3}, {2, 3, 4}, {1, 2, 3}]) [{2, 4}, {1, 3}, {2, 3, 4}, {1, 2, 3}, {1, 2, 3, 4}]
You get some order, but since sorting assumes a total order, not just partial order, the result isn't really meaningful, and will very likely depend on the initial order of the items. If that worries you, then don't sort items that implement only a partial order.
I'll go with +0.5. :)
If this goes in, I think a missing key in either Counter should default to 0 for purposes of the ordering.
If/when there's general agreement that this functionality should be merged in (assuming the implementation is acceptable), let me know and I'll be happy to write the code and tests.
I'll go with +0.5. :)
I was going to make a joke about Counters only accepting integral values, but the constructor is actually quite laxist:
>>> Counter({'a': 2.5})
Counter({'a': 2.5})
>>> Counter({'a': 2.5 + 1j})
Counter({'a': (2.5+1j)})
>>> Counter({'a': 'b'})
Counter({'a': 'b'})
(!)
That does seem odd -- how can you have 'b' of something?
The real problem is that Counter
was put into the standard library with such lenient conditions on how it should be used-- basically looking at it as a dict
subclass rather than a counter, putting emphasis on implementation rather than purpose, which was a mistake that we'll now have to deal with forever.
That does seem odd -- how can you have 'b' of something?
Don't ask this kind of question as long as any math guys are listening.
But seriously, I consider this proposal reasonable. If the problem is that the counter values can be non-integers, then why not just raise an exception if someone tries to test the ordering with illegal (i.e. unordered or uncomparable) values? Data quality issues are best handled on user side.
Stefan: If you'd want to raise an exception, you'll need to define in which cases. Should it raise an exception for decimal numbers? Complex numbers? etc.
Personally I'd enjoy it raising exceptions in as many situations as possible (so we could keep Counter usage focused to actual counter usage and not just wild dicts) but I doubt it'll sit well with people who might use Counters with weird values.
I have to disagree. The intent is clearly expressed in the docs [1]. However, if I have a need to deal with partial amounts (say, 2.5 apples because I gave half of one to my horse ;), Counter will still work with that:
--> treats = Counter({'carrots':12, 'apples':3, 'sugar_cubes':100}) --> treats Counter({'sugar_cubes': 100, 'carrots': 12, 'apples': 3}) --> treats['apples'] -= 0.5 --> treats Counter({'sugar_cubes': 100, 'carrots': 12, 'apples': 2.5})
At least, it will until we fix that bug. ;)
I'd raise an exception whenever the values of the Counter turn out to be uncomparable. If users manage to fill Counters with non-number values that are comparable amongst each other, or even if each key has a value of a different type, why not just support that?
All that is really required for this is that you can compare a) keys for equality and b) values of identical keys for ordering, right?
Ah, now I understand you Stefan. That sounds like a good scheme; let any comparison errors between the values simply propagate up.
I, myself, wrote: -----------------
At least, it will until we fix that bug. ;)
Happily, reading the whole documentation revealed that non-integer values are allowed, so it's not a bug. :)
Attached patch of implementation with tests. I haven't actually run this (because I don't know how to run Python's test suite) so there are likely to be bugs. if there's general agreement about this direction I'll figure out how to run the test so I could ensure it works.
Fixed patch attached. Still needs someone to run it and see whether there are any bugs.
Testing the patch...
In going through the tests I have found an edge-case that I am unsure of how to handle:
Counter(a=1) \< Counter(a=1, b=1)
On the one hand, they both have the same value for 'a'; on the other hand, the second counter has more elements...
So should the result be False or True?
Counter(a=1) \< Counter(a=1, b=1)
Do an analogy with sets and the answer will be obvious.
I am not a mathematician -- feel free to include the answer with your hints. ;)
If my analogy is correct, this situation is similar to
{1} \< {1, 2}
Which is True. Can someone verify that I am correct?
Yes, correct.
New patch attached.
I mentioned my wording for the comment about lesser than partial ordering using the code review tool.
In the attached patch(based on bpo-22515.stoneleaf.patch.01), I have added test for the case when the other object is not a mapping.
I have also tried to refactor the __le and __ge methods.
Please comment on the changes. Regards
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at =
created_at =
labels = ['type-feature', 'library']
title = 'Implement partial order on Counter'
updated_at =
user = 'https://github.com/cool-RR'
```
bugs.python.org fields:
```python
activity =
actor = 'Aaron.Meurer'
assignee = 'rhettinger'
closed = True
closed_date =
closer = 'rhettinger'
components = ['Library (Lib)']
creation =
creator = 'cool-RR'
dependencies = []
files = ['36790', '36792', '36794', '36796', '36801']
hgrepos = []
issue_num = 22515
keywords = ['patch']
message_count = 66.0
messages = ['227819', '227824', '227827', '227828', '227829', '227831', '227842', '227844', '227845', '227846', '227847', '227848', '227849', '227850', '227851', '227852', '227857', '227858', '227860', '228071', '228072', '228077', '228082', '228083', '228084', '228085', '228086', '228087', '228088', '228089', '228090', '228094', '228096', '228097', '228098', '228099', '228100', '228101', '228102', '228105', '228236', '228312', '228374', '228391', '228398', '228401', '228402', '228403', '228411', '228429', '228433', '228436', '228440', '228441', '228442', '228443', '228444', '228445', '228461', '228462', '228463', '228477', '228487', '229065', '231365', '253251']
nosy_count = 13.0
nosy_names = ['rhettinger', 'mark.dickinson', 'pitrou', 'scoder', 'eric.smith', 'steven.daprano', 'r.david.murray', 'cool-RR', 'Aaron.Meurer', 'ethan.furman', 'serhiy.storchaka', 'Saimadhav.Heblikar', 'josh.r']
pr_nums = []
priority = 'normal'
resolution = 'rejected'
stage = 'patch review'
status = 'closed'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue22515'
versions = ['Python 3.5']
```