python / cpython

The Python programming language
https://www.python.org
Other
63.2k stars 30.27k forks source link

Implement partial order on Counter #66705

Closed 06547701-06a2-4e4a-b672-16a33475101e closed 10 years ago

06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago
BPO 22515
Nosy @rhettinger, @mdickinson, @pitrou, @scoder, @ericvsmith, @stevendaprano, @bitdancer, @cool-RR, @asmeurer, @ethanfurman, @serhiy-storchaka, @MojoVampire
Files
  • patch.patch
  • patch2.patch
  • issue22515.stoneleaf.patch.01
  • issue22515_partial_order_counter_v2.diff
  • issue22515_partial_order_counter_v3.diff
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at = created_at = labels = ['type-feature', 'library'] title = 'Implement partial order on Counter' updated_at = user = 'https://github.com/cool-RR' ``` bugs.python.org fields: ```python activity = actor = 'Aaron.Meurer' assignee = 'rhettinger' closed = True closed_date = closer = 'rhettinger' components = ['Library (Lib)'] creation = creator = 'cool-RR' dependencies = [] files = ['36790', '36792', '36794', '36796', '36801'] hgrepos = [] issue_num = 22515 keywords = ['patch'] message_count = 66.0 messages = ['227819', '227824', '227827', '227828', '227829', '227831', '227842', '227844', '227845', '227846', '227847', '227848', '227849', '227850', '227851', '227852', '227857', '227858', '227860', '228071', '228072', '228077', '228082', '228083', '228084', '228085', '228086', '228087', '228088', '228089', '228090', '228094', '228096', '228097', '228098', '228099', '228100', '228101', '228102', '228105', '228236', '228312', '228374', '228391', '228398', '228401', '228402', '228403', '228411', '228429', '228433', '228436', '228440', '228441', '228442', '228443', '228444', '228445', '228461', '228462', '228463', '228477', '228487', '229065', '231365', '253251'] nosy_count = 13.0 nosy_names = ['rhettinger', 'mark.dickinson', 'pitrou', 'scoder', 'eric.smith', 'steven.daprano', 'r.david.murray', 'cool-RR', 'Aaron.Meurer', 'ethan.furman', 'serhiy.storchaka', 'Saimadhav.Heblikar', 'josh.r'] pr_nums = [] priority = 'normal' resolution = 'rejected' stage = 'patch review' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue22515' versions = ['Python 3.5'] ```

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    I suggest implementing Counter.__lt__ which will be a partial order, similarly to set.__lt__. That is, one counter will be considered smaller-or-equal to another if for any item in the first counter, the second counter has an equal or bigger amount of that item.

    stevendaprano commented 10 years ago

    On Mon, Sep 29, 2014 at 07:33:21PM +0000, Ram Rachum wrote:

    I suggest implementing Counter.__lt__ which will be a partial order, similarly to set.__lt__.

    Since Counter is described as a multiset, this sounds reasonable to me.

    serhiy-storchaka commented 10 years ago

    What should be result of following operations?

    Counter({'a': 0}) \< Counter({}) Counter({}) \< Counter({'a': 0}) Counter({'a': -1}) \< Counter({}) Counter({}) \< Counter({'a': -1})

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    I suggest they be ignored like in elements.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    (I mean, the non-positive values should be ignored.)

    serhiy-storchaka commented 10 years ago

    There is other definition of the \<= operator for sets: "A \<= B" is equivalent to "len(A - B) == 0". Extending to Counter this can mean "len(A.subtract(B).elements()) == 0".

    ethanfurman commented 10 years ago

    What would be the result of

    Counter({'a':1, 'b':2}) \< Counter({'a':2, 'b':1})

    ?

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    False, like with sets.

    bitdancer commented 10 years ago

    Thanks, Ethan. I had a feeling that this wasn't well defined but I couldn't come up with an example :)

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    David, there's nothing here that isn't well defined. It's simply a partial order, not a total order. We have the same for sets.

    ethanfurman commented 10 years ago

    set's don't have values, and you are wanting to implement the partial ordering based on the values. (side-note: how does partial-ordering work for sets?)

    That is, one counter will be considered smaller-or-equal to another if for any item in the first counter, the second counter has an equal or bigger amount of that item.

    According to your definition, my example should have returned True, which is clearly nonsensical.

    Even if you changed the definition to:

    For every item in the first counter, that item's value is less than the corresponding item in the second counter.

    You have situations like:

    Counter({'a':1, 'b':1}) \< Counter({'a':2})

    I just don't think there is one interpretation that is going to be correct most of the time.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Ethan, I don't understand what the problem is. I also don't understand your side note question "how does partial-ordering work for sets?" I'm not sure what you're asking.

    > That is, one counter will be considered smaller-or-equal to another if for any > item in the first counter, the second counter has an equal or bigger amount of > that item.

    According to your definition, my example should have returned True, which is clearly nonsensical.

    In your example the first counter had b twice, while the second counter had it only once. So the result is False. This comes pretty directly from my definition, I'm not sure where the confusion is.

    Regarding your new example: Since Counter works like a defaultdict, the second counter has a zero quantity of b. So the answer is again False.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    To put it another way: In Python sets, a <= b iff b has everything that a has, and possibly more. I'm proposing the exact same definition for counters. True, the implementation might be different because sets are not dicts and don't have .values(), but that is just the implementation, which is not the important thing here.

    ethanfurman commented 10 years ago

    That is certainly a better definition. If you carefully reread your original post you will see that you wrote something different ('any' instead of 'every'). Also, your OP uses '__lt__' but talks about 'less-than-or-equal -- somewhat confusing. ;)

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    You're right, sorry. I meant the mathematical "for any" which means "for every": https://en.wikipedia.org/wiki/List_of_mathematical_symbols (See "for any;")

    pitrou commented 10 years ago

    The request sounds ok to me. As in the corner case of negative values, it seems that Counter currently doesn't elide zero values, e.g.:

    >>> Counter({'a': 0}) == Counter({})
    False

    Therefore, we should have:

    >>> Counter({'a': -1}) < Counter({'a': 0})
    True

    but:

    >>> Counter({'a': -1}) < Counter({})
    False

    (I must admit, the current semantics of Counter look a bit ad hoc and not terribly consistent... halfway between a regular mapping and a multiset)

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Shall I write a patch?

    ethanfurman commented 10 years ago

    I think the final call is Raymond's (whether to accept the patch), but go ahead. Worst case scenario is you'll have your own thoroughly tested PartiallyOrderedCounter you can use in your own code. :)

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    If that's the case I'd prefer Raymond to first say whether this feature is generally welcome before spending my time on making a patch.

    ericvsmith commented 10 years ago

    Is there some particular problem you're trying to solve, which this would make easier?

    Without a use case, I'm -1.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    I needed it for an internal calculation in a combinatorics package I'm writing. Do you want me to take the time to explain the calculation? It's quite complex.

    ericvsmith commented 10 years ago

    No need to explain it. It sounds like it's not generally useful, so I'm still -1.

    pitrou commented 10 years ago

    Even if you don't find it useful, Eric, it doesn't take up the method space. You can very easily ignore it, there's no cognitive burden.

    ethanfurman commented 10 years ago

    If it is so specialized as to only be needed in complex combinatorial calculations, does it belong in the general-purpose part of the language? After all, we have the math and cmath modules for more specialized arithmetic operations.

    ethanfurman commented 10 years ago

    Curiousity question: What happens if you try to sort a list of partially ordered Counters?

    pitrou commented 10 years ago

    If it is so specialized as to only be needed in complex combinatorial calculations

    How do you know it is "only needed in complex combinatorial calculations"?

    What happens if you try to sort a list of partially ordered Counters?

    Try it with partially ordered sets.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    I don't see why it's so hard to imagine how this will be used. Say I have a counter signifying how many of each product I have in my warehouse, and I have another counter saying how many of each product a client wants. I may use counter2 <= counter1 to check whether there are enough products in stock. Sounds straightforward to me.

    ethanfurman commented 10 years ago

    --> s1 = set([1]) --> s2 = set([1, 2]) --> s3 = set([1, 2, 3]) --> s4 = set([2]) --> s5 = set([2, 3]) --> s6 = set([3])

    --> l = [s1, s2, s3, s4, s5, s6] --> sorted(l) [{1}, {2}, {1, 2}, {3}, {2, 3}, {1, 2, 3}]

    --> s1 \< s4 False

    --> s4 \< s2 True

    --> s1 \< s2 True

    --> s4 \< s6 False

    --> l = [s1, s4] --> sorted(l) [{1}, {2}]

    --> sorted(l) [{1}, {2}]

    Looks horribly messy to me. In the last example we can see that neither s1 nor s4 are smaller, yet s1 is consistently put first.

    On the other hand, I suppose it's okay for Counter to have this behavior since it's already in sets.

    stevendaprano commented 10 years ago

    Ethan said:

    If it is so specialized as to only be needed in complex combinatorial calculations, does it belong in the general-purpose part of the language?

    It's a multi-set, a general purpose and fairly fundamental data type.

    https://en.wikipedia.org/wiki/Set_%28abstract_data_type%29#Multiset

    And later:

    Curiousity question: What happens if you try to sort a list of partially ordered Counters?

    The same thing that happens when you sort a list of any partially ordered objects, such as sets:

    py> sorted([{1, 2, 3, 4}, {2, 4}, {1, 3}, {2, 3, 4}, {1, 2, 3}]) [{2, 4}, {1, 3}, {2, 3, 4}, {1, 2, 3}, {1, 2, 3, 4}]

    You get some order, but since sorting assumes a total order, not just partial order, the result isn't really meaningful, and will very likely depend on the initial order of the items. If that worries you, then don't sort items that implement only a partial order.

    ethanfurman commented 10 years ago

    I'll go with +0.5. :)

    If this goes in, I think a missing key in either Counter should default to 0 for purposes of the ordering.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    If/when there's general agreement that this functionality should be merged in (assuming the implementation is acceptable), let me know and I'll be happy to write the code and tests.

    pitrou commented 10 years ago

    I'll go with +0.5. :)

    I was going to make a joke about Counters only accepting integral values, but the constructor is actually quite laxist:

    >>> Counter({'a': 2.5})
    Counter({'a': 2.5})
    >>> Counter({'a': 2.5 + 1j})
    Counter({'a': (2.5+1j)})
    >>> Counter({'a': 'b'})
    Counter({'a': 'b'})

    (!)

    ethanfurman commented 10 years ago

    That does seem odd -- how can you have 'b' of something?

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    The real problem is that Counter was put into the standard library with such lenient conditions on how it should be used-- basically looking at it as a dict subclass rather than a counter, putting emphasis on implementation rather than purpose, which was a mistake that we'll now have to deal with forever.

    scoder commented 10 years ago

    That does seem odd -- how can you have 'b' of something?

    Don't ask this kind of question as long as any math guys are listening.

    But seriously, I consider this proposal reasonable. If the problem is that the counter values can be non-integers, then why not just raise an exception if someone tries to test the ordering with illegal (i.e. unordered or uncomparable) values? Data quality issues are best handled on user side.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Stefan: If you'd want to raise an exception, you'll need to define in which cases. Should it raise an exception for decimal numbers? Complex numbers? etc.

    Personally I'd enjoy it raising exceptions in as many situations as possible (so we could keep Counter usage focused to actual counter usage and not just wild dicts) but I doubt it'll sit well with people who might use Counters with weird values.

    ethanfurman commented 10 years ago

    I have to disagree. The intent is clearly expressed in the docs [1]. However, if I have a need to deal with partial amounts (say, 2.5 apples because I gave half of one to my horse ;), Counter will still work with that:

    --> treats = Counter({'carrots':12, 'apples':3, 'sugar_cubes':100}) --> treats Counter({'sugar_cubes': 100, 'carrots': 12, 'apples': 3}) --> treats['apples'] -= 0.5 --> treats Counter({'sugar_cubes': 100, 'carrots': 12, 'apples': 2.5})

    At least, it will until we fix that bug. ;)

    scoder commented 10 years ago

    I'd raise an exception whenever the values of the Counter turn out to be uncomparable. If users manage to fill Counters with non-number values that are comparable amongst each other, or even if each key has a value of a different type, why not just support that?

    All that is really required for this is that you can compare a) keys for equality and b) values of identical keys for ordering, right?

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Ah, now I understand you Stefan. That sounds like a good scheme; let any comparison errors between the values simply propagate up.

    ethanfurman commented 10 years ago

    [1] https://docs.python.org/3/library/collections.html#collections.Counter

    ethanfurman commented 10 years ago

    I, myself, wrote: -----------------

    At least, it will until we fix that bug. ;)

    Happily, reading the whole documentation revealed that non-integer values are allowed, so it's not a bug. :)

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Attached patch of implementation with tests. I haven't actually run this (because I don't know how to run Python's test suite) so there are likely to be bugs. if there's general agreement about this direction I'll figure out how to run the test so I could ensure it works.

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Fixed patch attached. Still needs someone to run it and see whether there are any bugs.

    ethanfurman commented 10 years ago

    Testing the patch...

    ethanfurman commented 10 years ago

    In going through the tests I have found an edge-case that I am unsure of how to handle:

    Counter(a=1) \< Counter(a=1, b=1)

    On the one hand, they both have the same value for 'a'; on the other hand, the second counter has more elements...

    So should the result be False or True?

    pitrou commented 10 years ago

    Counter(a=1) \< Counter(a=1, b=1)

    Do an analogy with sets and the answer will be obvious.

    ethanfurman commented 10 years ago

    I am not a mathematician -- feel free to include the answer with your hints. ;)

    If my analogy is correct, this situation is similar to

    {1} \< {1, 2}

    Which is True. Can someone verify that I am correct?

    06547701-06a2-4e4a-b672-16a33475101e commented 10 years ago

    Yes, correct.

    ethanfurman commented 10 years ago

    New patch attached.

    22200024-de1a-4081-ad85-2ac04e6b54d2 commented 10 years ago

    I mentioned my wording for the comment about lesser than partial ordering using the code review tool.

    In the attached patch(based on bpo-22515.stoneleaf.patch.01), I have added test for the case when the other object is not a mapping.

    I have also tried to refactor the __le and __ge methods.

    Please comment on the changes. Regards