python / cpython

The Python programming language
https://www.python.org
Other
63.19k stars 30.26k forks source link

Add key argument to collections.Counter #89501

Closed 4a3ab368-64a6-4762-98c3-8a07ecbd1524 closed 3 years ago

4a3ab368-64a6-4762-98c3-8a07ecbd1524 commented 3 years ago
BPO 45338
Nosy @rhettinger, @sweeneyde

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = None closed_at = created_at = labels = ['type-feature', 'library', '3.11'] title = 'Add key argument to collections.Counter' updated_at = user = 'https://bugs.python.org/kubataytekin' ``` bugs.python.org fields: ```python activity = actor = 'kubataytekin' assignee = 'none' closed = True closed_date = closer = 'kubataytekin' components = ['Library (Lib)'] creation = creator = 'kubataytekin' dependencies = [] files = [] hgrepos = [] issue_num = 45338 keywords = [] message_count = 5.0 messages = ['403002', '403027', '403030', '403031', '403033'] nosy_count = 3.0 nosy_names = ['rhettinger', 'Dennis Sweeney', 'kubataytekin'] pr_nums = [] priority = 'normal' resolution = 'duplicate' stage = 'resolved' status = 'closed' superseder = None type = 'enhancement' url = 'https://bugs.python.org/issue45338' versions = ['Python 3.11'] ```

4a3ab368-64a6-4762-98c3-8a07ecbd1524 commented 3 years ago

Counter has become the default tool for counting occurences of objects in an iterable. However by construction it only works on hashable objects as it is a subclass of dict. Would it be possible to implement a "key=" keyword argument as in sort etc.?

Here's a stackoverflow question wherein either map or implementing a hash method was proposed: https://stackoverflow.com/questions/69401713/elegant-way-of-counting-items-in-a-list-by-enum-value-in-python

sweeneyde commented 3 years ago

How is Counter(numbers, key=abs) any better than Counter(map(abs, numbers))?

It seems to me that "apply a function to each thing" (map) and "count the numbers of each thing" (Counter) are two orthogonal concepts, and there's no reason to entangle them. Their composition as "count the number of things with each function value" is probably better as a composition of two simple things rather than lumped together. I believe this is related to why PEP-455 was rejected.

4a3ab368-64a6-4762-98c3-8a07ecbd1524 commented 3 years ago

I was thinking about use cases where one might want to preserve the original key. I was not however aware of PEP-455 and the reasons for it's rejection. My bad.

Yet It is still unclear to me what the best practice would be to reverse lookup unhashable objects. Say you have a list of objects from which you need the most common. If there is no good way of reversing the transform, your only option is iterating trough your list? Which defeats the purpose of using a Counter and you'd be better off counting explicitly. Maybe I'm imagining unrealistic/far-fetched use cases.

sweeneyde commented 3 years ago

The values of a Counter are generally integers, not lists. Maybe you want:

    items_by_keyfunc = defaultdict(list)
    for x in all_the_items:
        items_by_keyfunc[keyfunc(x)].append(x)

Then items_by_keyfunc[42] is a list of the things with key 42.

Although I believe there have been proposals about adding some method to dict() to do basically the for-loop above.

4a3ab368-64a6-4762-98c3-8a07ecbd1524 commented 3 years ago

Or keep the reverse mapping in another dict:

count = Counter()
reverse_mapping = dict()
for x in all_the_items:
    count.update(keyfunc(x))
    reverse_mapping[keyfunc(x)] = x

Anyways I'm closing it as it is a partial duplicate of key-transforming dictionaries.