C accelerator for collections.Counter is slow

scoder commented 11 years ago

BPO	18594
Nosy	@rhettinger, @jcea, @scoder, @serhiy-storchaka, @phmc
Files	counterbench.py: Benchmark different methods of counting collections_Counter_without_C_accelerator_with_comprehension.patch: collections.Counter without C accelerator patch performance_comparision.csv: performance comparison with and without C accelerator fix_counter.diff: Repair the test for choosing the fast path fix_counter2.diff: Make the slow path match the pure Python version.

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields: ```python assignee = 'https://github.com/rhettinger' closed_at = created_at = labels = ['library', 'performance'] title = 'C accelerator for collections.Counter is slow' updated_at = user = 'https://github.com/scoder' ``` bugs.python.org fields: ```python activity = actor = 'python-dev' assignee = 'rhettinger' closed = True closed_date = closer = 'rhettinger' components = ['Library (Lib)'] creation = creator = 'scoder' dependencies = [] files = ['31085', '31774', '31775', '31918', '31932'] hgrepos = [] issue_num = 18594 keywords = ['patch'] message_count = 14.0 messages = ['193914', '193928', '193929', '197806', '197807', '198675', '198683', '198685', '198712', '198753', '198756', '198757', '198817', '198971'] nosy_count = 7.0 nosy_names = ['rhettinger', 'jcea', 'scoder', 'python-dev', 'serhiy.storchaka', 'pconnell', 'Anoop.Thomas.Mathew'] pr_nums = [] priority = 'normal' resolution = 'fixed' stage = 'patch review' status = 'closed' superseder = None type = 'performance' url = 'https://bugs.python.org/issue18594' versions = ['Python 3.3', 'Python 3.4'] ```

scoder commented 11 years ago

The C accelerator for the collections.Counter class (_count_elements() in _collections.c) is slower than the pure Python versions for data that has many unique entries. This is because the fast path for dicts is not taken (Counter is a subtype of dict) and the slower fallback path raises exceptions for each value that wasn't previously seen. This can apparently make it slower than calling get() on Python side.

My suggestion is to drop the fallback path from the accelerator completely and to only call the C function when it's safe to use it, e.g. when "type(self) is Counter" and not a subclass.

serhiy-storchaka commented 11 years ago

Are there any cases where the counter class with the C accelerator is faster than the pure Python version? Here is a benchmarking script (modification of Roy Smith's script [1]) and looks as the pure Python version is faster even for data that has not many unique entries.

[1] http://permalink.gmane.org/gmane.comp.python.general/738820

786d3f11-b763-4414-a03f-abc264e0b72d commented 11 years ago

That sounds like a good idea, Stefan.

22de11fb-28aa-462c-8bb3-54a789c41c26 commented 10 years ago

40% faster collections.Counter() . Removed C accelerator. Patch attached. Passes all tests. Results comparison follows.

22de11fb-28aa-462c-8bb3-54a789c41c26 commented 10 years ago

Performance comparison with and without patch applied.

rhettinger commented 10 years ago

A C-accelerator should ALWAYS be able to beat a pure python version if it does the same steps but without the overhead of the eval-loop. And in special cases such as type(self)==Counter, it can do much better.

To resolve this report, the C accelerator needs to be fixed. As Stefan pointed-out, the fast-path isn't being triggered because of PyDict_CheckExact test. And, the fallback path can be sped-up as well (by doing the same steps in C as are being done with the pure python code).

rhettinger commented 10 years ago

Repaired version ----------------

$ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(1000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
100 loops, best of 3: 14.3 msec per loop
$ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(500000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
10 loops, best of 3: 40.8 msec per loop

Current with accelerator ------------------------

$ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(1000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
10 loops, best of 3: 61.7 msec per loop
$ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(500000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
10 loops, best of 3: 118 msec per loop

Current without accelerator ---------------------------

$ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(1000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
10 loops, best of 3: 54.9 msec per loop
$ py -m timeit -s 'from random import seed, randrange; seed(8675309); data=[randrange(500000) for i in range(100000)]; from collections import Counter'  'Counter(data)'
10 loops, best of 3: 80.8 msec per loop

scoder commented 10 years ago

Patch LGTM and seems to work well, according to your numbers.

Only minor nitpick would be that the method references could be decref-ed earlier, but that would complicate the code a bit.

serhiy-storchaka commented 10 years ago

Benchmarking results look great.

But isn't _PyObject_LookupSpecial() more appropriate function for special methods lookup than PyObject_GetAttrString()?

1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 10 years ago

New changeset 6aef095fdb30 by Raymond Hettinger in branch '3.3': Issue bpo-18594: Fix the fast path for collections.Counter(). http://hg.python.org/cpython/rev/6aef095fdb30

rhettinger commented 10 years ago

Attaching a patch for the slow path. Makes the code exactly match the pure python version. This kicks in whether someone has subclassed Counter and overridden either __getitem or __setitem.

scoder commented 10 years ago

Can you update the benchmark numbers to show what the difference is compared to pure Python (and to the fastpath) now?

One more thing: the fastpath depends on .__getitem() and friends, whereas the fallback path depends on .get(). What if someone overrides .get() but not .__getitem()? (Might be a hypothetical case...)

1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 10 years ago

New changeset 1ee6f8a96fb9 by Raymond Hettinger in branch '3.3': Issue bpo-18594: Fix the fallback path in collections.Counter(). http://hg.python.org/cpython/rev/1ee6f8a96fb9

1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 10 years ago

New changeset e4cec1116e5c by Raymond Hettinger in branch '3.3': Issue bpo-18594: Make the C code more closely match the pure python code. http://hg.python.org/cpython/rev/e4cec1116e5c

python / cpython

C accelerator for collections.Counter is slow #62794