Reproduce speedup from powersort in PyPy

sebawild commented 2 years ago

Since we plan to use PyPy for running time experiments in our paper, we should check that the findings that Willem made in CPython are roughly reflected in PyPy, in particular that when Powersort substantially saves in mergecost, we also see a reduction of running time.

For that we need to:

Install a reasonably recent version of PyPy (I don't think it is vital to have the very latest at this point).
Copy out the sorting method from the PyPy code (old and new) to get their Timsort and Powersort implementation (ref'ed in #19 ).
Run the little script Willem wrote that reads the long list sorted in the pyflate benchmark, but use the Timsort and Powersort implementations from above (in separate runs) to sort the list. I prepare the list in here, but one has to replace sorted by the call to Timsort resp. Powersort.
Time the runs using timeit; see comments in the script above and the docs for more parameters.
Hopefully find similar results as in CPython (8% less mergecost with Powersort led to 3% faster in time).

Note: In PyPy, we should always run code a few thousand times before starting with the actual measurement since the just-in-time compiler only kicks in once a method was used sufficiently often.

sebawild commented 2 years ago

Running this on the pypy on Ubuntu prints

WARNING: timeit is a very unreliable tool. use pyperf or something else for real measurements
pypy -m pip install pyperf
pypy -m pyperf timeit -v -s 'import pyflate_list' 'pyflate_list.sort_pyflate_list()'

So maybe we want to do that instead ;)

sebawild commented 2 years ago

(updated comment in the script)

sebawild commented 2 years ago

Quick note: running the commands on my laptop (using the builtin sorted), CPython (14.8 ms +- 0.8 ms) actually comes out slightly faster than PyPy (15.7 ms +- 1.0 ms)! That is opposite of the typical outcome for that comparison, but is not totally unexpected as simple sequential code without much memory allocation tends to be fast in C.

roarin-roran commented 2 years ago

this seems more like a sprint than an issue - but the really detailed notes here give me everything I need to know to get on with it. I'll probably continue with the current sprint until I've got some coding momentum/we meet/it's finished (we can discuss which), after which point I'll break this down into several issues with a milestone and prioritise it.

sound good?

sebawild commented 2 years ago

Hmm, maybe try to this asap. If there are large differences in behavior to CPython, we have to change plans.

roarin-roran commented 2 years ago

@sebawild is bullet 2 here identical with #19? if not, what are the differences (ignore if moot by the time I get to it - I know that you're working on this sprint rn)

sebawild commented 2 years ago

It's probably the same

roarin-roran / Sorting

Reproduce speedup from powersort in PyPy #30