python / cpython

The Python programming language
https://www.python.org/
Other
60.03k stars 29.06k forks source link

Adaptive stable mergesort #36940

Closed tim-one closed 21 years ago

tim-one commented 21 years ago
BPO 587076
Nosy @mwhudson, @gvanrossum, @tim-one, @smontanaro, @nascheme
Files
  • merge.patch: adds .mosrt() and .hsort()
  • merge4.patch: Version 4 of the listobject.c patch
  • timsort.txt: Plain text doc file
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/tim-one' closed_at = created_at = labels = ['interpreter-core'] title = 'Adaptive stable mergesort' updated_at = user = 'https://github.com/tim-one' ``` bugs.python.org fields: ```python activity = actor = 'tim.peters' assignee = 'tim.peters' closed = True closed_date = None closer = None components = ['Interpreter Core'] creation = creator = 'tim.peters' dependencies = [] files = ['4449', '4450', '4451'] hgrepos = [] issue_num = 587076 keywords = ['patch'] message_count = 26.0 messages = ['40650', '40651', '40652', '40653', '40654', '40655', '40656', '40657', '40658', '40659', '40660', '40661', '40662', '40663', '40664', '40665', '40666', '40667', '40668', '40669', '40670', '40671', '40672', '40673', '40674', '40675'] nosy_count = 8.0 nosy_names = ['mwh', 'gvanrossum', 'tim.peters', 'skip.montanaro', 'anthonybaxter', 'nascheme', 'jacobs99', 'aimacintyre'] pr_nums = [] priority = 'normal' resolution = 'accepted' stage = None status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue587076' versions = ['Python 2.3'] ```

    tim-one commented 21 years ago

    This adds method list.msort([compare]).

    Lib/test/sortperf.py is already a sort performance test. To run it on exactly the same data I used, run it via

    python -O sortperf.py 15 20 1

    That will time the current samplesort (even after this patch). After getting stable numbers for that, change sortperf's doit() to say L.msort() instead of L.sort(), and you'll time the mergesort instead.

    CAUTION: To save time across many runs, sortperf saves the random floats it generates, into temp files.
    If those temp files already exist when sortperf starts, it reads them up instead of generating new numbers.
    As a result, it's important in the above to pass "1" as the last argument the *first* time you run sortperf -- that forces the random # generator into the same state it was when I used it.

    This patch also gives lists a new list.hsort() method, which is a weak heapsort I gave up on. Time it if you want to see how bad an excellent sort can get \<wink>.

    nascheme commented 21 years ago

    Logged In: YES user_id=35752

    AMD 1.4 Ghz Athon CPU L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) L2 Cache: 256K (64 bytes/line) Linux 2.4.19-pre10-ac1 gcc 2.95.4

    samplesort: i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.06 0.01 0.01 0.07 0.01 0.03 0.01 0.07 16 65536 0.16 0.02 0.02 0.15 0.02 0.07 0.02 0.17 17 131072 0.37 0.03 0.03 0.39 0.04 0.16 0.04 0.41 18 262144 0.84 0.07 0.08 0.87 0.10 0.34 0.07 0.93 19 524288 1.89 0.16 0.16 1.97 0.21 0.70 0.16 2.08 20 1048576 4.20 0.33 0.34 4.55 0.41 1.45 0.34 4.61

    timsort: i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.06 0.00 0.01 0.01 0.01 0.03 0.00 0.01 16 65536 0.14 0.02 0.02 0.02 0.02 0.06 0.02 0.04 17 131072 0.35 0.04 0.04 0.04 0.04 0.12 0.04 0.08 18 262144 0.79 0.08 0.08 0.09 0.09 0.27 0.09 0.16 19 524288 1.79 0.17 0.17 0.18 0.17 0.54 0.17 0.33 20 1048576 3.96 0.35 0.34 0.34 0.36 1.12 0.34 0.70

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    Wow! Thanks, Neil! That's impressive, even if I say so myself \<wink>.

    f62d94d0-e192-4823-b2e9-4cd18f067928 commented 21 years ago

    Logged In: YES user_id=459565

    Intel 1266 MHz Penguin III x2 (Dual processor) 512KB cache Linux 2.4.19-pre1-ac2 gcc 3.1 20020205

    samplesort: i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.07 0.00 0.01 0.06 0.01 0.02 0.00 0.07 16 65536 0.16 0.02 0.01 0.15 0.01 0.06 0.02 0.17 17 131072 0.37 0.04 0.04 0.35 0.04 0.15 0.03 0.38 18 262144 0.84 0.07 0.08 0.80 0.09 0.31 0.07 0.86 19 524288 1.89 0.16 0.15 1.78 0.19 0.66 0.15 1.92 20 1048576 4.12 0.33 0.31 4.07 0.37 1.34 0.31
    4.22

    timsort: i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.07 0.01 0.00 0.01 0.01 0.03 0.01 0.01 16 65536 0.17 0.01 0.02 0.01 0.02 0.06 0.02 0.04 17 131072 0.37 0.04 0.03 0.04 0.04 0.13 0.04 0.08 18 262144 0.84 0.07 0.07 0.08 0.08 0.27 0.07 0.16 19 524288 1.89 0.16 0.15 0.15 0.17 0.55 0.15 0.33 20 1048576 4.16 0.32 0.31 0.31 0.32 1.14 0.31
    0.66

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    Numbers from Marc-Andre Lemburg, "AMD Athlon 1.2GHz/Linux/gcc".

    samplesort i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.07 0.00 0.01 0.09 0.01 0.03
    0.01 0.08 16 65536 0.18 0.02 0.02 0.19 0.03 0.07
    0.02 0.20 17 131072 0.43 0.05 0.04 0.46 0.05 0.18
    0.05 0.48 18 262144 0.99 0.09 0.10 1.04 0.13 0.40
    0.09 1.11 19 524288 2.23 0.19 0.21 2.32 0.24 0.83
    0.20 2.46 20 1048576 4.96 0.40 0.40 5.41 0.47 1.72
    0.40 5.46

    samplesort again (run twice by mistake)

    i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.08 0.01 0.01 0.09 0.01 0.03
    0.00 0.09 16 65536 0.20 0.02 0.01 0.20 0.03 0.07
    0.02 0.20 17 131072 0.46 0.06 0.02 0.45 0.05 0.20
    0.04 0.49 18 262144 0.99 0.09 0.10 1.09 0.11 0.40
    0.12 1.12 19 524288 2.33 0.20 0.20 2.30 0.24 0.83
    0.19 2.47 20 1048576 4.89 0.40 0.41 5.37 0.48 1.71
    0.38 6.22

    timsort i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.08 0.01 0.01 0.01 0.01 0.03
    0.00 0.02 16 65536 0.17 0.02 0.02 0.02 0.02 0.07
    0.02 0.06 17 131072 0.41 0.05 0.04 0.05 0.04 0.16
    0.04 0.09 18 262144 0.95 0.10 0.10 0.10 0.10 0.33
    0.10 0.20 19 524288 2.17 0.20 0.21 0.20 0.21 0.66
    0.20 0.44 20 1048576 4.85 0.42 0.40 0.41 0.41 1.37
    0.41 0.84

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    Pentium III, 866 MHz, 16KB L1 D-cache, 16KB L1 I- cache, 256KB L2 cache, Win98SE, MSVC 6

    samplesort i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.17 0.01 0.01 0.17 0.01 0.05
    0.01 0.11 16 65536 0.24 0.02 0.02 0.25 0.02 0.08
    0.02 0.24 17 131072 0.53 0.05 0.04 0.49 0.05 0.18
    0.04 0.52 18 262144 1.16 0.09 0.09 1.06 0.12 0.37
    0.09 1.14 19 524288 2.53 0.18 0.17 2.30 0.24 0.75
    0.17 2.47 20 1048576 5.48 0.37 0.35 5.17 0.45 1.51
    0.35 5.34

    timsort i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.15 0.03 0.02 0.02 0.01 0.04
    0.01 0.02 16 65536 0.23 0.02 0.02 0.02 0.02 0.09
    0.02 0.04 17 131072 0.53 0.04 0.04 0.05 0.04 0.19
    0.04 0.09 18 262144 1.16 0.09 0.09 0.10 0.09 0.38
    0.09 0.19 19 524288 2.54 0.18 0.17 0.18 0.18 0.78
    0.17 0.36 20 1048576 5.50 0.36 0.35 0.36 0.37 1.60
    0.35 0.73

    smontanaro commented 21 years ago

    Logged In: YES user_id=44345

    Pentium III, 450MHz, 256KB L2 cache, Mandrake Linux 8.1, gcc 2.96

    L.sort():

    i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.32 0.02 0.03 0.30 0.03 0.09 0.03 0.32 16 65536 0.73 0.06 0.05 0.66 0.06 0.20 0.05 0.71 17 131072 1.53 0.11 0.12 1.42 0.13 0.44 0.11 1.51 18 262144 3.28 0.21 0.21 3.09 0.28 0.89 0.21 3.26 19 524288 7.05 0.44 0.42 6.60 0.59 1.81 0.42 7.03 20 1048576 15.30 0.90 0.86 14.10 1.13 3.62 0.86 14.96

    L.msort():

    i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.32 0.02 0.03 0.03 0.02 0.13 0.02 0.05 16 65536 0.70 0.05 0.06 0.05 0.06 0.27 0.07 0.10 17 131072 1.53 0.09 0.11 0.10 0.11 0.59 0.10 0.21 18 262144 3.27 0.22 0.21 0.23 0.21 1.13 0.21 0.43 19 524288 7.10 0.43 0.45 0.44 0.45 2.27 0.43 0.88 20 1048576 15.03 0.86 0.87 0.87 0.89 4.70 0.89 1.74

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    Intrigued by a comment of McIlroy, I tried catenating all the .c files in Objects and Modules, into one giant file, and sorted that. msort got a 22% speedup there, suggesting there's *some* kind of significant pre-existing lexicographic order (and/or reverse order) in C source files that msort is able to exploit.

    Trying it again on about 1.33 million lines of Python-Dev archive (including assorted uuencoded attachmets). msort got a 32% speedup.

    I'm not sure what to make of that, but we needed some real life data here \<wink>.

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    I attached timsort.txt, a plain-text detailed description of the algorithm. After I dies, it's the only clue that will remain \<wink>.

    1a90d5ba-e091-4ed6-a644-41c5568983da commented 21 years ago

    Logged In: YES user_id=29957

    Sun Ultra 5, gcc 2.95.2, 512M ram, sunos 5.7.

    (sort) imperial% ./python -O Lib/test/sortperf.py 15 20 1 i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.29 0.03 0.02 0.29 0.03 0.09 0.02 0.31 16 65536 0.66 0.05 0.05 0.68 0.05 0.20 0.05 0.71 17 131072 1.50 0.11 0.11 1.51 0.12 0.47 0.11 1.60 18 262144 3.25 0.23 0.22 3.37 0.25 1.18 0.22 3.52 19 524288 6.88 0.45 0.43 7.30 0.51 1.91 0.43 7.43 20 1048576 14.90 0.92 0.88 15.49 1.05 3.89 0.90 16.04

    (timsort) imperial% ./python -O Lib/test/sortperf.py 15 20 1 i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.28 0.02 0.02 0.03 0.02 0.13 0.02 0.05 16 65536 0.59 0.05 0.05 0.06 0.05 0.26 0.05 0.11 17 131072 1.33 0.10 0.09 0.11 0.11 0.54 0.10 0.21 18 262144 2.92 0.22 0.20 0.22 0.21 1.10 0.20 0.44 19 524288 6.33 0.44 0.42 0.43 0.43 2.21 0.41 0.90 20 1048576 13.56 0.89 0.85 0.84 0.87 4.51 0.87 1.82

    1a90d5ba-e091-4ed6-a644-41c5568983da commented 21 years ago

    Logged In: YES user_id=29957

    PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 2.96

    (samplesort) i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.08 0.01 0.01 0.07 0.01 0.03 0.01 0.08 16 65536 0.18 0.02 0.02 0.17 0.02 0.06 0.01 0.19 17 131072 0.41 0.04 0.04 0.41 0.04 0.16 0.04 0.44 18 262144 0.93 0.09 0.08 0.90 0.10 0.33 0.08 0.97 19 524288 2.04 0.18 0.16 1.98 0.23 0.69 0.17 2.13 20 1048576 4.49 0.36 0.34 4.52 0.43 1.44 0.33 4.65

    (timsort) i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.08 0.01 0.01 0.00 0.01 0.04 0.00 0.01 16 65536 0.18 0.02 0.02 0.02 0.01 0.07 0.02 0.04 17 131072 0.42 0.03 0.04 0.04 0.04 0.14 0.03 0.08 18 262144 0.95 0.08 0.08 0.09 0.08 0.30 0.07 0.17 19 524288 2.08 0.17 0.16 0.17 0.17 0.63 0.17 0.34 20 1048576 4.56 0.33 0.33 0.33 0.35 1.29 0.33 0.71

    PIII Mobile 1.2GHz, 512k cache, 256M, Redhat 7.2, gcc 3.0.4

    (samplesort) i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.08 0.01 0.01 0.08 0.00 0.02 0.01 0.08 16 65536 0.18 0.01 0.02 0.18 0.01 0.06 0.02 0.19 17 131072 0.41 0.04 0.04 0.39 0.04 0.16 0.04 0.44 18 262144 0.94 0.08 0.08 0.91 0.10 0.33 0.07 0.95 19 524288 2.05 0.17 0.16 2.07 0.20 0.70 0.16 2.11 20 1048576 4.50 0.34 0.32 4.30 0.42 1.41 0.32 4.61

    (timsort) i 2**i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 32768 0.09 0.01 0.00 0.01 0.01 0.04 0.01 0.01 16 65536 0.18 0.02 0.02 0.02 0.01 0.07 0.02 0.04 17 131072 0.41 0.04 0.04 0.04 0.03 0.14 0.03 0.08 18 262144 0.93 0.08 0.07 0.08 0.08 0.31 0.08 0.16 19 524288 2.07 0.15 0.15 0.16 0.16 0.63 0.16 0.34 20 1048576 4.54 0.33 0.31 0.32 0.33 1.28 0.32 0.67

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    Deleting old doc file.

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    Adding new doc file.

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    Adding new patch, merge2.patch. Most of this is semantically neutral compared to the last version -- more asserts, better comments, minor code fiddling for clarity, got rid of the weak heapsort.

    There is one useful change, extracting more info out of the pre-merge "find the endpoints" searches. This helps "in theory" most of the time, but probably not enough to measure. In some odd cases it can help a lot, though. See Python-Dev for discussion. There's no strong reason to time this stuff again, if you already did it once (and thanks to those who did!).

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    Just van Rossum 400Mhz G4 PowerPC running MacOSX 10.1.5. original patch From an email report; I chopped the "n" column and removed some whitespace so it's easier to read on SF.

    L.sort()
     i *sort \sort /sort 3sort +sort ~sort =sort !sort
    15  0.28  0.03  0.02  0.29  0.03  0.10  0.02  0.31
    16  0.65  0.05  0.04  0.65  0.06  0.20  0.05  0.71
    17  1.47  0.11  0.12  1.53  0.13  0.50  0.10  1.54
    18  3.19  0.24  0.25  3.19  0.29  0.98  0.23  3.39
    19  6.96  0.52  0.48  7.11  0.55  2.00  0.45  7.48
    20 15.15  0.99  0.94 15.96  1.12  4.20  1.02 16.32
    
    L.msort()
     i *sort \sort /sort 3sort +sort ~sort =sort !sort
    15  0.31  0.03  0.02  0.02  0.03  0.11  0.02  0.04
    16  0.64  0.04  0.04  0.05  0.05  0.25  0.06  0.11
    17  1.42  0.14  0.13  0.10  0.12  0.51  0.12  0.20
    18  3.01  0.26  0.21  0.23  0.22  1.07  0.19  0.46
    19  6.54  0.51  0.44  0.47  0.45  2.17  0.45  0.90
    20 14.27  0.98  0.96  0.96  0.96  4.34  0.95  2.04
    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    Dang! That little optimization introduced a subtle assumption that the comparison function is consistent. We can't assume that in Python (user-supplied functions can be arbitrarily goofy). Deleted merge2.patch and added merge3.patch to repair that.

    d0e06e0e-7db7-4806-8b80-f93bcb4150bb commented 21 years ago

    Logged In: YES user_id=250749

    The following results are from your original patch (the n column dropped for better SF display).

    System 1: Athlon 1.4Ghz, 256MB PC2100 RAM, OS2 v4 FixPack 12, EMX 0.9d Fix 4

    gcc 2.8.1 -O2 samplesort i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 0.07 0.01 0.01 0.07 0.01 0.03 0.02 0.08 16 0.18 0.02 0.01 0.18 0.02 0.08 0.01 0.20 17 0.41 0.04 0.04 0.43 0.05 0.18 0.04 0.46 18 0.93 0.09 0.10 1.00 0.10 0.39 0.10 1.05 19 2.08 0.18 0.20 2.34 0.23 0.81 0.20 2.36 20 4.69 0.37 0.40 5.02 0.47 1.68 0.40 5.28

    timsort i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 0.06 0.01 0.01 0.01 0.01 0.03 0.01 0.02 16 0.15 0.03 0.01 0.02 0.02 0.06 0.02 0.04 17 0.37 0.04 0.05 0.04 0.05 0.13 0.05 0.10 18 0.88 0.10 0.09 0.10 0.10 0.28 0.10 0.19 19 1.97 0.20 0.18 0.21 0.21 0.58 0.20 0.39 20 4.40 0.41 0.40 0.42 0.40 1.21 0.40 0.81

    gcc 2.95.2 -O3 samplesort i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 0.07 0.01 0.00 0.07 0.01 0.03 0.00 0.08 16 0.17 0.01 0.03 0.17 0.02 0.09 0.02 0.19 17 0.42 0.05 0.04 0.46 0.06 0.18 0.05 0.45 18 0.99 0.09 0.09 1.05 0.12 0.40 0.09 1.05 19 2.09 0.18 0.21 2.18 0.23 0.84 0.20 2.45 20 4.73 0.39 0.41 5.13 0.47 1.70 0.40 5.38

    timsort i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 0.10 0.01 0.01 0.01 0.01 0.04 0.01 0.01 16 0.18 0.02 0.01 0.03 0.02 0.07 0.03 0.03 17 0.37 0.06 0.05 0.04 0.05 0.14 0.04 0.09 18 0.91 0.10 0.10 0.10 0.10 0.27 0.09 0.20 19 1.97 0.21 0.21 0.20 0.20 0.59 0.19 0.40 20 4.31 0.44 0.40 0.44 0.40 1.21 0.40 0.82

    System 2: P5-166 SMP (2 CPU), 64MB 60ns FPM RAM, FreeBSD 4.4-RELEASE with a patch to re-enable CPU L1 caches (SMP BIOS issue) gcc 2.95.3 -O3 samplesort i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 0.73 0.06 0.05 0.74 0.07 0.23 0.05 0.77 16 1.60 0.12 0.12 1.66 0.13 0.48 0.12 1.71 17 3.54 0.26 0.24 3.55 0.27 1.05 0.25 3.74 18 7.63 0.52 0.51 7.73 0.58 2.12 0.50 8.05 19 16.38 1.04 1.01 17.03 1.15 4.28 1.01 17.17 20 34.94 2.09 2.02 35.04 2.37 8.62 2.02 36.58

    timsort i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 0.74 0.05 0.06 0.06 0.06 0.32 0.06 0.12 16 1.64 0.12 0.12 0.12 0.12 0.65 0.12 0.26 17 3.62 0.25 0.25 0.27 0.26 1.32 0.25 0.52 18 7.78 0.51 0.50 0.53 0.52 2.69 0.50 1.06 19 16.76 1.03 1.01 1.09 1.04 5.46 1.01 2.12 20 35.93 2.09 2.02 2.14 2.09 11.05 2.04 4.38

    System 3: 486DX4-100, 32MB 60ns FPM RAM, FreeBSD 4.4-RELEASE gcc 2.95.3 -O3 samplesort i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 2.62 0.21 0.21 2.61 0.24 0.83 0.21 2.71 16 5.73 0.45 0.44 5.75 0.48 1.71 0.44 5.94 17 12.46 0.90 0.88 12.34 1.00 3.70 0.89 13.00 18 27.15 1.82 1.80 27.12 2.17 7.59 1.80 28.10 19 57.22 3.77 3.68 59.52 4.41 15.40 3.66 59.62 20 126.80 7.96 7.80 127.63 9.58 32.72 7.46 134.45

    timsort i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 2.52 0.21 0.20 0.20 0.20 1.05 0.20 0.42 16 5.49 0.45 0.41 0.43 0.44 2.13 0.43 0.90 17 12.15 0.88 0.84 0.85 0.88 4.34 0.88 1.83 18 26.11 1.82 1.74 1.84 1.81 8.70 1.74 3.67 19 56.34 3.67 3.55 3.80 3.67 17.84 3.53 7.48 20 121.95 7.89 7.37 8.24 7.98 39.38 7.44 16.83

    NOTES:

    System 2 is just starting to swap in the i=20 case.

    System 3 starts to swap at i=18; at i=19, process:resident size is 2:1; at i=20, process:resident size is a bit over 4:1.

    mwhudson commented 21 years ago

    Logged In: YES user_id=6656

    On my iBook (600 MHz G3 with 384 megs of RAM, OS X 10.1.5):

    L.sort():

    i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 0.19 0.01 0.00 0.20 0.02 0.07 0.01 0.21 16 0.45 0.05 0.04 0.43 0.04 0.15 0.05 0.47 17 1.00 0.09 0.09 1.01 0.09 0.37 0.09 1.08 18 2.16 0.16 0.16 2.26 0.22 0.75 0.18 2.35 19 4.80 0.38 0.36 5.08 0.46 1.45 0.35 5.31 20 10.65 0.79 0.79 11.83 0.89 3.33 0.78 11.88

    L.msort():

    i *sort \sort /sort 3sort +sort \~sort =sort !sort 15 0.18 0.02 0.03 0.02 0.03 0.08 0.02 0.04 16 0.43 0.03 0.03 0.04 0.04 0.17 0.04 0.08 17 0.95 0.08 0.09 0.09 0.08 0.34 0.08 0.18 18 2.08 0.18 0.18 0.19 0.18 0.72 0.18 0.37 19 4.59 0.37 0.38 0.39 0.38 1.47 0.36 0.76 20 10.22 0.83 0.76 0.79 0.78 3.04 0.79 1.66

    I've run this often enough to believe they're typical (inc. .msort() beating .sort() on *sort and \~sort by a small margin).

    Looks like an unequivocal win on this box.

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    Deleting old doc file and merge3.patch; adding new doc file.

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    Adding merge4.patch; explanation to follow.

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    In Kevin's company database (see Python-Dev), there were 8 fields we could sort on.

    Two fields were strongly correlated with the order of the records as given, and msort() was >6x faster on those.

    Two fields were weakly correlated, and msort was a major win on those (>25% speedup).

    One field had many duplicates, with a highly skewed distribution. msort was better than 2x faster on that.

    But the rest (phone#, #employees, address) were essentially randomly ordered, and msort was systematically a few percent slower on those. That wouldn't have been remarkable, except that the percentage slowdown was a few times larger than the percentage by which msort did more comparisons than sort().

    I eventually figured out the obvious: the # of records wasn't an exact power of 2, and on random data msort then systematically arranged for the final merge to be between a run with a large power-of-2 size, and a run with the little bit left over. That adds a bunch of compares over perfectly balanced merges, plus O(N) pointer copies, just to get that little bit in place.

    The new merge4.patch repairs that as best as (I think) non- heroically possible, quickly picking a minimum run length in advance that should almost never lead to a "bad" final merge when the data is randomly ordered.

    In each of Kevin's 3 "problem sorts", msort() does fewer compares than sort() now, and the runtime is generally within a fraction of a percent. These all-in-cache cases still seem to favor sort(), though, and it appears to be because msort() does a lot more data movement (note that quicksorts do no more than one swap per compare, and often none, while mergesorts do a copy on every compare). The other 5 major-to-killer wins msort got on this data remain intact.

    The code changes needed were tiny, but the doc file changed a lot more.

    Note that this change has no effect on arrays with power-of- 2 sizes, so sortperf.py timings shouldn't change (and don't on my box). The code change is solely to compute a good minimum run length before the main loop begins, and it happens to return the same value as was hard-coded before when the array has a power-of-2 size.

    More testing on real data would be most welcome! Kevin's data was very helpful to me.

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    New doc file, with an intro at the start and a program at the end. Turns out that merge4.patch actually reversed the random-array #-of-comparisons advantage samplesort had enjoyed: it's now timsort that does 1-2% fewer comparisons on random arrays of random lengths.

    See the end of the file for why samplesort does 50% more comparisons on average for random arrays of length two \<wink>.

    Near the end of the new Intro section at the start, I suggest a couple experiments people might try on boxes where \~sort is much slower under timsort. That remains baffling, but the algorithm doesn't *do* much in that case, so someone on a box where it flounders could surely figure out why.

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    \~sort gets more mysterious all the time: the mystery now is why it's *not* much slower everywhere! Here are the exact # of compares \~sort does:

    i n sort msort %ch lg(n!) -- ------ ------- ------- ----- -------- 15 32768 130484 188720 44.63 444255 16 65536 260019 377634 45.23 954037 17 131072 555035 755476 36.11 2039137 18 262144 1107826 1511174 36.41 4340409 19 524288 2218562 3022584 36.24 9205096 20 1048576 4430616 6045418 36.45 19458756

    The last column is the information-theoretic lower bound for sorting random arrays of this size (no comparison-based algorithm can do better than than on average), showing that sort() and msort() are both getting a lot of good out of the duplicates. But sort()'s special case for equal elements is extremely effective on \~sort's specific data pattern, and msort just isn't going to get close to that (it does better than sort() on skewed distributions with lots of duplicates, though).

    The only thing I can think of that could transform what "should be" highly significant slowdowns into highly significant speedups on some boxes are catastrophic cache effects in samplesort. But knowing something about how both algorithms work \<wink>, that's not screaming "oh, of course".

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    Replaced the doc file. The new one contains more info comparing msort to sort. There's nothing more I want to do here, and it looks like everyone who might time this already did.

    Assigned to Guido for pronouncement. I recommend replacing list.sort() with this. The only real downside is the potential for requiring 2*N temp bytes; that (and everything else \<wink>) is discussed in the doc file.

    If this is accepted, another issue is whether to *advertise* that this sort is stable. Some people really want that, but requiring stability constrains implementations. Another possibility is to give lists two sort methods, one guaranteed stable and the other not, where in 2.3 CPython both map to this code.

    In no case do I want to keep both the samplesort and timsort implementations in the core -- one brain-busting sort implementation is quite enough. This one has many wonderful properties the samplesort hybrid lacks.

    gvanrossum commented 21 years ago

    Logged In: YES user_id=6380

    1. Go for it.

    2. Advertise it as an implementation feature.

    tim-one commented 21 years ago

    Logged In: YES user_id=31435

    This is checked in now, so closing this patch.