scandum / quadsort

Quadsort is a branchless stable adaptive mergesort faster than quicksort.
The Unlicense
2.13k stars 105 forks source link

Comparison to Timsort and Block sort #2

Closed dumblob closed 4 years ago

dumblob commented 4 years ago

There is TimSort (used by Python) which looks a bit similar in the ideas (based on merge sort), SLOC count, stability and probably performance to quadsort.

Dare to make a comparison? You can take CPython implementation for this (or if you're an adventurous type, implement it based on Julia TimSort as it's well readable unlike other implementations I know of).

Other potential candidate(s) for comparison would be "Block sort" (e.g. https://github.com/BonzaiThePenguin/WikiSort/tree/master or https://github.com/Mrrl/GrailSort which I slightly prefer over WikiSort). A slightly biased (read "not based on measurements") opinion on WikiSort gives the author of Timsort.

scandum commented 4 years ago

qsort() is the function to beat, anyone not up to that challenge is playing for 2nd place, or as of this week, 3rd place.

dumblob commented 4 years ago

Well, Timsort (and Block sort kinda as well) is "like" qsort. So, challenge accepted?

scandum commented 4 years ago

What I'm trying to say is that it took me 4 years to write a sorting algorithm that could beat qsort() on equal terms.

It's like I made the highest jump on Earth and people insist it's not valid unless I do it on the Moon as well. I hope you can understand it seems a little surreal to me.

dumblob commented 4 years ago

What I'm trying to say is that it took me 4 years to write a sorting algorithm that could beat qsort() on equal terms.

That's fine and it's good to beat qsort(). On the other hand, qsort() is well known to have several deficiencies which other sorting algorithms (incl. Timsort) try to solve. In other words, beating qsort() on equal terms is for almost two decades not any more the goal when designing a sorting algorithm, because there are other and harder challenges.

Judging based on https://www.reddit.com/r/programming/comments/f3d5q0/quadsort_introduction_to_a_new_stable_sorting/fhl8ww9/ (though the "benchmark" overall incl. its tiny magnitude is still way too simplistic to make a better overview), Quadsort is not the fastest one in some cases by a solid margin (but faster by about the same margin in other cases) among single-threaded sorting algorithms. Unfortunately Timsort is not there even though it user very similar ideas to speed up the canonical merge sort.

Btw. as others have (somewhat incorrectly) pointed out, that you should use specialization/templates/similar techniques and thus a different language than C to avoid pointer passing issues (dereference is slow as it basically "quasi-randomly" jumps over memory which is a constant factor, but worse it's highly cache inefficient even for millions of items), I actually think your choice was very good and you should definitely stick with C, but start using its full potential by using C99 restrict for all pointers (and if not possible, then make it possibly by passing compound data rather than "deconstructed/unwrapped" data/structs) and of course inline where it makes sense. Without restrict there is too many unnecessary memory loads and stores.

DrGo commented 4 years ago

I am getting a split decision on a macos Catalina, compiled using Apple clang version 11.0.0 (clang-1100.0.33.17)

➜ quadsort git:(master) ./a.out quadsort: sorted 1000000 elements in 0.155557 seconds. (random order) qsort: sorted 1000000 elements in 0.144936 seconds. (random order)

     quadsort: sorted 1000000 elements in 0.005193 seconds. (forward order)
        qsort: sorted 1000000 elements in 0.006011 seconds. (forward order)

     quadsort: sorted 1000000 elements in 0.040907 seconds. (reverse order)
        qsort: sorted 1000000 elements in 0.030135 seconds. (reverse order)

     quadsort: sorted 1000000 elements in 0.045456 seconds. (random tail)
        qsort: sorted 1000000 elements in 0.081248 seconds. (random tail)

➜ quadsort git:(master) ./a.out quadsort: sorted 1000000 elements in 0.155336 seconds. (random order) qsort: sorted 1000000 elements in 0.145207 seconds. (random order)

     quadsort: sorted 1000000 elements in 0.005184 seconds. (forward order)
        qsort: sorted 1000000 elements in 0.006011 seconds. (forward order)

     quadsort: sorted 1000000 elements in 0.040963 seconds. (reverse order)
        qsort: sorted 1000000 elements in 0.030163 seconds. (reverse order)

     quadsort: sorted 1000000 elements in 0.045405 seconds. (random tail)
        qsort: sorted 1000000 elements in 0.081733 seconds. (random tail)
scandum commented 4 years ago

You'll get somewhat different results for each compiler / system.

scandum commented 4 years ago
Qsort 100000 unknown_32                    86       11657430.2 ns/op MO: 148609837
Quadsort 100000 unknown_32                 94       10713883.0 ns/op MO: 152641767
Introsort 100000 unknown_32                93       10868172.0 ns/op MO: 176354336
Merge_sort 100000 unknown_32               89       11298089.9 ns/op MO: 147022229
Heap_sort 100000 unknown_32                88       11415965.9 ns/op MO: 283323456
Shell_sort 100000 unknown_32               53       19063754.7 ns/op MO: 135416015
Tim_sort 100000 unknown_32                 77       13039363.6 ns/op MO: 134388364
Merge_sort_in_place 100000 unknown_32      81       12414308.6 ns/op MO: 145139842
Grail_sort 100000 unknown_32               68       14830073.5 ns/op MO: 110715760
Sqrt_sort 100000 unknown_32                84       11915619.0 ns/op MO: 143224346
Grail_sort_dyn_buffer 100000 unknown_32    79       12745696.2 ns/op MO: 128632770

Above is a benchmark against a timsort and introsort implementation. I had to make several changes:

The original implementation was in-lining the comparisons, which is one way to beat qsort(), but not a valid one. The original implementation was using lrand48() which I changed to rand() which should have better entropy.

MO: lists the number of comparisons. This in itself isn't the full equation because the way the algorithm moves data is equally important, but more difficult to measure.

While the Introsort is fast it does have notably more comparisons. So it's faster when inlining and comparing integers, but only for random data, besides that it is not stable and may have DOS vulnerabilities.

Quadsort for some reason is much faster than Introsort on arrays under 1000 elements, I'm still trying to figure out why. Might be cache related.

Grailsort and Timsort look interesting for applications with very heavy comparisons, but I may not be catching every comparison.

Artoria2e5 commented 6 months ago

qsort() is a moving goal, because it's not required to be quicksort.

That said, beating glibc's mergesort while being stable is all quad needs to be a reasonable replacement of it. It's even stable without extra heap allocation!

scandum commented 6 months ago

Interesting how future qsort() updates are now pretty much forced to be stable for backward compatibility.

There's nothing that stops glibc from incorporating quadsort or blitsort for 8 byte types, but I haven't bothered to personally invest the time as there's no guarantee it'll pass the bureaucratic hurdles.