scandum / wolfsort

Wolfsort is a stable adaptive hybrid radix / merge sort.
The Unlicense
192 stars 9 forks source link

comparison with sorty #1

Closed jfcg closed 4 years ago

jfcg commented 4 years ago

In order to compare with sorty I make the following change in bench.c:

int max = 1 << 26;

and compile with:

g++ -O3 -fpermissive bench.c -o bench

It is been running for over 5 minutes with no output. an issue with the benchmark code?

jfcg commented 4 years ago

I also added

int samples = 2;

and the output is

|      Name |    Items | Type |     Best |  Average | Comparisons |     Distribution |
| --------- | -------- | ---- | -------- | -------- | ----------- | ---------------- |
|  quadsort | 67108864 |  i64 | 7.294079 | 7.340462 |             |     random order |
|stablesort | 67108864 |  i64 | 7.693744 | 7.820621 |             |     random order |
|  wolfsort | 67108864 |  i64 | 9.194093 | 9.203110 |             |     random order |

|  quadsort | 67108864 |  i32 | 6.934483 | 7.032585 |             |     random order |
|stablesort | 67108864 |  i32 | 7.314444 | 7.478966 |             |     random order |
|  wolfsort | 67108864 |  i32 | 4.174401 | 4.180588 |             |     random order |

with much faster results for partially sorted input. sorty attains 3.19 seconds with two goroutines. Considering g++ -O3 vs go -B, g++ is considerably better at optimizing code but go toolchain is not bad either.

I think if you write a concurrent version of wolfsort, that should be even faster. Cheers

scandum commented 4 years ago

Keep in mind that wolfsort performs optimal at 1,000 - 100,000 elements.

A concurrent version should be quite a bit faster. For the best overall results one would need to make quadsort concurrent as well. I don't think I'll attempt this myself anytime soon however.

jfcg commented 4 years ago

Keep in mind that wolfsort performs optimal at 1,000 - 100,000 elements.

interesting that performance peeks in an interval. is it by design?

scandum commented 4 years ago

I guess you could call it by design. Quadsort is significantly faster than quicksort as long as the L1 cache isn't exhausted. This is primarily due to the swapping optimization which gives a 50% performance boost.

So the primarily goal of wolfsort is to create buckets that contain right around 250 elements. This is easiest in the 32K - 128K range, which might explain why wolfsort is (in some of my tests) faster at sorting 100K elements than 10K elements.