orlp / pdqsort

Pattern-defeating quicksort.
zlib License
2.37k stars 102 forks source link

pdqsort is slower than qsort when compare function is slow #20

Closed mame closed 2 years ago

mame commented 2 years ago

I tried to use pdqsort as an implementation of Array#sort method for the programming language Ruby. Unfortunately, it was about twice slower than the current implementation (based on glibc's qsort(3)) against random input.

It seems that pdqsort tends to call compare function more often than qsort(3). Here is a self-contained demonstration:

$ g++ -O3 -o cmp-test cmp-test.cpp

$ ./cmp-test qsort
1536197

$ ./cmp-test pdqsort
1861162

$ ./cmp-test std::sort
1978708
#include <vector>
#include "pdqsort.h"

#define N 100000

int counter;

int cmp(const void *a, const void *b) {
    long c = *(long*)a - *(long*)b;
    counter++;
    return c < 0 ? -1 : c > 0 ? 1 : 0;
}

int main(int argc, char *argv[]) {
    std::vector<long> vec;

    // reproducible random input
    long x = 1;
    for (int i = 0; i < N; i++) {
        vec.push_back(x);
        x = (48271 * x) & 0xffffffff;
    }

    counter = 0;

    std::string mode(argv[1]);

    if (mode == "qsort") {
        qsort(vec.data(), vec.size(), sizeof(long), cmp);
    }
    else if (mode == "pdqsort") {
        pdqsort(vec.begin(), vec.end(), [] (const auto &a, const auto &b) {
                counter++;
                return a < b;
        });
    }
    else if (mode == "std::sort") {
        std::sort(vec.begin(), vec.end(), [] (const auto &a, const auto &b) {
                counter++;
                return a < b;
        });
    }

    // test
    //for (int i = 0; i < N; i++) {
    //    printf("%ld\n", vec[i]);
    //}

    printf("%d\n", counter);
}

Because Ruby's compare function is slow (it invokes Ruby-level method for each comparison), qsort(3) implementation is faster.

Is this a known characteristic of pdqsort? I know that there is no perfect sorting algorithm in every respect, but I'd like to report this just for the possibility that it may be an unknown room for improvement. If it is known, I'm sorry, feel free to close this issue. If there is a way to reduce the number of comparison at the expense of some speed, I'd be glad to hear about it.

orlp commented 2 years ago

The lower comparison count can be explained because qsort chooses to switch to insertion sort at 4 elements whereas pdqsort (currently) does it at 24. For cheap comparison functions doing more work in insertion sort is better.

You can try tweaking insertion_sort_threshold to 12, which was optimal for comparisons in my previous research.

As for why it's so much slower as you report, I don't know. Are you benchmarking with optimizations turned on (-O2)? My suspicion is that it all lies in the call overhead into Ruby, to be honest.

mame commented 2 years ago

Thank you so much for your quick reply! And I'm VERY sorry:

Are you benchmarking with optimizations turned on (-O2)?

This was it. I actually passed -O3 to gcc (because Ruby is written in C), but forgot to add it to g++.

After I added -O3 to g++, the performance is almost same in the case of random input. If the input is already sorted, pdqsort implementation is 3 times faster than qsort. Awesome.

I'll try to port pdqsort to C and evaluate it in Ruby more seriously. Thank you for the great algorithm!