onera / std_e

C++ standard library extension
Mozilla Public License 2.0
4 stars 0 forks source link

sorting algorithm ? #2

Closed MicK7 closed 1 year ago

MicK7 commented 2 years ago

Does std_e provide a kind of efficient timsorter: https://en.wikipedia.org/wiki/Timsort https://saurabhsoodweb.wordpress.com/2017/04/18/parallelizing-timsort/

BerengerBerthoul commented 2 years ago

No, not at the moment. It only has a fix of std::sort to work with proxy references, implemented as a quicksort (so, not even with the introsort worst-case enhancement). The fix is needed to correct this gcc bug.

If you are looking for a pattern-defeating sort, maybe this one would be of interest. If you are looking for a faster quicksort, this could be of interest. (Note: I have not tested these myself) I am not familiar with TimSort but it seems to be used mostly with languages where the data is sorted indirectly. In particular, wikipedia says :

It is advantageous over Quicksort for sorting object references or pointers because these require expensive memory indirection to access data and perform comparisons and Quicksort's cache coherence benefits are greatly reduced.

Which makes me wonder if Timsort is well-suited for C++ (where data is sorted directly most of the time).

If you are looking for a distributed parallel sort, then you can use std_e::sort_by_rank (on the devbranch, examples here). std_e::sort_by_rank is a partial sort that ensures that data is sorted rank by rank. Inside a rank, you would still have to sort your data, but then you can use any sequential sort algorithm (with each rank calling it in parallel). The main interest is that it takes care of load balancing. The main limitations are the following :

Can you tell me a bit more about your use case :

MicK7 commented 2 years ago

Thanks for the quick reply. My use case was to sort a list of string (zone names, or flow solution names) and thus the pattern-defeating feature is nice to have as well as stability.

For the parallelism, it is not really needed, but I was wondering how I would do it I my vector/list of strings was spread over multiple MPI process.

I just know about the timsort because it is use in C python to sort list, dict and so on but the pdqsort seems very nice.

Sorry, I don't really understand the C++ question.