Implement Yaroslavskiy-Bentley-Bloch Quicksort.

rust-ndarray / ndarray-stats

Statistical routines for ndarray

https://docs.rs/ndarray-stats

Apache License 2.0

Implement Yaroslavskiy-Bentley-Bloch Quicksort. #80

Closed: n3vu0r closed this 1 year ago

n3vu0r commented 3 years ago

This is a dual-pivot 3-way quicksort. It is less likely to hit worst-case inputs, which can lead to deep recursion and stack overflows on large arrays.

The previous single-pivot 2-way quicksort used by the FreedmanDiaconis strategy overflows its stack with this NPY array.
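For readers unfamiliar with the scheme, here is a minimal sketch of dual-pivot partitioning into three regions (`< p`, between `p` and `q`, `> q`). It is not the code from this PR: the function name `dual_pivot_quicksort` is hypothetical, and taking the pivots from roughly the 1/3 and 2/3 positions is a simplifying assumption rather than the sampling strategy discussed later in this thread.

```rust
/// Sorts `v` in place using dual-pivot (Yaroslavskiy-style) partitioning.
/// Illustrative only: no insertion-sort cutoff and no pivot sampling.
fn dual_pivot_quicksort<T: Ord>(v: &mut [T]) {
    let n = v.len();
    if n < 2 {
        return;
    }
    // Assumption: take pivot candidates near the tertiles and move them to
    // the two ends, so already sorted input does not degenerate.
    v.swap(0, n / 3);
    v.swap(n - 1, n - 1 - n / 3);
    // Ensure left pivot p = v[0] <= right pivot q = v[n - 1].
    if v[0] > v[n - 1] {
        v.swap(0, n - 1);
    }
    // Invariants: v[1..lt] < p, v[lt..i] in [p, q], v[gt + 1..n - 1] > q.
    let mut lt = 1;
    let mut gt = n - 2;
    let mut i = 1;
    while i <= gt {
        if v[i] < v[0] {
            v.swap(i, lt);
            lt += 1;
            i += 1;
        } else if v[i] > v[n - 1] {
            // The element swapped in from the right is unexamined,
            // so `i` is not advanced here.
            v.swap(i, gt);
            gt -= 1;
        } else {
            i += 1;
        }
    }
    // Move both pivots into their final positions.
    lt -= 1;
    gt += 1;
    v.swap(0, lt);
    v.swap(n - 1, gt);
    // If p == q, the middle region holds only elements equal to the pivots
    // and needs no further sorting (this keeps all-equal input shallow).
    let pivots_differ = v[lt] < v[gt];
    let (left, rest) = v.split_at_mut(lt);
    let (mid, right) = rest[1..].split_at_mut(gt - lt - 1);
    dual_pivot_quicksort(left);
    if pivots_differ {
        dual_pivot_quicksort(mid);
    }
    dual_pivot_quicksort(&mut right[1..]);
}
```

Calling `dual_pivot_quicksort(&mut data)` on a `Vec<i32>` sorts it in place. The sketch still recurses into all three regions, so a production version would additionally bound the recursion depth.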

n3vu0r commented 3 years ago

I will also test arrays of identical elements and refactor this PR as soon as I have time.

n3vu0r commented 3 years ago

Since we keep the single-pivot partitioning method, I've implemented a variant of Sesquickselect. It suggests the optimal pivot selection from fixed-size samples with respect to the relative sought rank (index / length) and also switches from dual-pivot to single-pivot partitioning for extreme sought ranks (page 17, figure 3). The benches show a speedup with adaptive pivot sampling and with smaller recursion cutoff thresholds, at least on my machine. For the bulk version, I kept the skewed pivots recommended for Quicksort, on the assumption that multiple indexes shift the characteristics from Quickselect towards Quicksort (and there is no single sought rank).

It works well with arrays of equal elements and with sorted and reverse-sorted arrays. Tested up to 1_000_000 elements.
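To illustrate the idea of rank-adaptive pivot selection described above, here is a rough sketch. Everything in it is an assumption made for illustration: the names `Pivots` and `choose_pivots`, the `cutoff` parameter, and the rule for mapping the relative rank onto sample positions are hypothetical and do not reproduce the thresholds from the Sesquickselect paper or the code in this PR.

```rust
/// Which positions of the *sorted* sample to use as pivots.
#[derive(Debug, PartialEq)]
enum Pivots {
    /// Single-pivot partitioning with the sample element at this position.
    Single(usize),
    /// Dual-pivot partitioning with the sample elements at these positions.
    Dual(usize, usize),
}

/// Picks pivot positions in a sorted sample of `sample_size` elements for
/// selecting the element of rank `rank` in a slice of length `len`.
/// `cutoff` is a hypothetical threshold: relative ranks closer than this to
/// either end fall back to single-pivot partitioning.
fn choose_pivots(rank: usize, len: usize, sample_size: usize, cutoff: f64) -> Pivots {
    assert!(rank < len && sample_size >= 2);
    // Relative sought rank in [0, 1).
    let alpha = rank as f64 / len as f64;
    // Position in the sorted sample that corresponds to `alpha`.
    let pos = alpha * (sample_size - 1) as f64;
    if alpha < cutoff || alpha > 1.0 - cutoff {
        // Extreme rank: one pivot near the sought rank.
        Pivots::Single(pos.round() as usize)
    } else {
        // Otherwise: two adjacent sample elements bracketing the sought rank.
        let lo = (pos.floor() as usize).min(sample_size - 2);
        Pivots::Dual(lo, lo + 1)
    }
}
```

With a sample of five elements and a cutoff of 0.2, a sought rank of 10 out of 1000 yields `Pivots::Single(0)`, while rank 500 yields `Pivots::Dual(2, 3)`.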

I would suggest making sample_mut() generic over the sample size via const generics, if an MSRV of 1.51 is fine?

The sampling does not have to be equally spaced; it could also be randomized. I have no favorite yet.
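As a sketch of what a const-generic, equally spaced sampler could look like (requires Rust 1.51 or later): the function name `equally_spaced_indexes` and its signature are hypothetical and are not the `sample_mut()` from this PR, whose signature is not shown in this thread.

```rust
/// Returns `N` distinct, equally spaced indexes into a slice of length `len`.
/// The sample size `N` is a compile-time constant, so the result is a
/// fixed-size array and needs no heap allocation.
fn equally_spaced_indexes<const N: usize>(len: usize) -> [usize; N] {
    assert!(N > 0 && len >= N);
    let mut indexes = [0; N];
    for (i, index) in indexes.iter_mut().enumerate() {
        // Centre of the i-th of N equal-width buckets covering 0..len.
        *index = (2 * i + 1) * len / (2 * N);
    }
    indexes
}
```

For example, `equally_spaced_indexes::<5>(100)` returns `[10, 30, 50, 70, 90]`. A randomized variant would replace the bucket centres with random offsets inside each bucket.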

n3vu0r commented 3 years ago

I also used adaptive pivot sampling for the bulk version, but only for branches with a single index remaining. I don't know how the benches are configured, but I would like to add larger input arrays.