
select_nth_unstable has quadratic worst-case time complexity; docs claim it should be linear #102451

Closed qtow closed 1 year ago

qtow commented 1 year ago

The docs claim that select_nth_unstable is "O(n) worst-case", but the implementation exhibits O(n²) time complexity on some inputs. I'm not sure whether this is a documentation error or an implementation error, but it's definitely one or the other.

The current implementation is a fairly standard quickselect. Notably, it only inspects a constant number of elements to choose a pivot, which cannot guarantee linear time complexity for quickselect. The true median-of-medians algorithm, which examines every element when picking the pivot, must be used instead; this is different from the "median of medians" pivot heuristic referenced in the code, which only samples a constant number of elements.
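
For reference, the textbook median-of-medians scheme (not the stdlib's pivot heuristic) looks roughly like the following sketch; the crucial difference is that it touches all n elements to pick the pivot, so the pivot is guaranteed to land near the middle:

```rust
// Textbook median-of-medians pivot selection (illustration only, not the
// stdlib's code). Unlike a constant-size sample, it inspects every element:
// the returned pivot is guaranteed to be >= roughly 30% of the elements and
// <= roughly 30% of them, which bounds quickselect's recursion and yields
// an O(n) worst case overall.
fn median_of_medians(v: &mut [u32]) -> u32 {
    assert!(!v.is_empty());
    if v.len() <= 5 {
        v.sort_unstable();
        return v[v.len() / 2];
    }
    // Sort each group of five and move its median to the front of `v`.
    let mut num_medians = 0;
    for i in (0..v.len()).step_by(5) {
        let end = (i + 5).min(v.len());
        v[i..end].sort_unstable();
        v.swap(num_medians, i + (end - i) / 2);
        num_medians += 1;
    }
    // Recurse on the ~n/5 group medians. Including the quickselect recursion
    // on the <= 7n/10 remaining candidates, the total cost solves to
    // T(n) = T(n/5) + T(7n/10) + O(n) = O(n).
    median_of_medians(&mut v[..num_medians])
}

fn main() {
    let mut v: Vec<u32> = (0..1000).rev().collect();
    let pivot = median_of_medians(&mut v);
    println!("pivot: {pivot}");
}
```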

As concrete proof that pathological cases exist, here's a playground link. It generates close to the worst possible input for a fixed length, and compares the time for that against the time for a random input. For a length of ~128k (the highest I can get the playground to go before it times out), the pathological input is around 1000 times slower than the random input.

Here are some graphs of runtime vs. input length:

![time vs input length](https://user-images.githubusercontent.com/82917268/192922420-53344b08-9f02-439a-950b-0d5997559412.png)
![time vs input length, log-log](https://user-images.githubusercontent.com/82917268/192922454-ebed8f14-9169-445f-8a0a-a27d519165e5.png)

Both show runtime in seconds vs. slice length, with blue being random inputs and red being pathological inputs. The second is a log-log graph with lines fit to the data; the slopes are roughly 1.3 for the blue (random) line and 2.3 for the red (pathological) line.

To uphold the linear worst-case guarantee, select_nth_unstable would need to use median of medians. Introselect is the hybrid version (quickselect that falls back to a deterministic algorithm when it detects pathological behavior), and Fast Deterministic Selection is the most recent work I'm aware of that optimizes median of medians.

If the worst-case guarantee is relaxed to an average-case guarantee (which is all that C++ gives for nth_element), I think the docs should make it clear that the worst case is quadratic.

orlp commented 1 year ago

I have a much simpler proof of concept that shows the quadratic behavior (playground):

```rust
use std::cmp::Ordering;
use std::cell::Cell;
use std::sync::atomic::{self, AtomicUsize};

static NUM_SOLIDS: AtomicUsize = AtomicUsize::new(0);

// Gas is always greater than Solid.
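// (The derived Ord compares variants by declaration order, so Solid(_) < Gas.)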
#[derive(Copy, Clone, PartialOrd, Ord, PartialEq, Eq, Debug)]
enum Value {
    Solid(usize),
    Gas,
}

// When comparing two EvilValues it will always give a consistent answer, but
// for some 'mysterious' reason new elements always appear to be greater than
// previously seen ones. This is done by starting all values as Gas and only
// solidifying when necessary: Gas <-> Gas comparisons. Solid <-> Gas
// comparisons can always return that the Gas is greater without any commitment
// to any particular value being made.
#[derive(Clone, Debug)]
struct EvilValue {
    value: Cell<Value>,
}

impl EvilValue {
    pub fn new() -> Self {
        Self { value: Cell::new(Value::Gas) }
    }

    // Note that this doesn't violate any previous returned ordering,
    // as we create a new solid that is larger than any existing solids
    // which is consistent with any earlier Solid <-> Gas comparison we've made.
    pub fn solidify(&self) -> usize {
        if let Value::Solid(id) = self.value.get() {
            id
        } else {
            let id = NUM_SOLIDS.fetch_add(1, atomic::Ordering::Relaxed);
            assert!(id < usize::MAX);
            self.value.set(Value::Solid(id));
            id
        }
    }
}

impl PartialOrd for EvilValue {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        if self.value.get() == Value::Gas { other.solidify(); }
        self.value.partial_cmp(&other.value)
    }
}

impl PartialEq for EvilValue {
    fn eq(&self, other: &Self) -> bool {
        if self.value.get() == Value::Gas { other.solidify(); }
        self.value == other.value
    }
}

impl Eq for EvilValue { }
impl Ord for EvilValue {
    fn cmp(&self, other: &Self) -> Ordering {
        self.partial_cmp(other).unwrap()
    }
}

fn main() {
    for n in [10, 100, 1000, 10000, 100_000] {
        // Construct worst case.
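        // Running select_nth_unstable on fresh EvilValues forces the adversary
        // to solidify values in an order that is worst for this exact run;
        // replaying the resulting ids as plain usizes reproduces the same
        // comparison results, and hence the same execution path.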
        let mut values: Vec<_> = (0..n).map(|i| (EvilValue::new(), i)).collect();
        values.select_nth_unstable(n/2);
        let mut worst_case: Vec<usize> = vec![0; n];
        for (ev, i) in values {
            worst_case[i] = ev.solidify();
        }

        // Benchmark (ordinary usize comparisons - NOT EvilValue).
        let mut comparisons = 0;
        worst_case.select_nth_unstable_by(n/2, |a, b| {
            comparisons += 1;
            a.cmp(&b)
        });
        println!("{n}: {comparisons}");
    }
}
```

This is similar to my proof of concept showing the same issue for libc++'s nth_element: https://github.com/llvm/llvm-project/issues/52747.

Sp00ph commented 1 year ago

(Copy-pasting this from #106933. Adding this to the stdlib would guarantee O(n) worst case for selection; after #106997 it's already guaranteed to be O(n log n).)

I also now implemented the fast deterministic selection algorithm here. The implementation is based entirely on this paper (and this repo), except that I couldn't be bothered to implement expand_partition and instead just used the existing stdlib partition. It completely blows heapsort out of the water: even on very short slices it isn't much slower than heapsort, and on longer slices it becomes far faster. Random inputs of length 1 << 16 take ~4ms to sort using heapsort on my machine, whereas my fast deterministic selection implementation takes only ~70µs to select the median. There is also no noticeable runtime degradation when choosing an index further from the middle of the slice, as there was with median of medians. Note that I've paid absolutely no attention to optimization yet, so it may well become even faster once unnecessary bounds checks and the like are eliminated. The implementation also doesn't introduce much new code (~200 lines, though with comments and optimizations that number will obviously grow).

In its current form it is a bit slower than select_nth_unstable on random input (though that might of course change with added optimizations), so I wouldn't (yet) consider always using it for selection, but as a fallback for introselect it looks very promising imo.
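
For context, the introselect pattern being discussed might be sketched as follows. This is hypothetical illustration code, not the stdlib's implementation: a naive Lomuto partition with a last-element pivot and a sort stand in for the real partition scheme, pivot sampling, and O(n) deterministic fallback.

```rust
// Minimal introselect skeleton: plain quickselect runs until a recursion
// budget of ~2*log2(n) is spent, then a guaranteed fallback takes over
// (median of medians, or the fast deterministic selection discussed above,
// in a real implementation).
fn introselect(v: &mut [u32], k: usize, limit: u32) {
    if v.len() <= 1 {
        return;
    }
    if limit == 0 {
        // Too many bad partitions: switch to a strategy with a guaranteed
        // bound. (A sort stands in for an O(n) deterministic selection here.)
        v.sort_unstable();
        return;
    }
    let p = partition(v, v.len() - 1);
    if k < p {
        introselect(&mut v[..p], k, limit - 1);
    } else if k > p {
        introselect(&mut v[p + 1..], k - p - 1, limit - 1);
    }
}

// Lomuto partition around v[pivot]; returns the pivot's final index.
fn partition(v: &mut [u32], pivot: usize) -> usize {
    let last = v.len() - 1;
    v.swap(pivot, last);
    let mut store = 0;
    for i in 0..last {
        if v[i] < v[last] {
            v.swap(i, store);
            store += 1;
        }
    }
    v.swap(store, last);
    store
}

fn main() {
    // A descending run defeats the naive last-element pivot, so the budget
    // runs out and the fallback fires; the selected element is still correct.
    let mut v: Vec<u32> = (0..10_000).rev().collect();
    let k = v.len() / 2;
    introselect(&mut v, k, 2 * v.len().ilog2() + 1);
    assert_eq!(v[k], k as u32);
    println!("median: {}", v[k]);
}
```

The point of the hybrid is that the fallback only pays its higher constant factor on inputs that actually defeat the sampled pivots, which is exactly the role proposed here for fast deterministic selection.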