pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs
Other
29.35k stars 1.86k forks source link

rolling_min/max has quadratic worst case behavior #12714

Open orlp opened 9 months ago

orlp commented 9 months ago

The current algorithm implemented by @magarick in https://github.com/pola-rs/polars/pull/9277#issuecomment-1581401692 is quadratic in the worst case. Reproduction:

bad = [i ^ 1 for i in range(10**6)]
df = pl.DataFrame({"x": bad})
df.select(pl.col("x").rolling_min(10**5))

I suggest we implement the algorithm described by me here: https://cs.stackexchange.com/questions/120915/interview-question-with-arrays-and-consecutive-subintervals/120936#120936, which is single-pass, O(n) time and O(k) memory for an input of length n with a window of size k.

corvusrabus commented 5 days ago

Is this still of interest? I see the entire surrounding code was moved into a folder called legacy