pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Add `min_periods` to `polars.corr` #15458

Open fkemeth opened 7 months ago

fkemeth commented 7 months ago

Description

Add an optional integer parameter `min_periods` to `polars.corr`, so that the Pearson/Spearman correlation coefficient is non-null only when at least `min_periods` rows have non-null values in both columns.

See also the example here.
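
To make the requested semantics concrete, here is a minimal pure-Python sketch of a Pearson correlation with a `min_periods` threshold. It is not Polars code and not a proposed implementation: the helper name `corr_min_periods` is hypothetical, `None` stands in for a null value, and returning `None` for a constant column is an assumption made only for this illustration.

```python
import math
from typing import Optional, Sequence


def corr_min_periods(
    x: Sequence[Optional[float]],
    y: Sequence[Optional[float]],
    min_periods: int = 1,
) -> Optional[float]:
    """Pearson correlation over rows where both values are non-null.

    Hypothetical helper: returns None when fewer than `min_periods`
    such rows exist.
    """
    # Keep only rows where both columns have a value.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)

    # Fewer than two complete rows can never yield a correlation.
    if n < max(min_periods, 2):
        return None

    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)

    # Assumption for this sketch: a constant column gives no result.
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


a = [1.0, 2.0, None, 4.0, 5.0]
b = [2.0, None, 6.0, 8.0, 10.0]

# Three rows are complete in both columns, so a threshold of 3 passes
# and a threshold of 4 yields None.
print(corr_min_periods(a, b, min_periods=3))
print(corr_min_periods(a, b, min_periods=4))
```

With the sample data, only three rows have values in both columns, so the result is non-null for `min_periods=3` and null for `min_periods=4`.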

fkemeth commented 7 months ago

It is implemented in `polars.rolling_corr`, but as I understand it, `rolling_corr` produces different output than `polars.corr`.

The `min_periods` feature in `polars.corr` would be useful for collaborative filtering and recommender system applications, for example.