vaexio / vaex

Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per second 🚀
https://vaex.io
MIT License

[FEATURE-REQUEST] looking for vaex equivalent of pandas_df.corr(min_periods=100) #2402

Open Rajendra-mckesson opened 7 months ago

Rajendra-mckesson commented 7 months ago

Description Vaex's correlation calculation completely ignores rows or columns with missing values, which is not the desired behavior for real-world data. Pandas provides this functionality through pandas_df.corr(min_periods=100), where min_periods sets the minimum number of valid (non-NaN) observations required per pair of columns to produce a valid correlation value. I am looking for an equivalent in Vaex.
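For reference, here is a small sketch of the pandas behavior being requested. The column names and values are made up for illustration, and min_periods=3 is used instead of 100 only to keep the example tiny.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative only): "b" has one NaN, "c" has three.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, np.nan, 8.0, 10.0],
    "c": [1.0, np.nan, np.nan, np.nan, 2.0],
})

# Each pair of columns is correlated over the rows where BOTH values are
# present. Pairs with fewer than min_periods such rows are reported as NaN.
corr = df.corr(min_periods=3)

print(corr)
# (a, b) share 4 valid rows, so a correlation is computed.
# (a, c) and (b, c) share only 2 valid rows, so the result is NaN.
```

The point is that missing values are handled per pair of columns: a NaN in one column does not discard that row for every other pair.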

Is your feature request related to a problem? Please describe.

Additional context pandas equivalent: pandas_df.corr(min_periods=100)
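To make the requested semantics concrete, the following is a minimal NumPy sketch of a pairwise correlation with a min_periods threshold. The function name pairwise_corr is hypothetical and is not part of Vaex or pandas. It only illustrates the logic an equivalent would need: count the jointly valid rows, then compute Pearson's r over just those rows.

```python
import numpy as np

def pairwise_corr(x, y, min_periods=1):
    """Pearson correlation over rows where both x and y are not NaN.

    Hypothetical helper for illustration; returns NaN when fewer than
    min_periods jointly valid observations are available.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Keep only rows where both values are present.
    valid = ~(np.isnan(x) | np.isnan(y))
    n = int(valid.sum())
    if n < min_periods or n < 2:
        return float("nan")

    xd = x[valid] - x[valid].mean()
    yd = y[valid] - y[valid].mean()
    denom = np.sqrt((xd ** 2).sum() * (yd ** 2).sum())
    if denom == 0:
        # A constant column has undefined correlation.
        return float("nan")
    return float((xd * yd).sum() / denom)


a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, np.nan, 8.0]
print(pairwise_corr(a, b, min_periods=3))  # 3 valid pairs: computed
print(pairwise_corr(a, b, min_periods=4))  # too few valid pairs: NaN
```

An out-of-core version would presumably need the same two ingredients per column pair: a count of jointly valid rows and the correlation restricted to those rows.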