pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Add rolling.rank() same as pandas #8677

Open Mirac-Le opened 9 months ago

Mirac-Le commented 9 months ago

Is your feature request related to a problem?

Dear xarray maintainers,

I would like to express my heartfelt gratitude for the significant optimizations your xarray library has brought to my project. Xarray combines the speed of numpy with the highly customizable parameters of pandas. The extensive parameters in the rolling module have allowed me to achieve functionality similar to pandas more efficiently.

I am wondering if it would be possible to incorporate a ranking method for rolling windows, including the ability to specify parameters such as pct, similar to the pandas rolling.rank function. Your consideration of this feature would be greatly appreciated.
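For illustration, here is a minimal sketch of the requested behaviour, assuming pandas 1.4 or later (where `Rolling.rank` with `pct` is available). The xarray half is only a possible workaround built on `rolling(...).construct(...)`, not an existing xarray method; the names `windows`, `newest`, and `pct_rank` are made up for this example. It returns NaN for any window that is incomplete or contains NaN, which may differ from how pandas treats NaN when `min_periods` is set.

```python
import numpy as np
import pandas as pd
import xarray as xr

values = np.array([3.0, 1.0, 2.0, 5.0, 4.0])
window = 3

# pandas reference: percentile rank of the newest value within each window.
expected = pd.Series(values).rolling(window).rank(pct=True)

da = xr.DataArray(values, dims="time")

# construct() adds a "window" dimension holding each trailing window;
# positions before the first full window are padded with NaN.
windows = da.rolling(time=window).construct("window")
newest = windows.isel(window=-1)

# Average rank of the newest value: the number of values strictly below it,
# plus the midpoint of the tied positions (the value itself counts as a tie).
less = (windows < newest).sum("window")
equal = (windows == newest).sum("window")
rank = less + (equal + 1) / 2

# Keep only complete, NaN-free windows, then scale the rank by the window size.
complete = windows.notnull().all("window")
pct_rank = (rank / window).where(complete)

print(pct_rank.values)
print(expected.to_numpy())
```

Because `construct` materialises every window along a new dimension, this approach trades memory for simplicity and is unlikely to match the speed of a dedicated moving-window kernel.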

Once again, thank you for your contributions!

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

welcome[bot] commented 9 months ago

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

Mirac-Le commented 9 months ago

It would be even better if skewness and kurtosis functions could be added. Thanks a lot!
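Until such functions exist, rolling skewness and kurtosis can be approximated with the same `construct` approach. The sketch below is an assumption about how one might do it, not an xarray API: it computes central moments per window and applies the bias-corrected sample skewness and excess kurtosis formulas, which are the ones pandas' `rolling().skew()` and `rolling().kurt()` are understood to use. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)
values = rng.normal(size=50)
n = 10  # window length; kurtosis needs n >= 4

da = xr.DataArray(values, dims="time")
windows = da.rolling(time=n).construct("window")

# skipna=False makes incomplete (NaN-padded) windows evaluate to NaN.
mean = windows.mean("window", skipna=False)
dev = windows - mean
m2 = (dev**2).mean("window", skipna=False)
m3 = (dev**3).mean("window", skipna=False)
m4 = (dev**4).mean("window", skipna=False)

# Bias-corrected sample skewness and excess kurtosis.
skew = np.sqrt(n * (n - 1)) / (n - 2) * m3 / m2**1.5
kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * m4 / m2**2 - 3 * (n - 1))

# pandas results for comparison.
reference_skew = pd.Series(values).rolling(n).skew()
reference_kurt = pd.Series(values).rolling(n).kurt()

print(np.allclose(skew.values, reference_skew.to_numpy(), equal_nan=True))
print(np.allclose(kurt.values, reference_kurt.to_numpy(), equal_nan=True))
```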

max-sixty commented 9 months ago

Someone would need to look into this more, but possibly this is a matter of hooking up bottleneck's move_rank in the same way as bottleneck's other move_ functions.
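As a rough sketch of what that hookup could look like from user code, `bottleneck.move_rank` can be wrapped with `xarray.apply_ufunc`. This assumes bottleneck is installed; the helper name `rolling_rank_pct` is hypothetical and not part of xarray. `move_rank` returns ranks normalised to the range -1 to 1, so the sketch rescales them to a pandas-style percentile rank. The rescaling assumes complete windows, which holds with the default `min_count`, and has only been reasoned through for data without ties.

```python
import bottleneck as bn
import numpy as np
import xarray as xr


def rolling_rank_pct(da, dim, window):
    """Hypothetical helper: percentile rank of the newest value in each window."""
    # apply_ufunc moves `dim` to the last axis before calling move_rank.
    normalised = xr.apply_ufunc(
        bn.move_rank,
        da,
        input_core_dims=[[dim]],
        output_core_dims=[[dim]],
        kwargs={"window": window, "axis": -1},
    )
    # move_rank yields 2 * (rank - 1) / (window - 1) - 1; invert that mapping.
    rank = (normalised + 1) / 2 * (window - 1) + 1
    return rank / window


da = xr.DataArray(np.array([3.0, 1.0, 2.0, 5.0, 4.0]), dims="time")
result = rolling_rank_pct(da, "time", window=3)
print(result.values)
```

Incomplete windows come back as NaN because `move_rank` defaults `min_count` to the window length.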

Mirac-Le commented 9 months ago

Hi, max-sixty. Thanks for your reply! I ran tests on pandas, xarray, and bottleneck using a randomly generated 5000*5000 ndarray containing NaN values. For rolling sum, mean, and similar aggregations, all three return the same results; xarray and bottleneck run at similar speeds, with xarray slightly faster. Note that bottleneck's move_rank does not provide a pct parameter.

I noticed this statement in the xarray documentation: "rolling window aggregations are faster and use less memory when bottleneck is installed". If rolling rank, skew, kurt, and other related calculations could be performed with xarray's native functions, that would be fantastic.

I have relied on xarray ever since I first learned about it, and it has made my project much more convenient. It is fast, and its parameters are similar to those in pandas. Thanks to all the maintainers for providing such a great library. I'm looking forward to new features and to your reply. Thanks again for taking the time! :)
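A comparison of this kind can be sketched as follows. This is not the original test script: the array is reduced to 500 by 500 to keep the run short, and the NaN fraction, window length, and dimension names are arbitrary choices for illustration. Timings depend on the machine and on whether bottleneck is installed, so only the equality of results is checked.

```python
import time

import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 500))
data[rng.random(data.shape) < 0.05] = np.nan  # roughly 5% missing values
window = 20

# pandas: rolling mean down the rows of a DataFrame.
start = time.perf_counter()
pd_result = pd.DataFrame(data).rolling(window).mean().to_numpy()
pd_seconds = time.perf_counter() - start

# xarray: rolling mean along the "row" dimension.
da = xr.DataArray(data, dims=("row", "col"))
start = time.perf_counter()
xr_result = da.rolling(row=window).mean().values
xr_seconds = time.perf_counter() - start

print(f"pandas: {pd_seconds:.4f}s, xarray: {xr_seconds:.4f}s")
print("same results:", np.allclose(pd_result, xr_result, equal_nan=True))
```

Both libraries default to requiring a full window of non-NaN values, so any window touching a NaN yields NaN in both results.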