pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Add `nunique` #9548

Open dcherian opened 2 months ago

dcherian commented 2 months ago

Is your feature request related to a problem?

From https://github.com/pydata/xarray/issues/9544#issuecomment-2372685411

Though perhaps we should add `nunique` along a dimension, implemented as: sort along the axis, test whether succeeding elements are not equal along the axis (handling NaNs), then sum along the axis.
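For illustration, the sort / compare / sum idea quoted above could be sketched in plain NumPy roughly as follows. This is only a sketch under assumptions, not xarray's implementation: the function name `nunique_along_axis` and the `skipna` parameter are hypothetical, and it relies on `np.sort` placing NaNs at the end of each sorted slice.

```python
import numpy as np


def nunique_along_axis(arr, axis=-1, skipna=True):
    """Count distinct values along one axis (hypothetical helper, not xarray API)."""
    arr = np.asarray(arr)
    # Sort along the requested axis, then move that axis last for easy slicing.
    # np.sort puts NaNs at the end of each sorted slice.
    s = np.moveaxis(np.sort(arr, axis=axis), axis, -1)
    if s.shape[-1] == 0:
        return np.zeros(s.shape[:-1], dtype=np.intp)

    # Mask of non-NaN entries; only float/complex dtypes can hold NaN.
    if s.dtype.kind in "fc":
        valid = ~np.isnan(s)
    else:
        valid = np.ones(s.shape, dtype=bool)

    # An element starts a new run if it differs from its predecessor.
    # NaN != NaN is True, so NaNs are masked out here and handled separately.
    changed = (s[..., 1:] != s[..., :-1]) & valid[..., 1:]

    # The first element counts if it is not NaN; every later change adds one.
    count = valid[..., 0].astype(np.intp) + changed.sum(axis=-1)

    if not skipna:
        # Treat all NaNs in a slice as one additional distinct value.
        count = count + (~valid).any(axis=-1)
    return count


data = np.array(
    [
        [1.0, 2.0, 2.0, np.nan],
        [3.0, 3.0, 3.0, 3.0],
        [np.nan, np.nan, 1.0, 0.0],
    ]
)
per_row = nunique_along_axis(data, axis=1)
per_row_with_nan = nunique_along_axis(data, axis=1, skipna=False)
per_column = nunique_along_axis(data, axis=0)
```

Dropping NaNs by default mirrors the `dropna=True` default of `pandas.DataFrame.nunique`; whether xarray would want the same default is an open question here.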

xref pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.nunique.html

I think I'd add it to https://github.com/pydata/xarray/blob/main/xarray/util/generate_aggregations.py

snitish commented 1 week ago

Adding this method to the generated aggregations would mean that it needs to support reducing along multiple axes. I'm not sure how straightforward it is to sort an ndarray along multiple dimensions. We could collapse the axes into one, then sort and count. Any pointers on how that can be done? Alternatively, we could support only a single dimension (or none), but then we wouldn't be able to add it to aggregations. At least that's my understanding.
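One possible way to collapse several axes into one, sketched in plain NumPy: move the axes being reduced to the end, reshape them into a single trailing axis, then sort and count along that axis. The names `nunique_over_axes` and `_nunique_last_axis` are hypothetical, the sketch drops NaNs unconditionally, and it says nothing about how this would behave with chunked (e.g. dask) arrays.

```python
import math

import numpy as np


def _nunique_last_axis(arr):
    """Count distinct non-NaN values along the last axis (sort, compare, sum)."""
    s = np.sort(arr, axis=-1)  # NaNs are sorted to the end
    if s.shape[-1] == 0:
        return np.zeros(s.shape[:-1], dtype=np.intp)
    if s.dtype.kind in "fc":
        valid = ~np.isnan(s)
    else:
        valid = np.ones(s.shape, dtype=bool)
    changed = (s[..., 1:] != s[..., :-1]) & valid[..., 1:]
    return valid[..., 0].astype(np.intp) + changed.sum(axis=-1)


def nunique_over_axes(arr, axes):
    """Reduce over several axes by flattening them into one (hypothetical helper)."""
    arr = np.asarray(arr)
    axes = tuple(ax % arr.ndim for ax in axes)  # normalize negative axes
    n_kept = arr.ndim - len(axes)
    # Move the reduced axes to the end, keeping the other axes in order.
    moved = np.moveaxis(arr, axes, tuple(range(n_kept, arr.ndim)))
    # Merge the trailing (reduced) axes into a single axis.
    flat_len = math.prod(moved.shape[n_kept:])
    flat = moved.reshape(moved.shape[:n_kept] + (flat_len,))
    return _nunique_last_axis(flat)


cube = np.array(
    [
        [[1.0, 1.0], [2.0, np.nan]],
        [[3.0, 3.0], [3.0, 3.0]],
    ]
)
over_last_two = nunique_over_axes(cube, axes=(1, 2))
over_first_and_last = nunique_over_axes(cube, axes=(0, 2))
over_everything = nunique_over_axes(cube, axes=(0, 1, 2))
```

Note that the reshape may copy when the moved array is not contiguous. In xarray terms, the same effect could presumably be had by stacking the reduced dimensions into one with `DataArray.stack` before reducing, though that is an assumption rather than something settled in this thread.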