Open xhochy opened 6 years ago
Quick comment: for the argsort case, I think this could be solved by changing np.argsort(values, ..) to values.argsort(..) in the Series.argsort implementation? (If this is blocking you, a fix is certainly welcome.)
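To make the suggestion concrete, here is a minimal sketch of what "call the array's own argsort" means. It uses only toy classes: ToyArray and ToySeries are hypothetical stand-ins, not pandas classes, and the real Series.argsort handles more (missing values, index handling) than this does.

```python
class ToyArray:
    """Hypothetical stand-in for an ExtensionArray that is not NumPy-backed."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def argsort(self):
        # The array decides how to sort its own storage.
        return sorted(range(len(self._values)), key=self._values.__getitem__)


class ToySeries:
    """Hypothetical stand-in for pd.Series, only to show the delegation."""

    def __init__(self, array):
        self.array = array

    def argsort(self):
        # Delegate to the array instead of converting it and calling a
        # generic sort routine on the converted values.
        return self.array.argsort()


positions = ToySeries(ToyArray([3, 1, 2])).argsort()
```

The point is only that the container never needs to know how the values are stored.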
But indeed, we should discuss this more in general.
In the abstract, I'm also interested in this. The set of methods that are dispatched to currently is pretty ad hoc (essentially enough to get df.groupby('extension_array').mean() working :)
2 thoughts here
1) Since the OP we've added many private EA methods that we dispatch to under the hood (EA._where, EA._putmask, EA._quantile). We could address many of these cases by leaning heavily on that pattern.
2) Implementing something like __pandas_ufunc__ or __pandas_priority__ might be helpful for e.g. #38946
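A rough sketch of the first pattern (dispatch to a private method when the array provides one, otherwise fall back to a generic path). Everything here is a toy: PlainArray, FastArray and the module-level where function are hypothetical, and the _where signature and semantics below are an assumption for illustration, not the actual pandas ones.

```python
class PlainArray:
    """Toy array with no specialised hooks (hypothetical)."""

    def __init__(self, values):
        self.values = list(values)


class FastArray(PlainArray):
    """Toy array that provides its own private hook (hypothetical)."""

    def _where(self, mask, other):
        # Keep values where mask is True, use `other` elsewhere,
        # and return the array's own type.
        return FastArray(v if m else other for v, m in zip(self.values, mask))


def where(array, mask, other):
    # Prefer the array's private hook if it defines one.
    hook = getattr(array, "_where", None)
    if hook is not None:
        return hook(mask, other)
    # Generic fallback for arrays without the hook.
    return PlainArray(v if m else other for v, m in zip(array.values, mask))


fast_result = where(FastArray([1, 2, 3]), [True, False, True], 0)
plain_result = where(PlainArray([1, 2, 3]), [True, False, True], 0)
```

Both calls give the same values; the difference is that the array with the hook gets to choose the implementation and the return type.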
I don't think there's any appetite for adding an __array_ufunc__-like mechanism, but we are definitely moving in the direction of more methods being defined on the EAs and being directly delegated to.
During the implementation of non-NumPy-backed ExtensionArrays I quite often run into the case where it is simpler for me to write a complete re-implementation of a method defined on pd.Series instead of using the current implementation, which only delegates part of the work. It would probably make sense to introduce some sort of delegation mechanism: either we continue the delegation like in https://github.com/pandas-dev/pandas/blob/4274b840e64374a39a0285c2174968588753ec35/pandas/core/base.py#L1041 or we could possibly add a really general interface like NumPy's __array_ufunc__: https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#numpy.class.__array_ufunc__

My use case where this arises currently comes from https://github.com/pandas-dev/pandas/issues/21296 and pd.Series.argsort, but I expect that there will be many more cases in this direction as I continue to implement the ExtensionArray interface for Arrow Arrays.
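For illustration, the "really general interface" option could look roughly like the sketch below: one entry point on the array that receives the method name and may return NotImplemented to request the default path, similar in spirit to how __array_ufunc__ lets a class opt out. The hook name __toy_series_method__ and the classes HookedSeries and DescendingArray are made up for this sketch; nothing like this exists in pandas.

```python
class HookedSeries:
    """Hypothetical Series-like wrapper with a single generic dispatch point."""

    def __init__(self, array):
        self.array = array

    def _dispatch(self, name, *args, **kwargs):
        # Ask the array first; fall back if it has no hook or declines.
        hook = getattr(self.array, "__toy_series_method__", None)
        if hook is not None:
            result = hook(name, args, kwargs)
            if result is not NotImplemented:
                return result
        return getattr(self, "_default_" + name)(*args, **kwargs)

    def argsort(self):
        return self._dispatch("argsort")

    def unique(self):
        return self._dispatch("unique")

    def _default_argsort(self):
        values = list(self.array)
        return sorted(range(len(values)), key=values.__getitem__)

    def _default_unique(self):
        # dict preserves insertion order, so this keeps first occurrences.
        return list(dict.fromkeys(self.array))


class DescendingArray:
    """Toy array that knows its values are stored in descending order."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)

    def __toy_series_method__(self, name, args, kwargs):
        if name == "argsort":
            # Already sorted descending, so ascending order is the reverse.
            return list(range(len(self._values) - 1, -1, -1))
        return NotImplemented  # everything else uses the default path


s = HookedSeries(DescendingArray([3, 2, 1]))
order = s.argsort()      # answered by the array's hook
distinct = s.unique()    # hook declines, default implementation runs
```

The trade-off compared with per-method delegation is that the array author has one place to intercept everything, at the cost of a stringly-typed interface.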