pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

{DataArray,Dataset} accessors with parameters #3829

Open seth-p opened 4 years ago

seth-p commented 4 years ago

I would like to be able to create a DataArray accessor that takes parameters, e.g., obj.weighted(w).sum(dim). This appears to be impossible with the existing @register_{dataarray,dataset}_accessor, which supports only accessors of the form obj.weighted.sum(w, dim).

To support the desired syntax, one could simply change https://github.com/pydata/xarray/blob/master/xarray/core/extensions.py#L36 from

            accessor_obj = self._accessor(obj)

to

            accessor_obj = partial(self._accessor, obj)

But that would break the current syntax (i.e., it would require obj.accessor().foo()), so it is clearly not acceptable.
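The trade-off can be reproduced in plain Python with a toy descriptor. `ToyAccessorProperty` below is a hypothetical stand-in, not xarray's actual accessor descriptor (the real one does more, for example caching the accessor per object). It only contrasts constructing the accessor immediately with returning a `functools.partial`:

```python
from functools import partial


class ToyAccessorProperty:
    """Hypothetical stand-in for an accessor descriptor (not xarray's code)."""

    def __init__(self, accessor_cls, use_partial):
        self._accessor_cls = accessor_cls
        self._use_partial = use_partial

    def __get__(self, obj, owner):
        if obj is None:
            # Accessed on the class itself: return the accessor class
            return self._accessor_cls
        if self._use_partial:
            # Proposed change: defer construction until obj.accessor(...) is called
            return partial(self._accessor_cls, obj)
        # Current behaviour: build the accessor from obj alone
        return self._accessor_cls(obj)


class Weighted:
    def __init__(self, obj, weight=None):
        self._obj = obj
        self._weight = weight

    def sum(self, dim):
        return f"sum over {dim} with weight {self._weight}"


class OldStyle:
    weighted = ToyAccessorProperty(Weighted, use_partial=False)


class NewStyle:
    weighted = ToyAccessorProperty(Weighted, use_partial=True)


print(OldStyle().weighted.sum("x"))     # sum over x with weight None
print(NewStyle().weighted(5).sum("x"))  # sum over x with weight 5

try:
    NewStyle().weighted.sum("x")  # a partial object has no .sum attribute
except AttributeError as err:
    print("old syntax breaks:", err)
```

With the partial, the parametrized form works but every existing `obj.accessor.foo()` call raises `AttributeError`, which is exactly the breakage described above.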

So, are there any suggestions (short of simply creating slightly modified copies of register_{dataarray,dataset}_accessor) for supporting both the existing obj.accessor.foo() syntax and my desired obj.accessor(*args, **kwargs).foo() syntax?

keewis commented 4 years ago

Not sure you should really use accessors for that; having a function return a wrapper object might be enough.

Then again, an accessor is a slightly modified version of a property, so you can make your object callable by defining __call__:

import xarray as xr

@xr.register_dataarray_accessor("weighted")
class Weighted:
    def __init__(self, xarray_obj):
        self._obj = xarray_obj
        self._weight = None

    def __call__(self, weight):
        # Store the weight and return the accessor so obj.weighted(w).sum(dim) works
        self._weight = weight
        return self

    def sum(self, dim):
        return "weighted sum"

However, this still allows calling obj.weighted.sum(dim) without a weight, so instead you can use:

import xarray as xr

class Weighted:
    def __init__(self, obj, weight):
        self._obj = obj
        self._weight = weight

    def sum(self, dim):
        return f"weighted sum over {dim} and with weight {self._weight}"

# Register a function rather than a class: the accessor is the inner function,
# so a weight has to be passed before any method is reachable
@xr.register_dataarray_accessor("weighted")
def weighted(obj):
    def wrapped(weight):
        return Weighted(obj, weight)
    return wrapped

da = xr.DataArray(data=range(5), dims="x")
da.weighted(5).sum(dim="x")
# -> 'weighted sum over x and with weight 5'

Edit: the warning about an overridden attribute was caused by rerunning the code without restarting the interpreter.
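Since the issue covers both DataArray and Dataset, the function-based accessor can presumably be registered on both types by stacking the two decorators, because each `register_*_accessor` decorator returns the object it decorates. A minimal sketch, assuming a hypothetical accessor name `wsum` (chosen so it does not clash with any existing attribute) and a naive weighted sum without NaN handling or normalization:

```python
import xarray as xr


class _WeightedSum:
    def __init__(self, obj, weight):
        self._obj = obj
        self._weight = weight

    def sum(self, dim):
        # Naive weighted sum for illustration only (no NaN handling)
        return (self._obj * self._weight).sum(dim)


# Same factory registered for both types; "wsum" is a hypothetical name
@xr.register_dataarray_accessor("wsum")
@xr.register_dataset_accessor("wsum")
def wsum(obj):
    def wrapped(weight):
        return _WeightedSum(obj, weight)
    return wrapped


da = xr.DataArray([1.0, 2.0, 3.0], dims="x")
w = xr.DataArray([1.0, 0.0, 2.0], dims="x")
ds = xr.Dataset({"a": da})

print(float(da.wsum(w).sum("x")))       # 7.0
print(float(ds.wsum(w).sum("x")["a"]))  # 7.0
```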

seth-p commented 4 years ago

@keewis, thanks for the suggestions. Both seem reasonable.

In your first example, if you wanted to prohibit obj.weighted.sum(dim), you could check whether self._weight is None in sum(). Still, it would be nice to have the interpreter enforce the requirement rather than doing an explicit check in every method.

jhamman commented 4 years ago

Would it be worth adding @keewis' examples to the accessor documentation?

keewis commented 4 years ago

:+1:

We might want to discourage using this to add methods to the DataArray / Dataset namespace, though.