roocs / clisops

Climate Simulation Operations
https://clisops.readthedocs.io/en/latest/

What about an xarray extension? #37

Open aulemahal opened 4 years ago

aulemahal commented 4 years ago

Packages like rioxarray or hvplot provide an xarray extension so that their methods can be called directly on the dataset. Would that be wanted in clisops? Example: instead of

from clisops import subset
subset.subset_bbox(ds, lat_bnds=[45, 50], lon_bnds=[-60, -55])

one could use:

import clisops.xarray
ds.cso.subset_bbox(lat_bnds=[45, 50], lon_bnds=[-60, -55])

where "cso" is the xarray extension added by clisops.

Personally, I like this approach as it looks more elegant and xarray-esque. Moreover, it could allow for dataset-related lookups, like CRS info in the metadata, or using something like rioxarray's ds.rio.set_spatial_dims to solve the problem of #32. Implementation-wise, it shouldn't be complicated and wouldn't change the rest of the API; it would simply add another access mechanism. And I believe it would make clisops more attractive to xarray users!

As a heavy user of the almost-extinct xclim.subset, I can offer some time for this implementation, if it is wanted.

agstephens commented 4 years ago

Hi @aulemahal, that sounds like a really nice approach. I didn't realise xarray had a formal way of doing this. I will discuss with colleagues and get back to you. Thanks

aulemahal commented 4 years ago

For reference: http://xarray.pydata.org/en/stable/internals.html#extending-xarray

Zeitsperre commented 4 years ago

@agstephens It would be great to address this issue in our upcoming meeting. I'd love to see this as an optional way of calling clisops, maybe with a few goodies enabled?

agstephens commented 4 years ago

Yes @Zeitsperre, let's talk about this. It seems straightforward.

agstephens commented 4 years ago

This would, indeed, be easy to create:

import clisops
import clisops.core.subset

import xarray as xr
import os

@xr.register_dataset_accessor("cso")
class ClisopsCoreWrapper(object):

    def __init__(self, xarray_obj):
        self._obj = xarray_obj

    @property
    def version(self):
        return clisops.__version__

    def subset_time(self, *args, **kwargs):
        return clisops.core.subset.subset_time(self._obj, *args, **kwargs)

def test_cso():

    dr = '/badc/cmip6/data/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp126/r1i1p1f1/Amon/tas/gn/v20190710'
    fpath = os.path.join(dr, os.listdir(dr)[0])
    ds = xr.open_dataset(fpath)

    print(ds.cso.version)
    print(ds.cso.subset_time(start_date='2016-01-16', end_date='2016-12-16'))

test_cso()

agstephens commented 4 years ago

This could be maintained automatically by picking up the list of public functions from the relevant modules and creating a lambda for each:

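import clisops
import clisops.core.subset
import xarray as xr


@xr.register_dataset_accessor("cso")
class ClisopsCoreWrapper:

    def __init__(self, xarray_obj):
        self._obj = xarray_obj

        # Expose each public function of clisops.core.subset as a method
        # that passes the wrapped dataset as the first argument.
        for funcname in clisops.core.subset.__all__:
            func = getattr(clisops.core.subset, funcname)
            # Bind func as a default argument: a plain closure would look it
            # up at call time and always get the last function in the loop.
            setattr(
                self,
                funcname,
                lambda *args, _func=func, **kwargs: _func(self._obj, *args, **kwargs),
            )

    @property
    def version(self):
        return clisops.__version__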
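
A caveat when building methods from lambdas in a loop: Python closures look up loop variables at call time, so a lambda that refers to the loop variable directly ends up calling the last function in the list. Here is a minimal sketch of the pitfall and one possible way around it. It rests on assumptions: `subset_time` and `subset_bbox` below are hypothetical dummies (not the clisops implementations), `FUNCS` stands in for a module's `__all__`, and plain classes replace the xarray accessor so the snippet runs without xarray or clisops.

```python
import functools


# Hypothetical stand-ins for the functions listed in a module's __all__;
# they only report which function was called and with what arguments.
def subset_time(ds, **kwargs):
    return ("subset_time", ds, kwargs)


def subset_bbox(ds, **kwargs):
    return ("subset_bbox", ds, kwargs)


FUNCS = {"subset_time": subset_time, "subset_bbox": subset_bbox}


class LateBound:
    """Each lambda closes over `func`, which is looked up at call time."""

    def __init__(self, obj):
        self._obj = obj
        for name, func in FUNCS.items():
            setattr(self, name, lambda *a, **kw: func(self._obj, *a, **kw))


class EarlyBound:
    """functools.partial binds the function and the object immediately."""

    def __init__(self, obj):
        self._obj = obj
        for name, func in FUNCS.items():
            setattr(self, name, functools.partial(func, obj))


late = LateBound("ds")
early = EarlyBound("ds")

# The late-bound wrapper dispatches to the last function seen in the loop.
print(late.subset_time(start_date="2016-01-16")[0])
# The early-bound wrapper dispatches to the function it was named after.
print(early.subset_time(start_date="2016-01-16")[0])
```

Passing the function as a default argument of the lambda (`lambda *a, _func=func, **kw: ...`) would give the same early binding while keeping the lambda form.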

So it looks very easy to do. The question is: should this be the public API that we expose?

Any thoughts, @huard @Zeitsperre @cehbrecht @ellesmith88?

huard commented 4 years ago

I don't have a strong opinion either way. I certainly think it's worth experimenting with.

Zeitsperre commented 3 years ago

I realize that this is still in the planning stage, but it looks like rioxarray is slated to become a back-end engine for xarray. Something to keep an eye on: https://github.com/pydata/xarray/issues/4697