pangeo-data / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License

Dataset can't be DataArray #227

Closed kthyng closed 1 year ago

kthyng commented 1 year ago

Hi! Thanks for this package, I use it regularly.

I apparently had testing for my functions that use xESMF turned off by accident for a while, but I think this used to work, so I wanted to note it.

When I run a Regridder command with an input DataArray instead of Dataset, I get an error like this:

    regridder = xe.Regridder(
../../miniconda3/envs/extract_model/lib/python3.9/site-packages/xesmf/frontend.py:773: in __init__
    grid_in, shape_in, input_dims = ds_to_ESMFgrid(
../../miniconda3/envs/extract_model/lib/python3.9/site-packages/xesmf/frontend.py:116: in ds_to_ESMFgrid
    lon, lat = _get_lon_lat(ds)
../../miniconda3/envs/extract_model/lib/python3.9/site-packages/xesmf/frontend.py:45: in _get_lon_lat
    if ('lat' in ds and 'lon' in ds) or ('lat' in ds.coords and 'lon' in ds.coords):

...

E           ValueError: The truth value of a Array is ambiguous. Use a.any() or a.all().

But I experimented and if I pause in the code there and convert "ds" to a Dataset, then it seems to run ok.

Do the xarray objects need to be Datasets for other reasons in the code, in which case a TypeError should probably be raised if ds_in is a DataArray? Or should that boolean statement be modified to handle a DataArray input?

Thanks!

raphaeldussin commented 1 year ago

Hi @kthyng !

We should be able to work on a DataArray. Do you have a minimal example reproducing the problem?

kthyng commented 1 year ago

@raphaeldussin Ok, I think I figured out what the issue actually is. It seems to be a problem with Dask Arrays in particular, so my question has changed from whether xESMF should work with DataArrays to whether it should work with Dask Arrays. I didn't actually realize I was using a Dask Array at the time!

My example is a bit wordy, but it is simple.

VERSION 1: DataArrays: line 45 in frontend.py returns a warning:

import xarray as xr
import xesmf as xe
import pandas as pd
import numpy as np
import dask.array as daska

lats, lons = np.array([42.3]), np.array([-99.5])

attrs_lat = dict(units="degrees_north", standard_name="latitude")
attrs_lon = dict(units="degrees_east", standard_name="longitude")
locstream = True

np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
time = pd.date_range("2014-09-06", periods=3)
reference_time = pd.Timestamp("2014-09-05")
da = xr.DataArray(
    data=temperature,
    dims=["x", "y", "time"],
    coords=dict(
        lons=(["x", "y"], lon, attrs_lon),
        lats=(["x", "y"], lat, attrs_lat),
        time=time,
        reference_time=reference_time,
    ),
    attrs=dict(
        description="Ambient temperature.",
        units="degC",
    ),
)

da_out = xr.DataArray(dims=["loc"],
    coords=
    {
        "lat": (["loc"], lats, dict(units="degrees_north", standard_name="latitude",),),
        "lon": (["loc"], lons, dict(units="degrees_east", standard_name="longitude",),),
    }
)

regridder = xe.Regridder(da, da_out, "bilinear", locstream_out=locstream,)

stopped at

/Users/kthyng/miniconda3/envs/extract_model/lib/python3.9/site-packages/xesmf/frontend.py(46)_get_lon_lat()
     44     """Return lon and lat extracted from ds."""
     45
---> 46     if ('lat' in ds and 'lon' in ds) or ('lat' in ds.coords and 'lon' in ds.coords):

returns

/Users/kthyng/miniconda3/envs/extract_model/lib/python3.9/site-packages/xarray/core/dataarray.py:857: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  return key in self.data
False

VERSION 2: with da as a Dask Array: line 45 in frontend.py returns an error:

import xarray as xr
import xesmf as xe
import pandas as pd
import numpy as np
import dask.array as daska

lats, lons = np.array([42.3]), np.array([-99.5])

attrs_lat = dict(units="degrees_north", standard_name="latitude")
attrs_lon = dict(units="degrees_east", standard_name="longitude")
locstream = True

np.random.seed(0)
temperature = 15 + 8 * np.random.randn(2, 2, 3)
lon = [[-99.83, -99.32], [-99.79, -99.23]]
lat = [[42.25, 42.21], [42.63, 42.59]]
time = pd.date_range("2014-09-06", periods=3)
reference_time = pd.Timestamp("2014-09-05")
da = xr.DataArray(
    data=temperature,
    dims=["x", "y", "time"],
    coords=dict(
        lons=(["x", "y"], lon, attrs_lon),
        lats=(["x", "y"], lat, attrs_lat),
        time=time,
        reference_time=reference_time,
    ),
    attrs=dict(
        description="Ambient temperature.",
        units="degC",
    ),
)
da = daska.from_array(da)

da_out = xr.DataArray(dims=["loc"],
    coords=
    {
        "lat": (["loc"], lats, dict(units="degrees_north", standard_name="latitude",),),
        "lon": (["loc"], lons, dict(units="degrees_east", standard_name="longitude",),),
    }
)

regridder = xe.Regridder(da, da_out, "bilinear", locstream_out=locstream,)

stopped at

/Users/kthyng/miniconda3/envs/extract_model/lib/python3.9/site-packages/xesmf/frontend.py(46)_get_lon_lat()
     44     """Return lon and lat extracted from ds."""
     45
---> 46     if ('lat' in ds and 'lon' in ds) or ('lat' in ds.coords and 'lon' in ds.coords):

returns for that conditional statement

/Users/kthyng/miniconda3/envs/extract_model/lib/python3.9/site-packages/dask/array/core.py:464: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  o = func(*args, **kwargs)
/Users/kthyng/miniconda3/envs/extract_model/lib/python3.9/site-packages/dask/array/utils.py:138: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  meta = func(*args_meta, **kwargs_meta)
*** ValueError: The truth value of a Array is ambiguous. Use a.any() or a.all().

aulemahal commented 1 year ago

The fact that the line doesn't fail with a non-dask array hides the bug, because the line is not doing what it is intended to do. On a DataArray, 'lon' in ds does not check whether a variable named 'lon' exists, as it would in a Dataset; it checks whether the value 'lon' occurs in the array's data, which is of no use here.
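
To illustrate the difference, here is a minimal sketch. It is not taken from xESMF, and the array contents and the name "temperature" are made up for the example. It assumes DataArray membership delegates to the underlying data, as the "return key in self.data" line in the warning above indicates for that xarray version, while membership on .coords and on a Dataset tests names.

```python
import numpy as np
import xarray as xr

# Small DataArray with 2-D lon/lat coordinates (values are illustrative).
da = xr.DataArray(
    np.zeros((2, 2)),
    dims=["x", "y"],
    coords={
        "lon": (["x", "y"], [[-99.83, -99.32], [-99.79, -99.23]]),
        "lat": (["x", "y"], [[42.25, 42.21], [42.63, 42.59]]),
    },
    name="temperature",
)

# Name lookup on .coords works the same way for DataArray and Dataset.
has_coords = "lat" in da.coords and "lon" in da.coords

# On a Dataset, membership tests variable names (coordinates included).
ds = da.to_dataset()
in_dataset = "lat" in ds and "lon" in ds

# On a DataArray, membership tests the data values, not the names:
# this asks whether the number 0.0 occurs in the array.
value_in_array = 0.0 in da

print(has_coords, in_dataset, value_in_array)
```

Under that assumption, only the .coords test and the Dataset test answer the question the conditional in _get_lon_lat is asking.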

Indeed, I think this is a bug, as ds_to_ESMFgrid (and subsequent functions) were not meant to accept DataArrays. However, the regridder could call to_dataset when a DataArray is given as input, so the user doesn't have to worry about this.
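
A rough sketch of that idea, assuming a hypothetical helper named as_dataset (this is not an existing xESMF function, and the fallback name "data" is arbitrary). It converts a DataArray with to_dataset and raises a TypeError for anything else, such as a bare dask or NumPy array:

```python
import numpy as np
import xarray as xr


def as_dataset(obj, default_name="data"):
    """Hypothetical helper: return an xarray.Dataset for either input type."""
    if isinstance(obj, xr.Dataset):
        return obj
    if isinstance(obj, xr.DataArray):
        # to_dataset needs a variable name; reuse the array's own name if set.
        return obj.to_dataset(name=obj.name or default_name)
    raise TypeError(
        f"expected an xarray Dataset or DataArray, got {type(obj).__name__}"
    )


# Unnamed DataArray with lat/lon coordinates on a "loc" dimension.
da_out = xr.DataArray(
    np.zeros(1),
    dims=["loc"],
    coords={"lat": (["loc"], [42.3]), "lon": (["loc"], [-99.5])},
)
ds_out = as_dataset(da_out)

# After conversion, the name-based check behaves as intended.
print("lat" in ds_out and "lon" in ds_out)
```

Whether the conversion belongs in the Regridder constructor or further down is a design choice for the maintainers; the sketch only shows the shape of the check.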

kthyng commented 1 year ago

Thank you!