pangeo-data / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License
194 stars 35 forks source link

Regridder hangs on input DataArray with length 0 dimension #244

Open kevinrosa opened 1 year ago

kevinrosa commented 1 year ago

I recently discovered that xesmf.Regridder gets stuck in some sort of infinite process without returning an error if I pass in a DataSet with an empty DataArray (time dimension of length 0 this my case).

My personal fix has been to add a check of the time dimension before regridding, but I thought I'd share this experience in case this is not the intended behavior.

The source dataset:

>>> ds_in
<xarray.Dataset>
Dimensions:     (gridY: 898, gridX: 398, time: 0)
Coordinates:
  * gridY       (gridY) int32 0 1 2 3 4 5 6 7 ... 891 892 893 894 895 896 897
  * gridX       (gridX) int16 0 1 2 3 4 5 6 7 ... 391 392 393 394 395 396 397
    lat         (gridY, gridX) float64 46.86 46.86 46.86 ... 51.1 51.1 51.1
    lon         (gridY, gridX) float64 -123.4 -123.4 -123.4 ... -124.3 -124.3
  * time        (time) datetime64[ns] 
    depth       float32 0.5
Data variables:
    bathymetry  (gridY, gridX) float64 ...
    u           (time, gridY, gridX) float32 ...
    v           (time, gridY, gridX) float32 ...

>>> ds_in.u.values
array([], shape=(0, 898, 398), dtype=float32)

The destination grid:

>>> ds_reg
<xarray.Dataset>
Dimensions:  (lon: 631, lat: 634)
Coordinates:
  * lon      (lon) float64 -125.3 -125.3 -125.3 -125.3 ... -122.2 -122.1 -122.1
  * lat      (lat) float64 47.27 47.28 47.28 47.29 ... 50.42 50.43 50.43 50.44
Data variables:
    *empty*

The regridding step:

regridder = xe.Regridder(ds_in, ds_reg, "bilinear")

# Regrid the velocities
ds_out = regridder(ds_in)  #NOTE: this is where the code gets stuck
huard commented 1 year ago

Thanks for the report, do you want to submit a PR to fix this ?

kevinrosa commented 1 year ago

@huard I should have clarified, my quick "fix" was just to add an if statement to my script, I haven't actually attempted to implement anything in the xESMF code. I could try to poke around at some point but I haven't spend any time in the codebase.

kevinrosa commented 1 year ago

I just ran into this same sneaky bug on a different dataset. Was searching to see if anyone else had reported it and found my own comment.

I've looked through the xESMF code a bit and still don't feel confident attempting a PR but I may have a hint in case someone else wants to try. I'd also be willing to provide a minimal working example.

Here is my xESMF Regridder object:

xESMF Regridder 
Regridding algorithm:       nearest_s2d 
Weight filename:            nearest_s2d_1x145141_141x222.nc 
Reuse pre-computed weights? False 
Input grid shape:           (1, 145141) 
Output grid shape:          (141, 222) 
Periodic in longitude?      False

It is expecting an input grid of shape (1, 72670).

But then if we inspect my input data:

>>> ds['u']
<xarray.DataArray 'u' (time: 0, nele: 145141)>
array([], shape=(0, 145141), dtype=float32)
Coordinates:
  * time     (time) datetime64[ns] 
Dimensions without coordinates: nele
Attributes:
    units:          meters second-1
    standard_name:  eastward_sea_water_velocity

Shape is (0, 145141). So maybe this mismatch is leading to the regridder step getting hung up?

aulemahal commented 1 year ago

Hi @kevinrosa.

I sadly wasn't able to reproduce the issue. Here's my code generating a "locstream" dataset as input, with a time dimension of size 0.

import xarray as xr
import numpy as np
import xesmf as xe

ds_in = xr.Dataset(data_vars={'data': (('time', 'site'), np.zeros((0, 100)))})
ds_in = ds_in.assign_coords(lat=(('site',), np.linspace(-80, 80, 100)), lon=(('site',), np.linspace(-180, 180, 100)))

lon_new = np.arange(-60, -47.9, 0.1)
lat_new = np.arange(-36, -24.9, 0.1)

reg = xe.Regridder(ds_in, {'lat': lat_new, 'lon': lon_new}, locstream_in=True, method='nearest_s2d')
reg(ds_in)

What version of xESMF are you using ? We made major changes in the latest releases, maybe this was fixed in 0.8.2 ?