google / Xee

An Xarray extension for Google Earth Engine
https://xee.rtfd.io
Apache License 2.0

Xee does not provide correct data for resampled image #145

Open deepgabani8 opened 7 months ago

deepgabani8 commented 7 months ago

When I use ee.data.computePixels, it returns different data for different resample methods, as it should. Fetching the same image with Xee as the backend, however, returns identical data regardless of the resample method.

Using ee.data.computePixels, the data differs between the resample methods ['bilinear', 'bicubic']:

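import io

import ee
import numpy as np

# image_id, crs, and crs_transform are assumed to be defined beforehand.
image = ee.Image(image_id)
# Choose the resampling method, then reproject so it is applied in the target grid.
image = image.resample('bilinear').reproject(crs=crs, crsTransform=crs_transform)
image = image.clip(
    ee.Geometry.Rectangle(
        [[-180, -90], [-90, -45]],
        None,
        geodesic=False,
    )
)

# Request the pixels in NPY format and load them as a NumPy array.
data = np.load(io.BytesIO(ee.data.computePixels({
    'expression': image,
    'fileFormat': 'NPY',
})))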
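One way to confirm whether two resample methods really produce different pixels is to compare the fetched arrays directly. The sketch below is only an illustration: `compare_resampled` is a hypothetical helper, the two input arrays are made-up stand-ins for the 'bilinear' and 'bicubic' results, and it assumes plain numeric arrays (if the loaded array is structured with one field per band, select a band first).

```python
import numpy as np


def compare_resampled(a: np.ndarray, b: np.ndarray) -> dict:
    """Summarize how two arrays fetched with different resample methods differ."""
    if a.shape != b.shape:
        # Arrays on different grids cannot be compared element by element.
        return {"same_shape": False, "identical": False, "max_abs_diff": None}
    diff = np.abs(a.astype(np.float64) - b.astype(np.float64))
    return {
        "same_shape": True,
        "identical": bool(np.array_equal(a, b)),
        "max_abs_diff": float(np.nanmax(diff)),
    }


# Made-up stand-ins for the arrays returned by the two resample methods.
bilinear = np.array([[1.0, 2.0], [3.0, 4.0]])
bicubic = np.array([[1.0, 2.5], [3.0, 4.0]])

print(compare_resampled(bilinear, bicubic))
print(compare_resampled(bilinear, bilinear.copy()))
```

Running the same check on the arrays obtained from each backend makes the discrepancy described in this issue easy to state: the resample methods should give `identical: False`, and a backend that reports `identical: True` is ignoring the resampling.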

Using Xee as the backend, the data is the same for both resample methods ['bilinear', 'bicubic']:

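import ee
import xarray as xr
import xee

# image_id, crs, and crs_transform are assumed to be defined beforehand.
image = ee.Image(image_id)
image = image.resample('bilinear').reproject(crs=crs, crsTransform=crs_transform)
geom = ee.Geometry.Rectangle(
    [[-180, -90], [-90, -45]],
    None,
    geodesic=False,
)

# Open the single-image collection through the Xee backend.
ds = xr.open_dataset(
    ee.ImageCollection([image]),
    projection=image.projection(),
    geometry=geom,
    engine=xee.EarthEngineBackendEntrypoint,
)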

var = list(ds.data_vars)[0]
data = ds[var].data[0, :, :].T

mahrsee1997 commented 7 months ago

Thanks for raising the issue, @deepgabani8.

FYI, Xee works with asset_ids internally, so manipulations applied to an image, like the ones above, do not affect the data fetched from EE.

mahrsee1997 commented 7 months ago

Actually, we have a bug/issue in the codebase: https://github.com/google/Xee/blob/main/xee/ext.py#L789C1-L797C80.

At processing time we use the 'asset_ids' whenever we can obtain them. We implemented this to avoid an expensive toList() operation, but it appears to cause problems in cases like yours, where the image has been modified after loading.
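The effect of such a shortcut can be illustrated with a toy model. This is not Xee's code and does not call Earth Engine: `ToyImage`, `ASSETS`, `fetch_by_asset_id`, and `fetch_by_expression` are all hypothetical names, and the model simply assumes that a processed image still carries the ID of the asset it came from.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Toy stand-in for stored pixels, keyed by asset ID (hypothetical data).
ASSETS = {"demo/asset": [1.0, 2.0, 3.0]}

PixelOp = Callable[[List[float]], List[float]]


@dataclass
class ToyImage:
    """Hypothetical image: a source asset plus a chain of pending operations."""
    asset_id: str
    ops: List[PixelOp] = field(default_factory=list)

    def apply(self, op: PixelOp) -> "ToyImage":
        # Return a new image with the operation appended; the asset ID is kept.
        return ToyImage(self.asset_id, self.ops + [op])


def fetch_by_asset_id(image: ToyImage) -> List[float]:
    # Shortcut path: read the stored asset directly, ignoring pending operations.
    return list(ASSETS[image.asset_id])


def fetch_by_expression(image: ToyImage) -> List[float]:
    # Expression path: evaluate the full operation chain on top of the asset.
    pixels = list(ASSETS[image.asset_id])
    for op in image.ops:
        pixels = op(pixels)
    return pixels


base = ToyImage("demo/asset")
variant_a = base.apply(lambda px: [p * 0.5 for p in px])
variant_b = base.apply(lambda px: [p + 1.0 for p in px])

# The shortcut cannot tell the two variants apart; the expression path can.
print(fetch_by_asset_id(variant_a) == fetch_by_asset_id(variant_b))
print(fetch_by_expression(variant_a) == fetch_by_expression(variant_b))
```

In this model the two variants play the role of the 'bilinear' and 'bicubic' images: looking pixels up by asset ID discards whatever was applied afterwards, which matches the symptom reported above.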

alxmrs commented 3 months ago

This discussion on the Pangeo discourse seems really relevant.

https://discourse.pangeo.io/t/example-which-highlights-the-limitations-of-netcdf-style-coordinates-for-large-geospatial-rasters/4140

@schwehr @simon Any thoughts on how this might be related? Could the discrepancy eventually cause an issue?