google / Xee

An Xarray extension for Google Earth Engine
Apache License 2.0
240 stars 28 forks source link

Opening a MODIS dataset brings seemingly random values across, rest are 0 #122

Open scheip opened 8 months ago

scheip commented 8 months ago

Hi folks, cannot explain my appreciation and excitement for xee, what a great package. I have used this successfully with other datasets (e.g., ERA-5) but am now trying to work with MODIS data. I am experiencing a problem where every time I use xr.open_dataset... I get a unique result. Code example:

# specify study area
ur = [35.89, -81.87] # y, x
ll = [35.83, -81.945] # y, x   # full region
coords = [ll[1], ll[0], ll[1], ur[0], ur[1], ur[0], ur[1], ll[0]]
geom = ee.Geometry.Polygon(coords)

# Specify image collections
ic = ee.ImageCollection("MODIS/061/MCD15A3H").filterDate('2016-06-01', '2024-01-01').filterBounds(geom).select('Lai')

ds = xr.open_dataset(
    ic,
    engine='ee',
    crs='EPSG:32617',
    scale=500,
    geometry=geom
)

# plot lai, average across x and y
ds['Lai'].mean(dim=['X', 'Y']).plot()

If I run this code 5 times, I get 5 different time-series plots. In every case, a substantial portion of the data is 0. Here are 3 example outputs of running the above code. Why would it be different every time?

I am currently on 0.0.5 of xee: xee 0.0.5 pyhd8ed1ab_0 conda-forge Thank you for any pointers.

run1 run2 run3

scheip commented 8 months ago

I can confirm I do not see this behavior with another EE dataset. For example, the only change in the below is switching over to an ERA5 dataset:

# specify study area
ur = [35.89, -81.87] # y, x
ll = [35.83, -81.945] # y, x   # full region
coords = [ll[1], ll[0], ll[1], ur[0], ur[1], ur[0], ur[1], ll[0]]
geom = ee.Geometry.Polygon(coords)

# Specify image collections
ic = ee.ImageCollection("ECMWF/ERA5_LAND/MONTHLY_AGGR").filterDate('2016-06-01', '2024-01-01').filterBounds(geom).select('temperature_2m')

ds = xr.open_dataset(
    ic,
    engine='ee',
    crs='EPSG:32617', # utm zone 17n
    scale=500,
    geometry=geom
)

# plot lai, average across x and y
ds['temperature_2m'].mean(dim=['X', 'Y']).plot()

And this yields the same result every time, as expected.

noahgolmant commented 8 months ago

Is there a mismatch between the bounds CRS and the requested CRS of the dataset? I am not exactly sure why the issue is non-deterministic as a result. I ran this on my own test area, Pará, Brazil, and observed similar zero values until I adjusted this. When I assume EPSG:4326 for your bounds, I think the boundary is in Antarctica, so the LAI is 0 as expected.

With the original code:

import ee
import geopandas as gpd
from shapely.geometry import mapping
import xarray as xr
import xee
from ingest.consts import SERVICE_PATH, EMAIL

credentials = ee.ServiceAccountCredentials(EMAIL, SERVICE_PATH)
ee.Initialize(credentials)

bounds = gpd.read_file("/home/noah/ingest/configs/data/acre_tiny_test_site.geojson").iloc[0].geometry
geom = ee.Geometry(mapping(bounds))

# Specify image collections
ic = ee.ImageCollection("MODIS/061/MCD15A3H").filterDate('2016-06-01', '2024-01-01').filterBounds(geom).select('Lai')

ds = xr.open_dataset(
    ic,
    engine='ee',
    crs='EPSG:32617',
    scale=500,
    geometry=geom
)

# plot lai, average across x and y
ds['Lai'].mean(dim=['X', 'Y']).plot()
image

With the following variant, setting CRS to 4326, we see no zeros and expected seasonal variation:

# Specify image collections
ic = ee.ImageCollection("MODIS/061/MCD15A3H").filterDate('2016-06-01', '2024-01-01').filterBounds(geom).select('Lai')

ds = xr.open_dataset(
    ic,
    engine='ee',
    crs='EPSG:4326',
    geometry=geom
)

# plot lai, average across x and y
ds['Lai'].mean(dim=['lon', 'lat']).plot()
image
scheip commented 7 months ago

Thank you @noahgolmant for trying this out! I don't know what happened, perhaps an update on GEE's side, but the code I listed in my first message now produces expected results, with no zeros and no code changes. Very strange!

I'd be curious if you re-ran your first code block if you also now get expected results?

scheip commented 7 months ago

Strange - running the same troubleshooting code pasted above again is resulting in spurious/missing values. Any further thoughts on this?

Copying the troubleshooting code here:

# specify study area
ur = [35.89, -81.87] # y, x
ll = [35.83, -81.945] # y, x   # full region
coords = [ll[1], ll[0], ll[1], ur[0], ur[1], ur[0], ur[1], ll[0]]
geom = ee.Geometry.Polygon(coords)

# Specify image collections
ic = ee.ImageCollection("MODIS/061/MCD15A3H").filterDate('2016-06-01', '2024-01-01').filterBounds(geom).select('Lai')

ds = xr.open_dataset(
    ic,
    engine='ee',
    crs='EPSG:32617',
    scale=500,
    geometry=geom
)

# plot lai, average across x and y
ds['Lai'].mean(dim=['X', 'Y']).plot()
naschmitz commented 7 months ago

Sounds like this is related to #119. We're looking into it.