google / Xee

An Xarray extension for Google Earth Engine
Apache License 2.0
240 stars 28 forks source link

Inconsistent Dimensions When Loading Processed ImageCollection #139

Closed pt20 closed 6 months ago

pt20 commented 6 months ago

Description

I am experiencing an issue with the xee library when trying to load an ImageCollection from Google Earth Engine into an xarray dataset. Specifically, the dimensions of the dataset are not as expected when I specify the geometry and scale parameters. This behavior is inconsistent with the expected outcome described in the xee documentation and compared to similar operations on other collections like ECMWF/ERA5_LAND/HOURLY.

Steps to Reproduce

  1. Filter an ImageCollection using the COPERNICUS/S2_SR and COPERNICUS/S2_CLOUD_PROBABILITY collections for a specific area of interest (AOI) and date range.
  2. Apply cloud and shadow mask processing to the collection.
  3. Convert the processed ImageCollection to an xarray dataset without specifying geometry and scale, and observe the dimensions.
  4. Convert the same ImageCollection to an xarray dataset, this time specifying geometry and scale, and note the change in dimensions.

Here's a refactored version of the s2-cloudless notebook.

import ee
import xarray as xr

ee.Initialize(opt_url='https://earthengine-highvolume.googleapis.com', project='my-project')

def filter_collection(collection_id, start_date, end_date, cloud_filter=None):
    # Filters an Earth Engine ImageCollection based on date and optional cloud cover percentage.
    collection = ee.ImageCollection(collection_id).filterDate(start_date, end_date)
    if cloud_filter is not None and collection_id == 'COPERNICUS/S2_SR':
        collection = collection.filter(ee.Filter.lte('CLOUDY_PIXEL_PERCENTAGE', cloud_filter))
    return collection

def join_collections(primary_col, secondary_col):
    join_filter = ee.Filter.equals(leftField='system:index', rightField='system:index')
    join = ee.Join.saveFirst('s2cloudless')
    return ee.ImageCollection(join.apply(primary_col, secondary_col, join_filter))

def add_cloud_shadow_bands(img, thresholds, scale, projection_scale):
    cld_prb = ee.Image(img.get('s2cloudless')).select('probability')
    is_cloud = cld_prb.gt(thresholds['cloud_prob']).rename('clouds')
    img = img.addBands(ee.Image([cld_prb, is_cloud]))

    not_water = img.select('SCL').neq(6)
    dark_pixels = img.select('B8').lt(thresholds['nir_dark']*scale).multiply(not_water).rename('dark_pixels')
    shadow_azimuth = ee.Number(90).subtract(ee.Number(img.get('MEAN_SOLAR_AZIMUTH_ANGLE')))
    max_cloud_proj_dist = min(thresholds['cloud_proj_dist'] * 10, 511)

    cld_proj = img.select('clouds').directionalDistanceTransform(shadow_azimuth, max_cloud_proj_dist) \
        .reproject(crs=img.select(0).projection(), scale=projection_scale) \
        .select('distance').mask().rename('cloud_transform')
    shadows = cld_proj.multiply(dark_pixels).rename('shadows')

    return img.addBands(ee.Image([dark_pixels, cld_proj, shadows]))

def create_cloud_shadow_mask(img, buffer, thresholds):
    img = add_cloud_shadow_bands(img, thresholds, 1e4, 100)
    is_cld_shdw = img.select('clouds').add(img.select('shadows')).gt(0)
    is_cld_shdw = is_cld_shdw.focalMin(2).focalMax(buffer*2/20) \
        .reproject(crs=img.select([0]).projection(), scale=20) \
        .rename('cloudmask')
    return img.addBands(is_cld_shdw)

def apply_mask(img):
    not_cld_shdw = img.select('cloudmask').Not()
    return img.select('B.*').updateMask(not_cld_shdw)

def process_images(aoi, start_date, end_date, cloud_filter, buffer, thresholds):
    s2_sr_col = filter_collection('COPERNICUS/S2_SR', start_date, end_date, cloud_filter)
    s2_cloudless_col = filter_collection('COPERNICUS/S2_CLOUD_PROBABILITY', start_date, end_date)
    combined = join_collections(s2_sr_col, s2_cloudless_col).filterBounds(aoi)

    processed_images = combined.map(lambda img: create_cloud_shadow_mask(img, buffer, thresholds)) \
                               .map(apply_mask) \
                               .median()
    return processed_images.clip(aoi)

aoi = ee.Geometry.Polygon(
    [
        [
            [
              -101.8740622035503,
              33.57657453321113
            ],
            [
              -101.8740622035503,
              33.5312900243796
            ],
            [
              -101.80807878169567,
              33.5312900243796
            ],
            [
              -101.80807878169567,
              33.57657453321113
            ],
            [
              -101.8740622035503,
              33.57657453321113
            ]
          ]
    ]
)

START_DATE = '2020-01-01'
END_DATE = '2020-12-31'
CLOUD_FILTER = 20
BUFFER = 10
THRESHOLDS = {'cloud_prob': 30, 'nir_dark': 0.1, 'cloud_proj_dist': 1}

processed_ic = process_images(aoi, START_DATE, END_DATE, CLOUD_FILTER, BUFFER, THRESHOLDS)

# Doing following has following dimensions - time: 1, lon: 360, lat: 180
ds = xr.open_dataset(ee.ImageCollection(processed_ic), engine="ee")

# now if I add geometry and scale - the dimensions are - time: 1, lon: 1, lat: 1
ds2 = xr.open_dataset(ee.ImageCollection(processed_ic), engine="ee", geometry=aoi, scale=20)

Expected Behavior

The dimensions of the dataset (ds2) should accurately represent the specified AOI and scale, similar to the dimensions observed when performing a similar operation on the ECMWF/ERA5_LAND/HOURLY collection (as per readme).

Actual Behavior

The dimensions of the dataset ds is time: 1, lon: 360, lat: 180 (meaning whole globe) and ds2 significantly reduce to time: 1, lon: 1, lat: 1 (based on lat, lon values it seems like centroid of the AOI) upon specifying geometry and scale. This reduction does not align with the expected spatial resolution or the size of the AOI.

Additional Context

The AOI polygon is approximately 30sqkm in the mainland US. I followed the procedures provided in the s2-cloudless notebook and the guidance in the xee library's README. Is the observed behavior intended, or is there an oversight in setting the dimensions according to the AOI and scale correctly?

Many thanks in advance and let me know if I should provide additional info! 🙏🏽

dabhicusp commented 6 months ago

Hello, @pt20, the current workflow of Xee is as follows: If you are passing the geometry, the bounds are taken from the provided geometry itself and If you are not passing the geometry, the bounds are taken from the crs.

In the preceding example, you didn’t include geometry in ds, so it uses the bounds from the (CRS), which are (-180.0, -90.0, 180.0, 90.0). In contrast, in ds2 you included geometry, so it uses the bounds from the geometry, which are -101.87406 33.53129 -101.80808 33.57658.


is there an oversight in setting the dimensions according to the AOI and scale correctly?

  • Upon reviewing the dimensions of the AOI, it's important to note that the longitude coordinates range from -101.8740622035503 to -101.80807878169567, with a scale set to 20 degrees.
  • The scale of 20 degrees is utilized because, in the absence of a specified projection, the default CRS is 'EPSG:4326'. As a result, the units are interpreted as degrees.

Please let me know if you have any more questions, and I'll do my best to help!

dabhicusp commented 6 months ago

I am closing this issue if you still have doubt please feel free to reopen it.

pt20 commented 6 months ago

Hi @dabhicusp ,

Thank you for your response and apologies for the delay, somehow I didn't get notification of your reply.

I understand now how the scale factor works. I will try with scale factor 0.00018 which would roughly translate to 20 meters.