google / Xee

An Xarray extension for Google Earth Engine
Apache License 2.0
243 stars 28 forks source link

Update _slice_collection to use toBands and select #91

Open boothmanrylan opened 10 months ago

boothmanrylan commented 10 months ago

Avoids the expensive call to toList unless the ImageCollection contains more than 5000 images and we are slicing past the 5000th image.

Fixes https://github.com/google/Xee/issues/88 where xee wasn't recognizing new bands added to existing images or renamed bands.

google-cla[bot] commented 10 months ago

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

alxmrs commented 10 months ago

Thanks for the explanation! That makes sense to me. I see how this works around the limits rather than the specifics.

On Sat, Nov 4, 2023 at 3:31 AM Rylan Boothman @.***> wrote:

@.**** commented on this pull request.

In xee/ext.py https://github.com/google/Xee/pull/91#discussion_r1381954762:

@@ -679,31 +679,19 @@ def _slice_collection(self, image_slice: slice) -> ee.Image:

Get the right range of Images in the collection, either a single image or

a range of images...

start, stop, stride = image_slice.indices(self.shape[0])

  • If the input images have IDs, just slice them. Otherwise, we need to do

  • an expensive toList() operation.

  • if self.store.image_ids:
  • imgs = self.store.image_ids[start:stop:stride]
  • selectors = list(range(start, stop, stride))
  • col = self.store.image_collection.select(self.variable_name)
  • if self.shape[0] <= 5000: # 5000 == max bands in an Image
  • col_as_image = col.toBands()
  • return col_as_image.select(selectors)
  • elif stop < 5000: # 5000 == max bands in an Image

I don't think so. Even if the start/stride arguments would make the final image have fewer than 5000 bands, I'm avoiding toList by converting to a multiband image before slicing so I need to grab the first "stop" bands and toBands will fail if it would return an image with more than 5000 bands.

— Reply to this email directly, view it on GitHub https://github.com/google/Xee/pull/91#discussion_r1381954762, or unsubscribe https://github.com/notifications/unsubscribe-auth/AARXAB77ASVIM3FZUJRBHQLYCUL6BAVCNFSM6AAAAAA627ORW6VHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMYTOMJTGEYTMNRZGY . You are receiving this because you commented.Message ID: @.***>

boothmanrylan commented 10 months ago

Discussion with Noel Gorelick on Earth Engine Developer forums about a more efficient way to slice an image collection: https://groups.google.com/g/google-earth-engine-developers/c/fSAo9Jn615U

The recommendation was to get a list of system:index, slice that list, then filter the image collection for the remaining system:index

alxmrs commented 10 months ago

What is the difference between a system:id and a system:index? We do something similar to the suggestion with ids.

I have a hunch that the recommended approach won’t work for all cases, and thus we need a toList() or toBand() alternative implementation.

On Tue, Nov 7, 2023 at 2:00 AM Rylan Boothman @.***> wrote:

Discussion with Noel Gorelick on Earth Engine Developer forums about a more efficient way to slice an image collection: https://groups.google.com/g/google-earth-engine-developers/c/fSAo9Jn615U

The recommendation was to get a list of system:index, slice that list, then filter the image collection for the remaining system:index

— Reply to this email directly, view it on GitHub https://github.com/google/Xee/pull/91#issuecomment-1795029027, or unsubscribe https://github.com/notifications/unsubscribe-auth/AARXAB5N2KOGKFYXGGEUG5TYDD3R7AVCNFSM6AAAAAA627ORW6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTOOJVGAZDSMBSG4 . You are receiving this because you commented.Message ID: @.***>

boothmanrylan commented 10 months ago

We likely need someone more familiar with earth engine's internals than I am to confirm this, but I believe the main differences between system:id and system:index are:

I think the issue with how system:id was originally being used comes from here:

 col = ee.ImageCollection(imgs)

Because this reloads the raw images from the data catalogue and therefore throws out any modifications that a user made to the collection beforehand.

I'm not sure that its possible to have an image collection without the images all having a unique system:index, but I will add the alternative methods back in as a fail safe because I'm not 100% certain of that.

Relevant gis stackexchange answer: https://gis.stackexchange.com/a/390307 Earth engine script where I was messing around with system:id and system:index: https://code.earthengine.google.com/8eb85fdb77aeb1adca98fe5eaa36246c

naschmitz commented 10 months ago

@boothmanrylan Could you sync and resolve the conflicts? Then, I can approve and run the integration tests against your PR.

boothmanrylan commented 9 months ago

@naschmitz I resolved the conflicts.

The tests are passing for me locally, but failing here on the "Authenticate to Google Cloud" step which I'm not sure how to resolve. If there is something I need to change to get that to work please let me know.