google / Xee

An Xarray extension for Google Earth Engine
Apache License 2.0
240 stars 28 forks source link

feat: support bbox in in geometry #138

Closed ljstrnadiii closed 5 months ago

ljstrnadiii commented 6 months ago

This pr adds support for matching an existing xarray dataset that adheres to rioxarray standards and to support float64 precision with coordinates.

Motivating Use Case of match_xarray:

Motivating Use Case of use_coords_double_precision:

🦩

ljstrnadiii commented 6 months ago

Quick screen shot of how coords are changing before calling .getInfo(), after, and the inputs that come from raster.io.bounds(). I might be missing something here, but they are off just slightly. At least enough to for xr.merge to notice. Actually, I am seeing that if we call .bounds() on a ee.Geometry.Rectangle, we see this discrepancy. I would love to use np.testing.assert_equal instead of the close test if we can think of a way to prevent this or maybe this is an issue worth filing with ee folks.

Screenshot 2024-02-12 at 9 21 25 PM
ljstrnadiii commented 6 months ago

@dabhicusp @naschmitz @alxmrs (+others; not sure who to cc) would you all be receptive to something like this? Any thoughts?

ljstrnadiii commented 6 months ago

I removed changes around ee credentials/auth in case that was the issue with the previous workflow run failure.

ljstrnadiii commented 6 months ago

I am not sure what's up with failing ci workflows. Looks like it may be a secret issue.

jdbcode commented 6 months ago

Thanks for the PR, Leonard! You can mostly ignore the ci workflows, they'll be run in a different environment. Note that currently we're on a biweekly issue and PR triage cycle. Thanks for your patience as we get this assigned for review!

ljstrnadiii commented 6 months ago

Use double precision sounds like a useful feature, but I am uncertain about match_xarray. This doesn’t look like a standard part of rasterio — am I mistaken? I would advise not modifying the base interface to achieve this kind of compatibility (given a review on mobile).

  1. I got some inspiration from rioxarray.reproject_match here, which is commonly used to merge datasets.
  2. I am cool with dropping this in favor of double precision support if we can get this test to pass with assert equal not assert close somehow.
  3. I would like a way to specify the exact transform of the Xee dataset without the issues I tried to highlight above--namely that bounds change ever so slightly with the ee.geometry interface before and after getInfo() call. Maybe we can support a tuple[float, float, float, float] type in the geometry as a compromise?
  4. We could also drop the double precision arg and enable behind the scenes if the geometry or scale is float64 somehow?
alxmrs commented 6 months ago

namely that bounds change ever so slightly with the ee.geometry interface before and after getInfo() call. Maybe we can support a tuple[float, float, float, float] type in the geometry as a compromise?

I think this may exist already, or is close to existing. You don’t have to pass in ee.Geometry to set the geometry. Does proj’s CRS object provide one the way?

We could also drop the double precision arg and enable behind the scenes if the geometry or scale is float64 somehow?

I like this idea! It makes sense that it would be related to the projection.

ljstrnadiii commented 6 months ago

I think this may exist already, or is close to existing. You don’t have to pass in ee.Geometry to set the geometry. Does proj’s CRS object provide one the way?

Sorry, what may exist already or close to existing? Support for tuple[float, float, float, float] type in geometry? As far as I understand, you can pass geometry or None. If None, the bounds are taken from the CRS's area of use. This object doesn't have a set method for area_of_use as far as I can tell, which makes sense 🤷 .

I like this idea! It makes sense that it would be related to the projection.

Me, too, but I am not so sure there is a straightforward way to determine if the bounds provided are float32 or float64.

I am starting to think that the safest/simplest thing would be to change to float64 for coords to avoid truncation, and add support for a tuple[float, float, float, float] type in geometry. That would let me precisely define the dataset's transform. Thoughts? I can implement and push up those changes to get a sense of what that would look like if there are no major objections.

alxmrs commented 6 months ago

Sorry Leonard, I have to admit I’m not looking that closely. I see now that you’re right: the geometry arg should take a tuple of bounds.

No objections to your proposal from my end! Thanks for your contribution. :)

On Fri, Feb 16, 2024 at 11:30 PM Leonard @.***> wrote:

I think this may exist already, or is close to existing. You don’t have to pass in ee.Geometry to set the geometry. Does proj’s CRS object provide one the way?

Sorry, what may exist already or close to existing? Support for tuple[float, float, float, float] type in geometry? As far as I understand, you can pass geometry or None. If None, the bounds are taken from the CRS's area of use https://pyproj4.github.io/pyproj/stable/api/crs/crs.html#pyproj.crs.CRS.area_of_use. This object doesn't have a set method for area_of_use as far as I can tell, which makes sense 🤷 .

I like this idea! It makes sense that it would be related to the projection.

Me, too, but I am not so sure there is a straightforward way to determine if the bounds provided are float32 or float64.

I am starting to think that the safest/simplest thing would be to change to float64 for coords to avoid truncation, and add support for a tuple[float, float, float, float] type in geometry. That would let me precisely define the dataset's transform. Thoughts? I can implement and push up those changes if there are no major objections.

— Reply to this email directly, view it on GitHub https://github.com/google/Xee/pull/138#issuecomment-1948820164, or unsubscribe https://github.com/notifications/unsubscribe-auth/AARXAB2V7NTXJ2ASLRZW6I3YT6CSZAVCNFSM6AAAAABDF2AQDOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNBYHAZDAMJWGQ . You are receiving this because you were mentioned.Message ID: @.***>

ljstrnadiii commented 6 months ago

@alxmrs I am not sure how you will feel about keeping geometry only or if you would prefer a bounds or bbox argument. I added the proposed changes. Also, apologies for the messy commit history, but the last three should be helpful in demonstrating things.

ljstrnadiii commented 6 months ago

@alxmrs thanks for another review! I am going to simplify types as I mentioned in comments and add a small demo in the readme. As far as CI, I think auth is failing because my user name does have the credentials? I am not sure if this is on my end or yours? I am guessing an author/google employee may need to trigger CI for auth to work correctly?

alxmrs commented 6 months ago

I think @jdbcode or @naschmitz can help with CI. I’m not familiar with how it’s set up.

ljstrnadiii commented 6 months ago

@jdbcode or @naschmitz any tips? The tests workflow fails at authentication.

ljstrnadiii commented 6 months ago

@jdbcode @naschmitz kindly bumping

naschmitz commented 5 months ago

Sorry for the delay. We're extremely understaffed right now. I'm going to take a look at your PR now.

ljstrnadiii commented 5 months ago

@naschmitz thanks for the review and no worries on the timeline. I'm just stoked to get it in as this is basically my most substantial OSS contribution to date after 6 years of startup life.

Tests pass locally btw! Thanks again and let me know if there is anything else you would like to see 🙏

ljstrnadiii commented 5 months ago

Any idea what is up with CI test step authentication?