opendatacube / odc-geo

GeoBox and geometry utilities extracted from datacube-core
https://odc-geo.readthedocs.io/en/latest/
Apache License 2.0
80 stars 12 forks source link

Consider adding functionality like `odc.geo.xr.replace_coords()` #134

Open alexgleith opened 6 months ago

alexgleith commented 6 months ago

Current options to write data out with a different definition of geobox coordinates include either:

a) using the private method

_write_cog(
      data,
      new_geobox,
      output_file,
)

b) assigning coordinates, which has a fixed coordinate name

# Set up a new Affine and GeoBox
new_affine = Affine(0.01, 0.0, -180.0, 0.0, 0.01, -89.995, 0.0, 0.0, 1.0)
new_geobox = GeoBox(data.odc.geobox.shape, new_affine, data.odc.geobox.crs)

new_coords = xr_coords(new_geobox)

# Rename lat to latitude and lon to longitude and update the coordinates of the xarray
new_data = data.rename({"lat": "latitude", "lon": "longitude"}).assign_coords(
    new_coords
)

So having a functionality that can safely do with without compromise or complexity would be nice.

Kirill888 commented 6 months ago

We should try to keep same style as xarray api, they tend to use set_ prefix, used to be assign_, but that seems to be deprecated. So something like set_geo_coords :: GeoBox -> xr.Dataset | xr.DataArray making sure to return a new xarray object wrapping same pixel data.

alexgleith commented 6 months ago

Possibly need to pass in dimension names too. Or to infer them, perhaps it's an implementation detail.

Kirill888 commented 6 months ago

currently, these are fixed to y,x or latitude,longitude based on CRS, but xr_coords should make it easy to

  1. force to always be y,x
  2. use custom names

and the proposed method should just delegate that part to xr_coords (pass-through kwargs). It should also ensure that whatever other, non-spatial dimensions and coords are present on input remain on output also.

Essentially what we want is xx | drop-geo-coords | set-geo-coords [opts]

robbibt commented 4 months ago

Is this issue mostly resolved by the addition of .odc.reload() @Kirill888/@alexgleith? https://odc-geo.readthedocs.io/en/latest/_api/odc.geo.xr.ODCExtensionDa.reload.html

If so, we might want to add a little extra to those docstrings to explain how it could be used in a scenario like this.

alexgleith commented 3 months ago

I don't think so, @robbibt. Sometimes automatic coordinate loading leads to bad numbers due to floating point precision. (i.e., 1.9999999 rather than 2.0).

So explicitly setting the geobox, although it's dangerous, is useful sometimes.

Kirill888 commented 3 months ago

So does something like:

xx = xx.assign_coords(odc.geo.xr.xr_coords(actual_geobox_you_want))

do what you need in this case @alexgleith

Kirill888 commented 3 months ago

I guess I don't understand why you need to rename and keep old coords on the same dataset. Or is the problem in figuring out what actual_geobox_you_want from geobox_in_the_data_currently.

The problem of "I put one geobox in, but what I'm getting back is slightly different" should have been fixed by recent changes that take more care to re-use original affine matrix where possible instead of recomputing from coordinate data.

alexgleith commented 3 months ago

I guess I don't understand why you need to rename

It was throwing an error before. Looks like it's not required now, which is nice.

alexgleith commented 3 months ago

There seems to be a geobox per variable now, @Kirill888

data.odc.geobox.affine

[0.01, 0.0, -180.0, 0.0, -0.01, 89.995, 0.0, 0.0, 1.0]

list(data.analysed_sst.odc.geobox.affine)

[0.010009765625, 0.0, -179.99501037597656, 0.0, -0.0099945068359375, 89.9949951171875, 0.0, 0.0, 1.0]

alexgleith commented 3 months ago

Reported over here https://github.com/opendatacube/odc-geo/issues/157