Support geobox extraction from rioxarrays loaded from non-rectilinear sources

Kirill888 commented 1 year ago

Geobox from xarrays produced by rioxarray from non-rectilinear sources can not be currently extracted.

#!wget https://github.com/corteva/rioxarray/raw/master/test/test_data/input/2d_test.tif
import rioxarray
import odc.geo.xr
from odc.geo.geobox import GeoBox

xx = rioxarray.open_rasterio("2d_test.tif")
display(xx.spatial_ref.attrs)
gbox_rio = GeoBox(xx.rio.shape, xx.rio.transform(), xx.rio.crs)
display(gbox_rio)
assert xx.odc.crs == xx.rio.crs
assert xx.odc.geobox == gbox_rio # Fails here

with the current code, xx.odc.geobox is None, it should not be. We should detect presence of GeoTransform attribute on the spatial_ref coordinate and use that to construct a geobox instead of the normal way of computing rectilinear transform from x/y coordinates that might be absent (when using parse_coordinates=False option in rioxarray).

snowman2 commented 1 year ago

Note: you can use rioxarray to build the geobox: https://corteva.github.io/rioxarray/stable/getting_started/crs_management.html

snowman2 commented 1 year ago

Here is the function used by Geocube: https://github.com/corteva/geocube/blob/e32d1632c60b27b37c0fd4e5e1195877c48a14d2/geocube/geo_utils/geobox.py#L23

snowman2 commented 1 year ago

It is used as backup when the geobox is not found: https://github.com/corteva/geocube/blob/e32d1632c60b27b37c0fd4e5e1195877c48a14d2/geocube/geo_utils/geobox.py#L163

snowman2 commented 1 year ago

Side note: rioxarray loads 2D coords and names them xc and yc.

Kirill888 commented 1 year ago

Thanks for the links Alan. My plan is to capture GeoTransform from spatial_ref coordinate as something like original_source_transform, and then use that for 2 things

non-rectilinear sources (2d coords or no coords)
rectilinear sources cropped to 1x1, 1xH, Wx1 sizes, as you suggested in #53

Currently I'm NOT planning to interpret 2d coords for computing correct transform from cropped arrays, but it should be possible. My issue with 2d coords is that one can't tell if this was produced from a linear mapping of some sorts or from completely non-linear process, like from GCPs. In odc-geo we use 1-d x/y coords, but store original pixel coordinates there instead: 0.5, 1.5, ..., these take up less space and are fast to create, we then also capture original transform. Combination of the two allows to dynamically compute correct transform for cropped sources. The limitation of this approach is that you can't use xarray plot utilities with world coords, not until you add those manually anyway.

Side note: rioxarray loads 2D coords and names them xc and yc.

Yes, I'm aware of this now. When I was first debugging this issue I worked with this file:

https://umbra-open-data-catalog.s3.amazonaws.com/sar-data/tasks/Melbourne,%20Australia/02a3d6c7-d9e1-4a50-be13-5b2112a3ec1e/2023-02-08-11-51-12_UMBRA-04/2023-02-08-11-51-12_UMBRA-04_GEC.tif

and it was taking a long time to construct dask representation, so I stopped that and tried with parse_coordinates=False which produces output much quicker, but that got me confused for a little bit. Question: those xc,yc 2d coords, are they always non-lazy arrays, even when constructing Dask outputs?

snowman2 commented 1 year ago

Question: those xc,yc 2d coords, are they always non-lazy arrays, even when constructing Dask outputs?

Currently, yes they are always non-lazy.

opendatacube / odc-geo

Support geobox extraction from rioxarrays loaded from non-rectilinear sources #82