opendatacube / odc-geo

GeoBox and geometry utilities extracted from datacube-core
https://odc-geo.readthedocs.io/en/latest/
Apache License 2.0
74 stars 11 forks source link

Set a spatial_ref value to 0 instead of EPSG code for better compatibility with rioxarray #164

Open simonreise opened 1 week ago

simonreise commented 1 week ago

Short description

I use odc-geo to reproject and match Landsat 8 QA mask raster to the resolution of the panchromatic band. All the other operations with data are performed using rioxarray.

The issue is that rioxarray sets spatial_ref value to 0 by default, but then odc-geo re-sets that value to the EPSG code of the projection, and, when I mask the original panchromatic band (that was loaded by rioxarray) with QA band (that was reprojected by odc-geo) using xarray.where, spatial_ref coordinate is being dropped.

Libraries import
import xarray
import rioxarray
import odc.geo.xr
Loading the panchromatic band with rioxarray
pan = r"path/to/file/LC09_L1TP_142020_20230825_20230825_02_T1/LC09_L1TP_142020_20230825_20230825_02_T1_B8.TIF"
with rioxarray.open_rasterio(pan, chunks=True, lock=True) as img:
    pan = img
Loading the QA raster with rioxarray
mask = r"path/to/file/LC09_L1TP_142020_20230825_20230825_02_T1/LC09_L1TP_142020_20230825_20230825_02_T1_QA_PIXEL.TIF"
with rioxarray.open_rasterio(mask, chunks=True, lock=True) as img:
    mask = img
Checking the coordinates of the panchromatic band
pan.coords

Coordinates:

More detailed look into spatial_ref

The value is 0, and all the data is stored in the attrs.

pan.spatial_ref

array(0) Coordinates: spatial_ref () int32 0 Indexes: (0) Attributes: (18)

Checking the coordinates of the QA band
mask.coords

Coordinates:

More detailed look into spatial_ref

The value is also 0, and all the data is stored in the attrs.

mask.spatial_ref

array(0) Coordinates: spatial_ref () int32 0 Indexes: (0) Attributes: (18)

Reprojecting the QA band

We reproject the QA band to match the geobox of the panchromatic band.

mask_rep = mask.odc.reproject(pan.odc.geobox)
Checking the coordinates of the reprojected QA band
mask_rep.coords

Coordinates:

Masking the panchromatic band

Then we use our reprojected mask (mask_rep) to mask the panchromatic band. IRL the function is more complicated, but this minimum example still shows the problem.

pan_masked = pan.where(mask_rep == 0, 0)
Checking the coordinates

The spatial_ref is gone. It must be because the coordinates from two arrays had different values.

pan_masked.coords

Coordinates:

If we use two xarrays loaded with rioxarray, the problem does not appear. If we just simply reassign the coord to 0, (pan_w = pan.where(mask_rep.assign_coords(spatial_ref=0) == 0, 0)) the problem does not appear.

To sum up

Is it really necessary to have a EPSG code of a CRS as a default value for spatial_ref? Maybe just use 0 as a default value like rioxarray does, as the actual CRS data anyway is mostly stored in the attributes, not in the coordinate value itself? Or convince rioxarray devs to also use EPSG code as a default value? I think there should be a common convention for both libraries.

I can still manually assign the spatial_ref, but it looks like a really dirty fix.

Env

Kirill888 commented 1 week ago

@simonreise When using xr.where with mask data computed from pixel values of the original image one should prefer to use "raw numpy arrays" and avoid all the coordinate mismatch nonsense, it's not only spatial_ref issue it can also be slight variations in coordinate locations (+/- 1e-14 difference kinda thing).

In your case just change:

- pan_masked = pan.where(mask_rep == 0, 0)
+ pan_masked = pan.where(mask_rep.data == 0, 0)

But sure, the value stored in the spatial_ref can be changed to always be 0 to match what rioxarray does, we don't really check for that, and in fact rioxarray loaded sources still support .odc.geobox. Not sure this would fix that issue though. xrarray.where() works best when you provide raw numpy/dask array of the same shape as the xarray data as a mask and not an xarray array.

When mask is an xarray array all the extra work of figuring out "valid overlap", common coords, common dims, common attributes needs to be done by xarray often causing not only slow downs but "unexpected behaviours" like the one you are reporting.

Kirill888 commented 1 week ago

@simonreise can you please check if forcing 0 into spatial_ref fixes the issue in your case or not:

https://github.com/opendatacube/odc-geo/blob/78577dc15f6791e17f18706c5a6bb6ba7a302b77/odc/geo/_xr_interop.py#L216-L221

replace epsg with 0 on line 217 referenced above

simonreise commented 1 week ago

Both of your suggestions worked: adding .data to xarray.where or forcing 0 into spatial_ref. Thank you. Upd. forcing 0 fixes the issue even if you do not add .data and adding .data works if you do not force 0