Open simonreise opened 1 week ago
@simonreise When using `xr.where` with mask data computed from pixel values of the original image, one should prefer "raw numpy arrays" and avoid all the coordinate mismatch nonsense. It's not only the `spatial_ref` issue; it can also be slight variations in coordinate locations (differences on the order of +/- 1e-14).
In your case just change:
- pan_masked = pan.where(mask_rep == 0, 0)
+ pan_masked = pan.where(mask_rep.data == 0, 0)
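For anyone landing here later, a minimal sketch of that difference on synthetic data. The arrays, shapes and values are made up for illustration; only `DataArray.where` and `.data` come from this thread. The dropped coordinate is the behaviour reported in this issue and may differ between xarray versions.

```python
import numpy as np
import xarray as xr

# Synthetic stand-ins for the real rasters (values are illustrative only).
grid = {"y": [0.0, 1.0], "x": [0.0, 1.0, 2.0]}
pan = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    dims=("y", "x"),
    coords={**grid, "spatial_ref": 0},  # rioxarray-style value
)
mask_rep = xr.DataArray(
    np.array([[0, 1, 0], [0, 0, 1]]),
    dims=("y", "x"),
    coords={**grid, "spatial_ref": 32646},  # EPSG code stored as the value
)

# xarray mask: the coordinates of both operands are merged, and the
# scalar coordinate with conflicting values does not survive the merge.
dropped = pan.where(mask_rep == 0, 0)

# Raw array mask: no coordinate merging, so pan's coordinates are kept.
kept = pan.where(mask_rep.data == 0, 0)

print("spatial_ref" in dropped.coords, "spatial_ref" in kept.coords)
```

The masked pixel values are the same either way; only the coordinate bookkeeping differs.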
But sure, the value stored in `spatial_ref` can be changed to always be `0` to match what `rioxarray` does. We don't really check for that, and in fact rioxarray-loaded sources still support `.odc.geobox`. Not sure this would fix that issue though. `xarray.where()` works best when you provide a raw `numpy`/`dask` array of the same shape as the `xarray` data as a mask, and not an `xarray` array.
When the mask is an `xarray` array, all the extra work of figuring out "valid overlap", common coords, common dims and common attributes needs to be done by `xarray`, often causing not only slowdowns but also "unexpected behaviours" like the one you are reporting.
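A rough sketch of the "slight variations in coordinate locations" case, again on made-up data. As far as I can tell from the xarray source, `where` without an `other` argument aligns the operands on the intersection of their indexes, so a tiny floating-point offset is enough for no label to match; treat that as an assumption about the installed version.

```python
import numpy as np
import xarray as xr

x = np.array([0.0, 1.0, 2.0])
pan = xr.DataArray(np.ones(3), dims="x", coords={"x": x})

# Same grid, but the coordinates are off by a tiny floating-point amount.
mask = xr.DataArray(np.array([0, 1, 0]), dims="x", coords={"x": x + 1e-14})

# xarray mask: operands are aligned by label first; no label matches
# exactly, so nothing is left after alignment.
aligned = pan.where(mask == 0)

# Raw array mask: applied positionally, coordinates are not consulted.
positional = pan.where(mask.data == 0)

print(aligned.sizes["x"], positional.sizes["x"])
```

When a fill value is passed as the second argument, the same mismatch should surface as an alignment error instead of a silently smaller result.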
@simonreise can you please check whether forcing `0` into `spatial_ref` fixes the issue in your case: replace `epsg` with `0` on line 217 referenced above.
Both of your suggestions worked: adding `.data` to `xarray.where` or forcing `0` into `spatial_ref`. Thank you.
Upd. Forcing `0` fixes the issue even if you do not add `.data`, and adding `.data` works even if you do not force `0`.
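For reference, a sketch of the "force `0`" route done on the user side with `assign_coords`, rather than inside odc-geo. The data is synthetic, and the variable `mask_fixed` is a name introduced here for illustration.

```python
import numpy as np
import xarray as xr

grid = {"y": [0.0, 1.0], "x": [0.0, 1.0]}
pan = xr.DataArray(
    np.full((2, 2), 7.0),
    dims=("y", "x"),
    coords={**grid, "spatial_ref": 0},
)
mask_rep = xr.DataArray(
    np.array([[0, 1], [1, 0]]),
    dims=("y", "x"),
    coords={**grid, "spatial_ref": 32646},
)

# Overwrite the scalar coordinate on the mask so both operands agree.
mask_fixed = mask_rep.assign_coords(spatial_ref=0)
pan_masked = pan.where(mask_fixed == 0, 0)

print(pan_masked.coords["spatial_ref"].item())
```

Note that `assign_coords` replaces the coordinate variable, so as far as I can tell the CRS attributes on the mask's `spatial_ref` are not carried over to `mask_fixed`. That should not matter when the mask is only used as a condition.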
Short description
I use `odc-geo` to reproject and match a Landsat 8 QA mask raster to the resolution of the panchromatic band. All the other operations with data are performed using `rioxarray`.
The issue is that `rioxarray` sets the `spatial_ref` value to 0 by default, but then `odc-geo` re-sets that value to the EPSG code of the projection. When I then mask the original panchromatic band (loaded by rioxarray) with the QA band (reprojected by odc-geo) using `xarray.where`, the `spatial_ref` coordinate is dropped.
Libraries import
Loading the panchromatic band with rioxarray
Loading the QA raster with rioxarray
Checking the coordinates of the panchromatic band
Coordinates:
More detailed look into `spatial_ref`
The value is 0, and all the data is stored in the attrs.
array(0) Coordinates: spatial_ref () int32 0 Indexes: (0) Attributes: (18)
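To make the "value vs attrs" split concrete, here is a small sketch that builds a scalar `spatial_ref` coordinate by hand and inspects it. The attribute values are placeholders (a real file carries the full WKT and more keys), and the array is synthetic.

```python
import numpy as np
import xarray as xr

# Placeholder attributes standing in for the real CRS description.
crs_attrs = {
    "grid_mapping_name": "transverse_mercator",
    "crs_wkt": "<full WKT string would go here>",
}
spatial_ref = xr.Variable((), 0, attrs=crs_attrs)

pan = xr.DataArray(
    np.zeros((2, 2)),
    dims=("y", "x"),
    coords={"spatial_ref": spatial_ref},
)

sr = pan.coords["spatial_ref"]
print(sr.values)         # the scalar value of the coordinate
print(sorted(sr.attrs))  # the keys that actually describe the CRS
```

The scalar value is just a placeholder number; everything that describes the projection lives in `sr.attrs`.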
Checking the coordinates of the QA band
Coordinates:
More detailed look into `spatial_ref`
The value is also 0, and all the data is stored in the attrs.
array(0) Coordinates: spatial_ref () int32 0 Indexes: (0) Attributes: (18)
Reprojecting the QA band
We reproject the QA band to match the geobox of the panchromatic band.
Checking the coordinates of the reprojected QA band
Coordinates:
More detailed look into `spatial_ref`
The value is 32646, the EPSG code of the projection. More detailed data is still stored in the attrs.
array(32646) Coordinates: spatial_ref () int32 32646 Indexes: (0) Attributes: (18)
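A quick way to spot this kind of mismatch before masking is to compare the non-dimension coordinates of the two arrays. The helper below is hypothetical (it is not part of xarray, rioxarray or odc-geo) and the arrays are synthetic.

```python
import numpy as np
import xarray as xr


def conflicting_coords(a, b):
    """Hypothetical helper: shared non-dimension coords whose values differ."""
    shared = (set(a.coords) - set(a.dims)) & (set(b.coords) - set(b.dims))
    return sorted(
        name
        for name in shared
        if not np.array_equal(a.coords[name].values, b.coords[name].values)
    )


pan = xr.DataArray(
    np.zeros(2), dims="x", coords={"x": [0, 1], "spatial_ref": 0}
)
mask_rep = xr.DataArray(
    np.zeros(2), dims="x", coords={"x": [0, 1], "spatial_ref": 32646}
)

print(conflicting_coords(pan, mask_rep))
```

An empty list means the shared non-dimension coordinates agree in value; it says nothing about their attributes.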
Masking the panchromatic band
Then we use our reprojected mask (`mask_rep`) to mask the panchromatic band. In real use the function is more complicated, but this minimal example still shows the problem.
Checking the coordinates
The `spatial_ref` is gone. It must be because the coordinate had different values in the two arrays.
Coordinates:
If we use two xarrays loaded with rioxarray, the problem does not appear. If we simply reassign the coord to 0 (`pan_w = pan.where(mask_rep.assign_coords(spatial_ref=0) == 0, 0)`), the problem does not appear either.
To sum up
Is it really necessary to have the EPSG code of a CRS as the default value for `spatial_ref`? Maybe just use 0 as the default value like rioxarray does, since the actual CRS data is mostly stored in the attributes anyway, not in the coordinate value itself? Or convince the rioxarray devs to also use the EPSG code as the default value? I think there should be a common convention for both libraries.
I can still manually assign the `spatial_ref`, but it looks like a really dirty fix.
Env