Closed mmann1123 closed 2 months ago
@mmann1123 in this line, NaN
values are dropped from the predictors based on the DataArray
metadata 'no data' value. An attempt is made to collect the nodata
value from the raster file metadata. However, if the internal nodata
value is None and the user has not specified a nodata
value, then the nodata
value is set as NaN
.
In the 'ml' module, zeros are left in the predictor array here if the true nodata
values are 0 and the DataArray
nodata
value is NaN
. When you do not set nodata
and this image does not have a nodata
metadata tag, you get:
<xarray.DataArray (band: 3, y: 372, x: 408)>
dask.array<open_rasterio-f620f650fb98ea6ce82cafe95ee46b55<this-array>, shape=(3, 372, 408), dtype=uint16, chunksize=(3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1 2 3
* x (x) float64 7.174e+05 7.176e+05 7.177e+05 ... 7.783e+05 7.785e+05
* y (y) float64 -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
Attributes: (12/13)
transform: (150.0, 0.0, 717345.0, 0.0, -150.0, -2776995.0)
crs: 32621
res: (150.0, 150.0)
is_tiled: 0
nodatavals: (nan, nan, nan)
_FillValue: nan
... ...
offsets: (0.0, 0.0, 0.0)
filename: /mnt/c/Users/jbgra/Documents/code/geowombat/src/geow...
resampling: nearest
AREA_OR_POINT: Area
_data_are_separate: 0
_data_are_stacked: 0
When you set nodata
you get:
<xarray.DataArray (band: 3, y: 372, x: 408)>
dask.array<open_rasterio-f620f650fb98ea6ce82cafe95ee46b55<this-array>, shape=(3, 372, 408), dtype=uint16, chunksize=(3, 256, 256), chunktype=numpy.ndarray>
Coordinates:
* band (band) int64 1 2 3
* x (x) float64 7.174e+05 7.176e+05 7.177e+05 ... 7.783e+05 7.785e+05
* y (y) float64 -2.777e+06 -2.777e+06 ... -2.833e+06 -2.833e+06
Attributes: (12/13)
transform: (150.0, 0.0, 717345.0, 0.0, -150.0, -2776995.0)
crs: 32621
res: (150.0, 150.0)
is_tiled: 0
nodatavals: (0, 0, 0)
_FillValue: 0
... ...
offsets: (0.0, 0.0, 0.0)
filename: /mnt/c/Users/jbgra/Documents/code/geowombat/src/geow...
resampling: nearest
AREA_OR_POINT: Area
_data_are_separate: 0
_data_are_stacked: 0
which makes the 'ml' masking work and drop zeros.
At first, I thought that I could not replicate your issue. But then I saw that the issue is in the predictions. After examining the predictor variables with and without a proper nodata
value, I think that the model is training on a large set of zeros when there is a mismatch between the data nodata
and the DataArray
attribute/metadata nodata
value.
For any classifier other than gausian, the current
predict
workflow requiresnodata=0
forgw.open
. Not sure why this is.For instance
works only if
nodata=0