This was proposed by @grovduck to leverage the raster processing workflow in the `ImageWrapper` classes (flattening, filling NaNs, building NoData masks, applying a function, unflattening, and masking NoData) outside of estimator methods. The initial goal was to calculate weighted attributes from a pre-computed neighbor image using Dask, but ideally it would work for any function that takes and returns 2D Numpy arrays in the shape (pixel, band).
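As a rough illustration of that workflow at the Numpy level, here is a minimal sketch; `process_image` and its signature are hypothetical, not an existing API:

```python
import numpy as np

def process_image(image, func, fill_value=0.0):
    """Hypothetical sketch of the ImageWrapper-style workflow: flatten a
    (row, col, band) image to (pixel, band), fill NaNs, build a NoData
    mask, apply `func`, unflatten, and re-apply the NoData mask."""
    rows, cols, bands = image.shape
    flat = image.reshape(-1, bands)            # (pixel, band)
    nodata_mask = np.isnan(flat).any(axis=1)   # pixels with any NaN band
    filled = np.where(np.isnan(flat), fill_value, flat)
    result = func(filled)                      # must return (pixel, band)
    result[nodata_mask] = np.nan               # re-mask NoData pixels
    return result.reshape(rows, cols, -1)

# Usage: a trivial per-band scaling function, with one NoData pixel
img = np.random.rand(4, 5, 3)
img[0, 0, :] = np.nan                          # simulate a NoData pixel
out = process_image(img, lambda flat: flat * 2.0)
```

The NoData pixel stays NaN in the output, while all valid pixels pass through `func` on the flattened (pixel, band) view.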
We should be able to tackle this by creating a higher-order function like `apply_ufunc_across_bands` that takes 1) an image (`NDArray`, `xr.Dataset`, `xr.DataArray`), 2) a compatible ufunc, and 3) some metadata arguments to define the output image (e.g. dimension sizes, data types), and uses `xarray.apply_ufunc` to parallelize that operation across chunks. The pre- and post-processing would be handled by an HOF in `ImageWrapper` that pre-processes, applies the ufunc, and post-processes each Numpy chunk.
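A possible shape for that higher-order function, assuming a (band, y, x) `DataArray` input; the name, signature, and chunk handling here are assumptions for illustration, not the final API:

```python
import numpy as np
import xarray as xr

def apply_ufunc_across_bands(image, func, output_dtype=np.float32):
    """Hypothetical sketch: parallelize a (pixel, band) -> (pixel, band)
    function over the chunks of a (band, y, x) DataArray via
    xarray.apply_ufunc."""
    def _apply_to_chunk(chunk):
        # The core dim "band" is moved to the last axis, so each Numpy
        # chunk arrives as (y, x, band); flatten it to (pixel, band).
        y, x, bands = chunk.shape
        out = func(chunk.reshape(-1, bands))
        return out.reshape(y, x, -1)

    return xr.apply_ufunc(
        _apply_to_chunk,
        image,
        input_core_dims=[["band"]],
        output_core_dims=[["band"]],
        dask="parallelized",        # apply once per Dask chunk
        output_dtypes=[output_dtype],
    )

# Usage with an in-memory DataArray (a chunked Dask-backed one would
# dispatch per chunk instead)
da = xr.DataArray(np.ones((2, 3, 4), dtype=np.float32), dims=("band", "y", "x"))
res = apply_ufunc_across_bands(da, lambda flat: flat * 2.0)
```

Note that `apply_ufunc` returns the core dim last, so the result here has dims (y, x, band); restoring the original dimension order would be part of the post-processing step.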
Compared to the current approach of pre- and post-processing each image type natively (i.e. `xr.DataArrays` are flattened as `xr.DataArrays`), being able to process everything at the Numpy level would simplify things substantially. We would of course still need some type-specific behavior like parsing nodata, assigning coordinate names, etc.
Performance is an important, unknown question here that may determine whether this is worth pursuing. I did a quick test that suggested that prediction with a small array was substantially quicker with the proposed approach than with the current implementation, but we'll need to see how that scales and whether it applies to other operations. If it ends up slowing things down, we may need to consider other ways of exposing the processing API.