Closed by HeatherSavoy-USDA 2 years ago.
Agree re: allowing the user to select a resampling method! Thanks for the notes above on rasterio; they made me want to dig a little deeper into how rasterio implements reprojection. I took a look at the rasterio source code, and the good news is that it is a compiled C extension that links to the GDAL C++ libraries for all reprojection functionality (see here, e.g.). That is what I hoped, but I wanted to confirm. It matters because it means all operations happen in memory, with no additional I/O or calls to external programs. That is exactly what we want for building a scalable implementation that supports efficient parallelism (e.g., with shared memory, which is impossible when calling external utilities).
One potential issue is that if the user requests multiple datasets/variables, a different method could be appropriate for each. I'd be OK with supporting only one requested method for now and adding a note to the documentation suggesting separate requests if needed?
If we want different default methods based on continuous/categorical, we should provide that attribute in the metadata.
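If that attribute existed, choosing the default could be a simple lookup. This is a minimal sketch only: the `data_type` attribute name and its `"categorical"`/`"continuous"` values are hypothetical, not part of any existing metadata schema. It falls back to nearest neighbor when the attribute is missing.

```python
# Hypothetical sketch: the "data_type" metadata attribute and its values
# ("categorical" / "continuous") are assumptions, not an existing schema.
DEFAULT_METHOD_BY_TYPE = {
    "categorical": "nearest",  # keeps original class codes; never invents new ones
    "continuous": "bilinear",  # smoother output for measured quantities
}


def default_resampling(variable_metadata: dict) -> str:
    """Pick a default resampling method from a variable's metadata.

    Falls back to nearest neighbor when the attribute is missing or has an
    unrecognized value, because nearest never produces values that were
    not already present in the source grid.
    """
    data_type = variable_metadata.get("data_type")
    return DEFAULT_METHOD_BY_TYPE.get(data_type, "nearest")


if __name__ == "__main__":
    print(default_resampling({"data_type": "categorical"}))
    print(default_resampling({"data_type": "continuous"}))
    print(default_resampling({}))
```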
I added the option to specify a single resampling method per request with the default always being nearest neighbor.
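Roughly, a single per-request option with a nearest neighbor default could be validated like this before reprojection. This is a sketch under assumptions: `resolve_resampling` is a hypothetical helper, and `ALLOWED_METHODS` is a hand-picked subset of the member names of rasterio's `Resampling` enum. The methods actually usable depend on the installed GDAL version.

```python
from typing import Optional

# Assumption: a subset of the member names in rasterio.enums.Resampling.
# Which ones actually work depends on the GDAL version rasterio is built against.
ALLOWED_METHODS = {
    "nearest", "bilinear", "cubic", "cubic_spline",
    "lanczos", "average", "mode",
}
DEFAULT_METHOD = "nearest"


def resolve_resampling(requested: Optional[str]) -> str:
    """Return the one resampling method applied to every variable in a request."""
    if not requested:
        # No method given: always fall back to nearest neighbor.
        return DEFAULT_METHOD
    method = requested.strip().lower()
    if method not in ALLOWED_METHODS:
        raise ValueError(
            f"Unsupported resampling method {requested!r}; "
            f"choose one of {sorted(ALLOWED_METHODS)}"
        )
    return method
```

The validated name could then be turned into the enum member with `Resampling[method]` (from `rasterio.enums`) and passed as the `resampling` argument of rioxarray's `reproject`.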
Different use cases will need different resampling methods during reprojection. We are calling reproject from the rioxarray package, which appears to default to the nearest neighbor method. That's the fastest, but it won't be advisable for most use cases. It would be nice to have an option for the user to choose from the available resampling methods. To start, we could default to nearest neighbor for categorical variables (when we support them) and bilinear for continuous variables?
Note: rioxarray uses rasterio.warp.reproject, which in turn uses GDAL. The available methods will depend on the installed GDAL version. See the rasterio supported methods and the GDAL methods doc.
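To illustrate why the nearest neighbor vs. bilinear split by variable type makes sense, here is a simplified 1D analogue. It is a toy, not GDAL's implementation. Upsampling two land-cover codes with nearest keeps only the original classes, while linear interpolation invents intermediate "classes" that are meaningless for categorical data but exactly what you want for a continuous field.

```python
def resample_1d(values, new_len, method):
    """Toy 1D resampling of cell-centred values onto new_len cells.

    Simplified illustration only; GDAL's 2D warping is more involved.
    """
    n = len(values)
    out = []
    for j in range(new_len):
        # Map the centre of output cell j back into input index space.
        x = (j + 0.5) * n / new_len - 0.5
        x = min(max(x, 0.0), n - 1.0)  # clamp to the valid input range
        if method == "nearest":
            out.append(values[int(round(x))])
        elif method == "bilinear":  # in 1D this reduces to linear interpolation
            i0 = int(x)  # floor, since x >= 0 after clamping
            i1 = min(i0 + 1, n - 1)
            w = x - i0
            out.append(values[i0] * (1 - w) + values[i1] * w)
        else:
            raise ValueError(f"unknown method {method!r}")
    return out


if __name__ == "__main__":
    land_cover = [1, 5]  # two categorical class codes
    print(resample_1d(land_cover, 4, "nearest"))   # only classes 1 and 5
    print(resample_1d(land_cover, 4, "bilinear"))  # invents 2.0 and 4.0
```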