stuckyb / gcdl


different resampling methods #8

Closed: HeatherSavoy-USDA closed this issue 2 years ago

HeatherSavoy-USDA commented 2 years ago

Different use cases will need different resampling methods during reprojection. We are calling reproject from the rioxarray package, which appears to default to the nearest neighbor method. That's the fastest, but it isn't advisable for most use cases.

It would be nice to let the user choose from the available resampling methods. To start, we could default to nearest neighbor for categorical variables (when we support them) and bilinear for continuous variables?
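A minimal sketch of the default-by-variable-type idea, not the actual GCDL code. The function name `pick_resampling`, the `DEFAULT_RESAMPLING` table, and the "categorical"/"continuous" labels are assumptions made up for illustration. The method names are chosen to match members of `rasterio.enums.Resampling`, so they could be looked up there before calling reproject.

```python
from typing import Optional

# Hypothetical defaults per variable kind. The string values match member
# names of rasterio.enums.Resampling ("nearest", "bilinear").
DEFAULT_RESAMPLING = {
    "categorical": "nearest",   # avoids inventing class values between categories
    "continuous": "bilinear",   # smoother result for continuous fields
}


def pick_resampling(var_kind: str, requested: Optional[str] = None) -> str:
    """Return the user's requested method if given, else a default by variable kind."""
    if requested is not None:
        return requested
    try:
        return DEFAULT_RESAMPLING[var_kind]
    except KeyError:
        raise ValueError(f"unknown variable kind: {var_kind!r}")


# With rioxarray and rasterio installed, the chosen name could be used
# roughly like this (data and dst_crs are placeholders):
#   from rasterio.enums import Resampling
#   method = Resampling[pick_resampling("continuous")]
#   reprojected = data.rio.reproject(dst_crs, resampling=method)
```

An explicit user choice always wins here; the defaults only apply when nothing is requested.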

Note: rioxarray uses rasterio.warp.reproject, which in turn uses GDAL. The available methods will depend on the GDAL version installed. See: rasterio supported methods, GDAL methods doc.

stuckyb commented 2 years ago

Agree re: allowing the user to select a resampling method! Thanks for the notes above on rasterio - that made me want to dig a little deeper into how rasterio implements reprojection. I took a look at the rasterio source code, and the good news is that it is a compiled C extension that links to the GDAL C++ libraries for all reprojection functionality (see here, e.g.). That is what I hoped, but I wanted to confirm. It means all operations are in-memory, with no additional I/O or calls to external programs - exactly what we want for building a scalable implementation that supports efficient parallelism (e.g., with shared memory, which is impossible if calling external utilities).

HeatherSavoy-USDA commented 2 years ago

One potential issue is that if the user requests multiple datasets/variables, a different method could be appropriate for each. I'd be OK with supporting only one requested method per request for now and adding a note to the documentation suggesting separate requests if different methods are needed.

If we want different default methods based on whether a variable is continuous or categorical, we should provide that attribute in the dataset metadata.

HeatherSavoy-USDA commented 2 years ago

I added the option to specify a single resampling method per request, with the default always being nearest neighbor.
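For illustration only, a sketch of how a single per-request method with a nearest neighbor default might be parsed and validated. The parameter name `resample_method`, the function `parse_resampling`, and the `KNOWN_METHODS` set are assumptions, not the code that was actually added. The set lists a subset of `rasterio.enums.Resampling` member names, and which of them work in practice depends on the installed GDAL version.

```python
# Subset of rasterio.enums.Resampling member names (assumed list for this sketch).
KNOWN_METHODS = {
    "nearest", "bilinear", "cubic", "cubic_spline",
    "lanczos", "average", "mode",
}


def parse_resampling(params: dict) -> str:
    """Read one resampling method from request parameters, defaulting to nearest."""
    # "resample_method" is a hypothetical request parameter name.
    method = params.get("resample_method", "nearest")
    method = method.strip().lower()
    if method not in KNOWN_METHODS:
        raise ValueError(
            f"unsupported resampling method {method!r}; "
            f"expected one of {sorted(KNOWN_METHODS)}"
        )
    return method


# The same method would then apply to every dataset/variable in the request.
print(parse_resampling({}))                               # default
print(parse_resampling({"resample_method": "Bilinear"}))  # case-insensitive
```

Rejecting unknown names up front gives the user a clear error instead of a failure deep inside the reprojection call.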