Open ciaransweet opened 3 years ago
Thanks a lot for the issue @ciaranevans! I think we are all in agreement about the need to support COG as an output format.
One general question I have about a recipe is whether there is always a 1:1 mapping between a set of inputs and an output dataset. Or could we have a recipe that reads a set of NetCDF files and produces BOTH a Zarr and a set of COGs? I think the long-term answer is yes, but for the short term, I don't think it's an important feature to support right now.
Instead let's focus on the 1:1 case for now and add COGs as a supported output.
The question of how to implement this raises some software design questions for which we would welcome your input. As discussed in #27, our idea is to slowly refactor the monster NetCDFtoZarrSequentialRecipe into a series of classes and mixins that can be re-composed to easily define new recipe classes. (In fact, this is how we originally tried to set up the code, but we abandoned it because it just felt like premature complexification.) For example, we could imagine refactoring to define this class as
class NetCDFtoZarrSequentialRecipe(SequentialRecipe, NetCDFInputMixin, ZarrOutputMixin):
    pass
and then simply being able to write
class NetCDFtoCOGSequentialRecipe(SequentialRecipe, NetCDFInputMixin, COGOutputMixin):
    pass
We would welcome a PR from you and the DevSeed team to implement COG output support following this sort of pattern.
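To make the composition idea above concrete, here is a minimal, runnable sketch. It is only an illustration: the `read`/`write`/`run` method names and the string "outputs" are assumptions made up for this example and are not the actual pangeo-forge interfaces, and the mixin bodies are stand-ins for real NetCDF, Zarr and COG handling.

```python
class SequentialRecipe:
    """Hypothetical base class: process inputs one after another."""

    def __init__(self, inputs):
        self.inputs = list(inputs)

    def run(self):
        # read() and write() are supplied by the input/output mixins.
        return [self.write(self.read(item)) for item in self.inputs]


class NetCDFInputMixin:
    def read(self, item):
        # Stand-in for opening a NetCDF file.
        return {"source": item, "format": "netcdf"}


class ZarrOutputMixin:
    def write(self, data):
        # Stand-in for appending to a Zarr store.
        return f"zarr <- {data['source']}"


class COGOutputMixin:
    def write(self, data):
        # Stand-in for writing one COG per input.
        return f"cog <- {data['source']}"


class NetCDFtoZarrSequentialRecipe(SequentialRecipe, NetCDFInputMixin, ZarrOutputMixin):
    pass


class NetCDFtoCOGSequentialRecipe(SequentialRecipe, NetCDFInputMixin, COGOutputMixin):
    pass
```

With this layout, swapping the output format only means swapping one mixin; the input handling and the sequential loop are shared between both recipe classes.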
> One general question I have about a recipe is whether there is always a 1:1 mapping between a set of inputs and an output dataset. Or could we have a recipe that reads a set of NetCDF files and produces BOTH a Zarr and a set of COGs? I think the long-term answer is yes, but for the short term, I don't think it's an important feature to support right now.
I also think that in the long term we want to support multiple outputs, just like in conda-forge we have recipes that produce e.g. a shared library and a static library, or a binary library (for use in C) and its Python bindings. Not only does it save resources (the download of the original files, the chunking...), it also ensures that all the outputs originate from the same reference.
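As a rough illustration of the resource argument, the sketch below reads each input once and hands the same in-memory data to every output writer. Everything here is hypothetical: `run_multi_output`, the `read` callable and the `writers` mapping are names invented for this example and do not correspond to anything in pangeo-forge.

```python
from typing import Callable, Dict, Iterable, List


def run_multi_output(
    inputs: Iterable[str],
    read: Callable[[str], dict],
    writers: Dict[str, Callable[[dict], str]],
) -> Dict[str, List[str]]:
    """Read every input once and pass the result to each named writer."""
    results: Dict[str, List[str]] = {name: [] for name in writers}
    for item in inputs:
        data = read(item)  # single download/open per input
        for name, write in writers.items():
            # Every output is derived from the same in-memory reference.
            results[name].append(write(data))
    return results
```

A caller would pass something like `{"zarr": write_zarr, "cog": write_cog}` as `writers`, where both functions are again placeholders for the real output logic.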
We have a few use-cases where it would be nice to have support for creating COGs through recipes and pangeo-forge.
Whilst formats like Zarr are great for analysis, our use-cases also often involve displaying global-sized raster datasets. Having first-class support for COGs (Cloud Optimised GeoTIFFs) would be awesome!
For example, there is already the recipe NetCDFtoZarrSequentialRecipe; it would be handy to have something like NetCDFtoCOGRecipe ... whether this takes every variable in a NetCDF and converts it to a COG or allows for the required variables to be selected.
I've done some playing around with 'generic' conversion of NetCDF to COG, using rasterio, xarray, rioxarray, and rio_cogeo; if useful I can post some of the functions here, though they're pretty 'experimental'.
Ideally such functionality would enable some parameterisation of the conversion (CRS, nodata, compression type etc.).
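To give a feel for what such a parameterised conversion could look like, here is a minimal sketch using xarray and rioxarray. It is not the experimental code mentioned above: `COGOptions`, `cog_path` and `netcdf_to_cogs` are hypothetical names, the default values are arbitrary, and it assumes each selected variable is a 2D (or band, y, x) array whose spatial dimensions rioxarray can recognise, and that the installed GDAL provides the COG driver.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class COGOptions:
    """Hypothetical container for the conversion parameters."""
    crs: str = "EPSG:4326"
    nodata: Optional[float] = None
    compress: str = "DEFLATE"


def cog_path(out_dir: str, variable: str) -> Path:
    """One output file per variable, named after the variable."""
    return Path(out_dir) / f"{variable}.tif"


def netcdf_to_cogs(
    nc_path: str,
    out_dir: str,
    variables: Optional[Sequence[str]] = None,
    options: COGOptions = COGOptions(),
) -> List[Path]:
    """Write the selected variables (default: all) of a NetCDF file as COGs."""
    # Imported lazily so the helpers above work without the geospatial stack.
    import xarray as xr
    import rioxarray  # noqa: F401  (registers the .rio accessor)

    written = []
    with xr.open_dataset(nc_path) as ds:
        names = list(variables) if variables is not None else list(ds.data_vars)
        for name in names:
            da = ds[name].rio.write_crs(options.crs)
            if options.nodata is not None:
                da = da.rio.write_nodata(options.nodata)
            target = cog_path(out_dir, name)
            # Extra keyword arguments are passed on as raster creation options.
            da.rio.to_raster(target, driver="COG", compress=options.compress)
            written.append(target)
    return written
```

Passing `variables=None` covers the "convert every variable" case, while a list of names covers the "select the required variables" case; further creation options could be added to `COGOptions` in the same way.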
Sorry for the 🧠dump! Happy to flesh out ideas and try to answer questions! I just know we'd love to make some COGs en-masse 😅