pangeo-forge / staged-recipes

A place to submit pangeo-forge recipes before they become fully fledged pangeo-forge feedstocks
https://pangeo-forge.readthedocs.io/en/latest/
Apache License 2.0

Example pipeline for [coawst_4] #10

Open ocefpaf opened 4 years ago

ocefpaf commented 4 years ago

Source Dataset

@rsignell-usgs will do the description later ;-p

https://geoport.usgs.esipfed.org/thredds/catalog/coawst_4/use/fmrc/catalog.html?dataset=coawst_4/use/fmrc/coawst_4_use_best.ncd

netCDF

Forecast model run collection

NetCDF Subset Service

https://geoport.usgs.esipfed.org/thredds/ncss/coawst_4/use/fmrc/coawst_4_use_best.ncd?var=Hwave&disableLLSubset=on&disableProjSubset=on&horizStride=1&time_start={{yyyy-mm-HHTMM}}%3A00%3A00Z&time_end={{yyyy-mm-HHTMM}}%3A00%3A00Z&timeStride=1&vertCoord=&accept=netcdf
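A minimal sketch of how the URL template above could be filled in for a given time window. It assumes the `{{...}}` placeholders stand for ISO 8601 timestamps truncated to the hour (the template itself appends `:00:00Z`); the helper name `ncss_url` is hypothetical and not part of any library.

```python
from datetime import datetime
from urllib.parse import urlencode

# Base of the NetCDF Subset Service request quoted above.
NCSS_BASE = (
    "https://geoport.usgs.esipfed.org/thredds/ncss/"
    "coawst_4/use/fmrc/coawst_4_use_best.ncd"
)


def ncss_url(var: str, start: datetime, end: datetime) -> str:
    """Build one subset request (hypothetical helper).

    Assumption: timestamps are UTC and formatted as yyyy-mm-ddTHH:MM:SSZ.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    params = [
        ("var", var),
        ("disableLLSubset", "on"),
        ("disableProjSubset", "on"),
        ("horizStride", 1),
        ("time_start", start.strftime(fmt)),
        ("time_end", end.strftime(fmt)),
        ("timeStride", 1),
        ("vertCoord", ""),
        ("accept", "netcdf"),
    ]
    # urlencode percent-encodes ":" as %3A, matching the template above.
    return f"{NCSS_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    print(ncss_url("Hwave", datetime(2020, 1, 1), datetime(2020, 1, 2)))
```

The dates in the usage line are arbitrary and only illustrate the call.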

Transformation / Alignment / Merging

Rechunk along the time dimension, from chunks of 1 time step to x

Output Dataset

zarr

rabernat commented 3 years ago

Is there just a single sequence of files? If so, this recipe might be shovel ready with the latest version of pangeo forge.

Or is there the usual 2D lead time / start date / ensemble member organization (see #17)? If so, this would require https://github.com/pangeo-forge/pangeo-forge/issues/39 to be resolved before implementing.

rsignell-usgs commented 3 years ago

@rabernat, yup, it is just a sequence of files, no overlaps!

But these NetCDF files are not (currently) available via FTP.

They are aggregated virtually via THREDDS catalog (in which additional metadata is added via NcML) here: https://geoport.usgs.esipfed.org/thredds/catalog/coawst_4/use/fmrc/catalog.html?dataset=coawst_4/use/fmrc/coawst_4_use_best.ncd

Could we use the OPeNDAP endpoint as the starting point for Pangeo Forge?

If not, we could make the NetCDF files available and add the metadata via Python later.

It will take a while to download 10TB of data no matter what, I guess...
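For a rough sense of scale, here is a back-of-the-envelope estimate of the transfer time. The link speeds are hypothetical and nothing in this thread states the actual bandwidth available; the function name `transfer_days` is also made up for illustration.

```python
def transfer_days(size_tb: float, rate_mbit_s: float) -> float:
    """Days needed to move size_tb terabytes at a sustained rate in Mbit/s.

    Assumptions: decimal units (1 TB = 1e12 bytes), no protocol overhead,
    and a constant transfer rate.
    """
    size_bits = size_tb * 1e12 * 8
    seconds = size_bits / (rate_mbit_s * 1e6)
    return seconds / 86400


if __name__ == "__main__":
    for rate in (100, 1000):  # hypothetical sustained rates in Mbit/s
        print(f"{rate} Mbit/s: {transfer_days(10, rate):.1f} days")
```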

rabernat commented 3 years ago

> yup, it is just a sequence of files, no overlaps!

Good news! So this might be shovel ready.

> Could we use the OPeNDAP endpoint as the starting point for Pangeo Forge?

We should definitely support OPeNDAP inputs! This probably doesn't work right now, but it will be very easy to implement, since Xarray can read them. I'll create an issue for that.
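A minimal sketch of what reading the aggregation over OPeNDAP with Xarray might look like. The thread does not give the OPeNDAP URL, so the `dodsC` path below is an assumption based on the usual THREDDS layout; the helper names `opendap_url` and `open_remote` are hypothetical.

```python
THREDDS_ROOT = "https://geoport.usgs.esipfed.org/thredds"
DATASET_PATH = "coawst_4/use/fmrc/coawst_4_use_best.ncd"


def opendap_url(dataset_path: str, root: str = THREDDS_ROOT) -> str:
    """Derive an OPeNDAP URL from the catalog dataset path.

    Assumption: the server exposes OPeNDAP under the conventional
    THREDDS "dodsC" service prefix.
    """
    return f"{root}/dodsC/{dataset_path}"


def open_remote(url: str):
    """Open the remote dataset lazily; needs xarray and network access."""
    import xarray as xr  # imported here so the URL helper works without it

    # Only metadata is fetched at this point; variables load on access.
    return xr.open_dataset(url)


if __name__ == "__main__":
    print(opendap_url(DATASET_PATH))
```

Calling `open_remote(opendap_url(DATASET_PATH))` would then return a dataset whose variables, such as `Hwave`, are read from the server only when accessed.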

One useful thing you could do for me right here is to provide an example list of 5 specific sequential URLs, as a comment on this issue.

rabernat commented 3 years ago

I believe that this should work with the latest master of pangeo-forge, with cache_inputs=False and copy_input_to_local_file=False.