sckw opened this issue 1 year ago
Thanks for the request and detailed explanation, @sckw. Since the data is from ECMWF, I think we should probably use weather-dl to cache it before transforming to Zarr with Pangeo Forge.
@alxmrs may be able to advise. Alex, I see weather-dl is documented as a CLI, but is there an opportunity to use its objects directly in a Pangeo Forge pipeline; e.g., in very coarse pseudocode (with the weather-dl bits referenced from here):
import apache_beam as beam
from pangeo_forge_recipes.transforms import StoreToZarr
from weather_dl.fetcher import Fetcher

recipe = (
    beam.Create(...)
    # the weather-dl part
    | 'GroupBy Request Limits' >> beam.GroupBy(...)
    | 'Fetch Data' >> beam.ParDo(Fetcher(...))
    # some TBD adapter
    | SomeAdapterTransform()
    # the Pangeo Forge part
    | StoreToZarr()
)
?
I will take a deeper look at how PGF could use weather-tools (specifically weather-dl) as a library this Tuesday. I have a few ideas and words of caution.
I want to make sure that @mahrsee1997 has seen this, as he is now our weather-dl expert.
Some initial thoughts:
Would you be open to a semi-Beam or non-Beam-based solution? With weather-dl 1.5 or 2, we’ve found more stable downloads and higher utilization of ECMWF licenses.
I think this could integrate well into PGF in other ways; for example, by having Zarr conversion react to raw data appearing in a bucket (more in line with the streaming stuff we’ve been talking about).
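The "react to raw data appearing in a bucket" idea can be sketched minimally as a local polling loop. This is an illustrative stand-in only: the function names (`watch_for_new_files`, `poll`) are hypothetical, and a real deployment would instead use bucket object notifications (e.g. GCS via Pub/Sub) feeding a streaming Beam pipeline rather than filesystem polling.

```python
import time
from pathlib import Path


def watch_for_new_files(root, seen, suffix=".nc"):
    """Return paths under `root` ending in `suffix` that have not been seen yet."""
    new = [p for p in sorted(Path(root).glob(f"*{suffix}")) if p not in seen]
    seen.update(new)
    return new


def poll(root, handle, interval=5.0, max_polls=None):
    """Poll `root`, invoking `handle(path)` once for each newly appearing file."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in watch_for_new_files(root, seen):
            handle(path)  # e.g. trigger NetCDF -> Zarr conversion for this file
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
```

The key property is that conversion is decoupled from downloading: whatever writes raw NetCDF files into the staging area (weather-dl, a manual upload) automatically triggers the Zarr step.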
I love this idea and discussed with @rabernat, who agreed it's a great direction for us to take generally. Opened https://github.com/pangeo-forge/pangeo-forge-recipes/issues/598 to discuss details. Thanks for summarizing the pros/cons of using weather-dl transforms directly in Pangeo Forge. That seems sufficiently difficult, and the streaming alternative sufficiently elegant, that I think we should focus 100% on the streaming option. 😄
Dataset Name
Ocean Reanalysis System 5
Dataset URL
https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-oras5?tab=form
Description
This is an ocean reanalysis dataset from ECMWF. It provides 3D global, gridded (0.25° x 0.25°), monthly mean data from 1958 to present.
Variables include: velocity (meridional, zonal), wind stress, mixed layer depths (MLDs), net heat flux, sea surface temperatures (SSTs), potential temperature, sea surface salinity (SSS), salinity, sea ice, sea surface height (SSH), etc.
The dataset is large and takes a while to download for individual uses. It would be useful to have this downloaded and stored at the LEAP Data Library instead of users downloading sub-datasets and storing it on their personal storage.
Size
3D datasets (e.g. meridional velocity) are larger (~10 GB+); single-level datasets (e.g. SST) are <200 MB.
License
https://cds.climate.copernicus.eu/api/v2/terms/static/licence-to-use-copernicus-products.pdf
Data Format
NetCDF
Data Format (other)
No response
Access protocol
HTTP(S)
Source File Organization
There is one file per month, which is equal to one timestep. One year of data would have 12 NetCDF files (months) that can be concatenated.
Files are downloaded via API request. Documentation on how to download is found here: https://cds.climate.copernicus.eu/api-how-to
Example of an API request for meridional velocity data, for 2009-2011 (all months), is shown below.
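The original example request was not included in the issue. As a hedged illustration, a CDS download via the `cdsapi` Python client might look like the sketch below. The `cdsapi.Client().retrieve(dataset, request, target)` call is the client's documented entry point, but the specific request keys and values for ORAS5 (`product_type`, `vertical_resolution`, the variable spelling, the output format) are assumptions here and should be checked against the CDS dataset form.

```python
def build_oras5_request(variable, years, months):
    """Build a CDS-style request dict for ORAS5.

    The keys/values below are assumptions, not verified against the CDS form.
    """
    return {
        "product_type": "consolidated",       # assumed key/value
        "vertical_resolution": "all_levels",  # assumed key/value
        "variable": variable,
        "year": [str(y) for y in years],
        "month": [f"{m:02d}" for m in months],
        "format": "zip",                      # assumed
    }


# Meridional velocity for 2009-2011, all months.
request = build_oras5_request("meridional_velocity", range(2009, 2012), range(1, 13))

RUN_DOWNLOAD = False  # set True once cdsapi is installed and ~/.cdsapirc holds a token
if RUN_DOWNLOAD:
    import cdsapi

    client = cdsapi.Client()
    client.retrieve("reanalysis-oras5", request, "oras5_vo_2009-2011.zip")
```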
Example URLs
Authorization
API Token
Transformation / Processing
Since each file is one timestep (one month), it would be useful to combine the datasets into at least yearly files, or the whole 1958-present range.
The files are also at 0.25° x 0.25° resolution, so regridding to 1° x 1° could be useful.
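The 1° x 1° regrid of the 0.25° fields can be sketched as a simple 4x4 block average, assuming a regular lat/lon grid whose dimensions divide evenly by the factor. `block_mean_regrid` is an illustrative name; an actual pipeline would want area-weighted, land-mask-aware regridding (e.g. with xESMF).

```python
import numpy as np


def block_mean_regrid(field, factor=4):
    """Coarsen a 2D lat/lon field by averaging non-overlapping factor x factor
    blocks, e.g. a 0.25-degree grid (720 x 1440) -> 1-degree (180 x 360) with
    factor=4. Assumes the grid divides evenly; area weighting and NaN (land)
    handling are omitted for brevity.
    """
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    return field.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))
```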
Target Format
Zarr
Comments
No response