Proposed Recipes for Euro-Cordex

rabernat commented 2 years ago

The following is transcribed from @larsbuntemeyer in https://github.com/pangeo-data/pangeo/issues/862

Source Dataset

I wanted to share some thoughts on bringing WCRP EURO-CORDEX datasets to the cloud. These would be CMORized datasets on the European CORDEX domain that are currently available only by download from, e.g., ESGF or the Copernicus Climate Data Store.

Link to the website / online documentation for the data - https://www.euro-cordex.net/
The file format (e.g. netCDF, csv) - ???
How are the source files organized? (e.g. one file per day):
I guess I would be able to get support from DKRZ-ESGF, where I usually work with the ensemble and where an intake collection is also maintained. The ensemble contains about:
- up to 150 datasets on the EUR-11 Cordex domain each for a number of frequently requested variables,
- about 75 TB of data volume for the complete ensemble and all variables on the EUR-11 domain.
How are the source files accessed (e.g. FTP)
- a notebook that shows access to the 2m surface temperature EURO-CORDEX ensemble dataset at DKRZ.
Any special steps required to access the data (e.g. password required) - Raw data from ESGF or CDS. Copy at DKRZ.
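On the question of how the source files are organized: as far as I know, CORDEX files on ESGF are named according to the CORDEX archive convention, with underscore-separated fields (variable, domain, driving model, experiment, ensemble member, RCM name, RCM version, frequency, and an optional time range). The following is only a minimal sketch under that assumption. `CordexFile` and `parse_cordex_filename` are hypothetical names, and the example filename is illustrative rather than a checked ESGF entry.

```python
from typing import NamedTuple


class CordexFile(NamedTuple):
    """Fields of a CORDEX filename (hypothetical helper type)."""
    variable: str
    domain: str
    driving_model: str
    experiment: str
    ensemble: str
    rcm_model: str
    rcm_version: str
    frequency: str
    time_range: str  # empty string for time-invariant ("fx") files


def parse_cordex_filename(name: str) -> CordexFile:
    """Split a filename into its fields, assuming the CORDEX archive convention."""
    # Drop the netCDF suffix if present.
    stem = name[:-3] if name.endswith(".nc") else name
    parts = stem.split("_")
    # Time-invariant files carry no start-end time range.
    if len(parts) == 8:
        parts.append("")
    if len(parts) != 9:
        raise ValueError(f"unexpected number of fields in {name!r}: {len(parts)}")
    return CordexFile(*parts)


# Illustrative filename only (assumption, not taken from the ESGF index).
example = (
    "tas_EUR-11_MPI-M-MPI-ESM-LR_historical_r1i1p1_"
    "CLMcom-CCLM4-8-17_v1_mon_195001-195012.nc"
)
info = parse_cordex_filename(example)
print(info.variable, info.domain, info.frequency, info.time_range)
```

Fields like these could serve as the keys for grouping files into one dataset per model and variable, although the actual grouping would depend on how the recipe is written.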

Transformation / Alignment / Merging

???

Output Dataset

Zarr?

Licensing Question

I am especially wondering what license would be required for the data to be made publicly available, and whether you think the CORDEX terms of use would be a problem for distributing the data freely. Right now, only CMIP5 and CMIP6 data are freely available on ESGF, while for CORDEX you still have to register.

I would be interested in your thoughts on whether that data could be made available step by step through PANGEO cloud storage. As I said, right now this is just an idea, but the EURO-CORDEX General Assembly is coming up at the end of January 2022 and I wanted to bring it up and discuss it in the community. Thanks a lot!

rabernat commented 2 years ago

Regarding the license, I note that there are two different licenses for CORDEX:

Terms of use for CORDEX data for non-commercial research and educational purposes of:

a) I agree to restrict my use of CORDEX model output for non-commercial research and educational purposes only. Results from non-commercial research are expected to be made generally available through open publication and must not be considered proprietary. Materials prepared for educational purposes cannot be sold. These restrictions may only be relaxed by permission of the individual modelling groups responsible for the simulations.

OR

Terms of use for CORDEX data for commercial purposes (unrestricted use):

a) I understand that the subset of CORDEX model output that will be made accessible to this group has been designated for "unrestricted" use.

A big reason for putting data in the cloud is to make it more accessible for commercial use. So could we choose the option of the commercial license?

larsbuntemeyer commented 2 years ago

Thanks @rabernat for moving this issue, the recipe approach sounds great!! I am familiar with it from conda-forge feedstocks. Just found some more detailed info on the license of individual model_ids.

rabernat commented 2 years ago

The easiest way to get the data will probably be via ESGF, like for CMIP6: https://cordex.org/data-access/

larsbuntemeyer commented 2 years ago

I should be able to adapt the CMIP6 recipe for this, see https://github.com/larsbuntemeyer/cordex-forge-dev/issues/1. However, for CORDEX, I still have to log in to ESGF before I can download. I can run the recipe locally, but I am not sure how to manage credentials, although that seems to be solved by https://github.com/pangeo-forge/pangeo-forge-recipes/issues/53?
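One common way to keep credentials out of a recipe file is to read them from environment variables at run time. This is only a minimal sketch of that general pattern, not the approach from the linked issue. The variable names `ESGF_OPENID` and `ESGF_PASSWORD` and the function `esgf_credentials_from_env` are hypothetical and are not defined by ESGF or pangeo-forge-recipes, and how the values are then passed to the download step is left out.

```python
import os
from typing import Dict, Mapping, Optional


def esgf_credentials_from_env(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Read login details from environment variables (hypothetical names)."""
    # Default to the real process environment; a mapping can be passed for testing.
    env = os.environ if env is None else env
    required = ("ESGF_OPENID", "ESGF_PASSWORD")
    # Treat unset and empty variables the same way.
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {"openid": env["ESGF_OPENID"], "password": env["ESGF_PASSWORD"]}
```

With this, the secrets live in the shell or in the CI secret store, and the recipe itself can be committed without them.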

larsbuntemeyer commented 2 years ago

This solves my logon problem.

naught101 commented 1 year ago

@larsbuntemeyer did you get any further with this?
