pangeo-forge / staged-recipes

A place to submit pangeo-forge recipes before they become fully fledged pangeo-forge feedstocks
https://pangeo-forge.readthedocs.io/en/latest/
Apache License 2.0

Proposed Recipes for DWD ICON Forecasts #263

Open jacobbieker opened 9 months ago

jacobbieker commented 9 months ago

Dataset Name

DWD ICON Global Model Forecast

Dataset URL

https://opendata.dwd.de/weather/nwp/

Description

This is the ICON NWP model from the German weather service (DWD). Forecasts go out to 120 hours and are run 4 times a day. They are open, but each run is only available for 24 hours before being deleted from the open data server. The model has a higher spatial resolution than GFS and is freely available. At Open Climate Fix, we use it in conjunction with other global NWP forecasts for renewable energy forecasting in different parts of the world, and we think access to these forecasts would be generally useful for many organizations.

License

German Government License

Data Format

Grib

Data Format (other)

No response

Access protocol

HTTP(S)

Source File Organization

Multiple files per day, one per parameter, level, and timestep. Files are organized as a single folder per init time, with one subfolder per parameter. Each GRIB2 file is bzip2-compressed.
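Since every file is compressed, any pipeline would need a decompression step before the GRIB2 data can be opened. A minimal sketch using only the Python standard library is below. The function name `decompress_grib2` is hypothetical, and the check on the leading `GRIB` bytes assumes each payload begins directly with a GRIB message.

```python
import bz2

# GRIB messages begin with the four ASCII bytes "GRIB".
GRIB_MAGIC = b"GRIB"


def decompress_grib2(compressed: bytes) -> bytes:
    """Return the raw GRIB2 bytes from the contents of a .grib2.bz2 file.

    Hypothetical helper: assumes the whole compressed file fits in memory
    and that the decompressed payload starts with a GRIB message.
    """
    raw = bz2.decompress(compressed)
    if not raw.startswith(GRIB_MAGIC):
        raise ValueError("decompressed payload does not start with 'GRIB'")
    return raw
```

The returned bytes could then be written to a temporary file and opened with whichever GRIB reader the recipe ends up using.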

Example URLs

Filenames change each day, but an example parameter folder and an example file are:
https://opendata.dwd.de/weather/nwp/icon/grib/00/t/
https://opendata.dwd.de/weather/nwp/icon/grib/00/v/icon_global_icosahedral_pressure-level_2023121600_180_250_V.grib2.bz2
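The file URL above suggests a naming pattern of init time, forecast step, level, and parameter. The sketch below rebuilds that URL from its parts. The pattern is inferred from this single example only, so the zero-padded three-digit step, the upper-case parameter in the filename, and the function name `icon_pressure_level_url` are all assumptions. It only covers pressure-level files.

```python
from datetime import datetime

BASE_URL = "https://opendata.dwd.de/weather/nwp/icon/grib"


def icon_pressure_level_url(
    init_time: datetime, param: str, level: int, step_hours: int
) -> str:
    """Build a URL for one pressure-level file (pattern inferred, not official)."""
    run = f"{init_time:%H}"  # init-hour folder, e.g. "00"
    filename = (
        "icon_global_icosahedral_pressure-level_"
        f"{init_time:%Y%m%d%H}_{step_hours:03d}_{level}_{param.upper()}"
        ".grib2.bz2"
    )
    # The parameter folder is lower case, the parameter in the filename upper case.
    return f"{BASE_URL}/{run}/{param.lower()}/{filename}"


url = icon_pressure_level_url(datetime(2023, 12, 16, 0), "v", 250, 180)
```

Because runs are deleted after 24 hours, a URL built this way is only valid for a recent init time.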

Authorization

No; data are fully public

Transformation / Processing

Concatenating along time and timestep dimensions.
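Before concatenating, the files need to be ordered by init time and then by forecast step. A rough sketch of that bookkeeping is below, using only the standard library. The regular expression is inferred from the single pressure-level example filename in this issue, and `concat_plan` is a hypothetical helper. It expects filenames for one parameter and one level.

```python
import re
from collections import defaultdict

# Pattern inferred from one example filename; other level types may differ.
FILENAME_RE = re.compile(
    r"icon_global_icosahedral_(?P<level_type>[a-z-]+)_"
    r"(?P<init>\d{10})_(?P<step>\d{3})_(?P<level>\d+)_(?P<param>[A-Z0-9_]+)"
    r"\.grib2\.bz2$"
)


def concat_plan(filenames):
    """Map each init time (YYYYMMDDHH) to its filenames sorted by step.

    Outer keys give the order along the init-time dimension; each inner
    list gives the order along the timestep dimension. Names that do not
    match the pattern are skipped.
    """
    by_init = defaultdict(list)
    for name in filenames:
        match = FILENAME_RE.search(name)
        if match is None:
            continue
        by_init[match["init"]].append((int(match["step"]), name))
    return {
        init: [name for _, name in sorted(items)]
        for init, items in sorted(by_init.items())
    }
```

Each inner list would be concatenated along the timestep dimension first, and the per-init results would then be stacked along the init-time dimension.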

Target Format

Zarr

Comments

We are currently running cron jobs with our own open-source NWP archiver to archive these forecasts, as in this example on Hugging Face: https://huggingface.co/datasets/openclimatefix/dwd-icon-global/blob/main/data/2023/12/16/20231216_00.zarr.zip. We do the same for the ICON-EU model. One limitation is the 50GB file size limit there, so we only archive the first 4 days of each global forecast.

I'm partly wondering whether Pangeo-Forge would work well for a task that needs to run every day and either append to an existing Zarr or generate a new Zarr for each init time. Or is this out of scope for Pangeo-Forge right now? I've seen https://github.com/pangeo-forge/user-stories/issues/5, which seems to relate to some limitations for recipes that need to append data. This also seems a bit different from other proposed recipes, like https://github.com/pangeo-forge/staged-recipes/issues/136, in that the older data disappears rather than being stored indefinitely.

This kind of recipe would also be useful for other openly accessible forecasts with limited public archives, like this archive of MeteoFrance Forecasts and CMC's GDPS and GEPS forecasts, data available here
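For the "new Zarr per init time" option, the layout of the Hugging Face example path can be generated from the init time alone. A minimal sketch, assuming the `data/YYYY/MM/DD/YYYYMMDD_HH.zarr.zip` layout seen in that one URL holds for every run (`store_path_for_init` is a hypothetical name):

```python
from datetime import datetime


def store_path_for_init(init_time: datetime, root: str = "data") -> str:
    """Return the relative path of the zipped Zarr store for one init time.

    Layout copied from the single example path in this issue; treat it as
    an assumption rather than a guaranteed convention.
    """
    return f"{root}/{init_time:%Y/%m/%d}/{init_time:%Y%m%d_%H}.zarr.zip"


path = store_path_for_init(datetime(2023, 12, 16, 0))
```

Writing one store per init time sidesteps appending to an existing Zarr, at the cost of many separate stores to open later.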