secondlaw-ai / skyrim

🌎 🤝 AI weather models united
Apache License 2.0

Initial condition data from open data buckets using the Kerchunk/dynamic Zarr method #11

Open nishadhka opened 5 months ago

nishadhka commented 5 months ago

Thank you so much for the library and for the effort put into simplifying AI weather forecast models. I have a question about the initial condition data, for example in the case of NOAA GFS. For a single run (one of 00, 06, 12, 18), how many hours of forecast data are required to run the AI models? (For example, each GFS run gives 240 hours of forecast at 3-hour steps.)

Also, I couldn't find a routine specific to GFS download in https://github.com/secondlaw-ai/skyrim/blob/master/skyrim/core/fetch.py, as shown in the documentation.

In case the initial condition dataset download is a bottleneck, is there a plan to use kerchunk (https://fsspec.github.io/kerchunk/), or the more recent GRIB-index-based method, to stream the initial condition dataset from GFS and from ensemble forecast systems? See, for example, https://github.com/fsspec/kerchunk/pull/399 and the GFS data streaming method in https://github.com/asascience-open/nextgen-dmac/blob/main/grib_index_aggregation/dynamicgribchunking.ipynb. The GFS open data bucket is at https://registry.opendata.aws/noaa-gfs-bdp-pds/. Initial condition datasets for ensemble forecast systems are available for GEFS at https://registry.opendata.aws/noaa-gefs/ and for ECMWF at https://registry.opendata.aws/ecmwf-forecasts/.
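
To illustrate what I mean by the GRIB-index-based method: each GFS GRIB2 file in the bucket has a sidecar `.idx` text file listing the starting byte of every message, so a client can request only the variables it needs with HTTP range requests. Below is a minimal sketch of turning index text into byte ranges. The function names (`parse_grib_index`, `range_header`) are my own, the sample byte offsets are made up for illustration, and it assumes the usual colon-separated index layout without sub-messages; it is not kerchunk's or skyrim's code.

```python
from typing import List, NamedTuple, Optional


class GribMessageRange(NamedTuple):
    variable: str
    level: str
    start: int
    end: Optional[int]  # None means "read to the end of the file"


def parse_grib_index(idx_text: str) -> List[GribMessageRange]:
    """Parse colon-separated GRIB index text into per-message byte ranges.

    Assumed line layout (hypothetical sample values below):
    message_number:start_byte:d=YYYYMMDDHH:variable:level:forecast:
    """
    rows = []
    for line in idx_text.strip().splitlines():
        fields = line.split(":")
        rows.append((int(fields[1]), fields[3], fields[4]))

    ranges = []
    for i, (start, variable, level) in enumerate(rows):
        # A message ends one byte before the next message starts.
        end = rows[i + 1][0] - 1 if i + 1 < len(rows) else None
        ranges.append(GribMessageRange(variable, level, start, end))
    return ranges


def range_header(r: GribMessageRange) -> str:
    """Build the value of an HTTP Range header for one message."""
    end = "" if r.end is None else str(r.end)
    return f"bytes={r.start}-{end}"


# Illustrative index text; the offsets are invented for this example.
SAMPLE_IDX = """\
1:0:d=2024010100:PRMSL:mean sea level:anl:
2:990253:d=2024010100:CLWMR:1 hybrid level:anl:
3:1079774:d=2024010100:ICMR:1 hybrid level:anl:
"""

if __name__ == "__main__":
    for r in parse_grib_index(SAMPLE_IDX):
        print(r.variable, r.level, range_header(r))
```

With ranges like these, only the messages for the variables and levels a model needs would have to be downloaded, rather than the full file for each forecast hour.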

efesurekli commented 1 month ago

No worries! We are adding support for all NWPs as well; they are useful not only for initial conditions but also for benchmarking and ensembling. Currently, GFS and IFS are available. We haven't implemented streaming for GFS yet, but we do have parallel fetching of GRIB chunks. We can add this to the backlog, as we already have an item to speed up GFS fetching. ENS is going to be added soon (probably within this week), and GEFS comes after that.
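
For context, parallel fetching of chunks generally looks something like the sketch below. This is a generic illustration and not the actual skyrim code: `fetch_ranges_parallel` and the in-memory `fake_fetch` are hypothetical names, and a real fetcher would issue an HTTP range request against the bucket instead of slicing a local buffer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

ByteRange = Tuple[int, int]  # inclusive (start, end) byte offsets


def fetch_ranges_parallel(
    fetch_one: Callable[[ByteRange], bytes],
    ranges: List[ByteRange],
    max_workers: int = 8,
) -> List[bytes]:
    """Fetch several byte ranges concurrently, keeping the input order.

    `fetch_one` is supplied by the caller; threads suit this job because
    the work is I/O-bound.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of `ranges`.
        return list(pool.map(fetch_one, ranges))


# Stand-in for a remote file so the sketch runs without network access.
FAKE_BLOB = bytes(range(256))


def fake_fetch(byte_range: ByteRange) -> bytes:
    start, end = byte_range
    return FAKE_BLOB[start : end + 1]


if __name__ == "__main__":
    chunks = fetch_ranges_parallel(fake_fetch, [(0, 3), (10, 12), (250, 255)])
    print([len(c) for c in chunks])
```

Passing the fetcher in as a callable keeps the concurrency logic independent of the storage backend, so the same helper could be reused for different buckets.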