openclimatefix / nwp-consumer

Microservice for consuming NWP data.

META: New Sources and Sinks #12

Open devsjc opened 1 year ago

devsjc commented 1 year ago

Implement new sources:

WEATHER FORECASTS

Implement new sinks:

Sources marked with stars do not have archives, so they would have to be run as continuous downloads. Do ICON first.

devsjc commented 1 year ago

@jacobbieker I'm going to estimate storage costs here. Please correct me if I've got anything wrong!

We are planning on using ICON and ECMWF; everything else is a nice-to-have.

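| Type    | Source       | Scope     | Regularity            | Local size / day |
|---------|--------------|-----------|-----------------------|------------------|
| Weather | Icon         | EU        | Daily to huggingface  | 0                |
| Weather | Icon         | Global    | Daily to huggingface? | 0                |
| Weather | Meteo France | EU        | Daily                 | 24 GB            |
| Weather | Meteo France | Global    | Daily                 | 18 GB            |
| Weather | Meteo France | France    | Every 3 days          | 35 GB            |
| Weather | Meteo France | France HD | Every 3 days          | 9.9 GB           |
| Weather | Canada       | Global    | Every day             | 74 GB            |
| Aerosol | SILAM        | Global    | Yes                   | 1.3 GB           |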

This comes to roughly 130 GB per day.

Leonardo's storage_c has 36 TB available, which gives us ~270 days before we run out of space.
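
For reference, the arithmetic behind these two figures can be sketched as below. This is only a reading of the table, not something stated in the thread: it assumes the two "Every 3 days" France rows are spread over three days (which is what reproduces a total near 130) and that 1 TB is counted as 1000 GB. The function names are made up for illustration.

```shell
# Daily total in GB, from the sizes in the table above.
# Assumption: the two "Every 3 days" rows (35 and 9.9) are amortised over 3 days.
daily_total_gb() {
  awk 'BEGIN {
    total = 24 + 18 + 35/3 + 9.9/3 + 74 + 1.3
    printf "%.1f\n", total
  }'
}

# Whole days until a volume of $1 TB fills up at $2 GB per day.
# Assumption: decimal units, 1 TB = 1000 GB.
days_until_full() {
  awk -v tb="$1" -v gb="$2" 'BEGIN { printf "%d\n", (tb * 1000) / gb }'
}

daily_total_gb          # prints 132.3
days_until_full 36 130  # prints 276
```

Both results are in line with the rounded figures quoted above.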

devsjc commented 1 year ago

Working for Canada Global:

Two folders in /mnt/storage_c/GPDS/dd.weather.gc.ca/model_gem_global/15km/grib2/lat_lon, one for each initialisation time in HH format:

00 12

Each of these folders has 81 sub-folders, one for each time step spanning 000 to 240 in three-hour increments.

Then, those folders contain multiple days' worth of grib files for several different parameters at the parent folders' time step and initialisation time.

One day's worth of files in a step folder sums to 441439085 bytes, which is a little under half a gigabyte. Assuming the sizes do not vary significantly between initialisation times, steps, or days, multiplying this by 81 (one per time step) and then by two (one per initialisation time) gives an approximate daily size of ~70 GB.
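
As a quick sanity check, the multiplication can be written out with shell integer arithmetic. The variable and function names are illustrative; the inputs are the figures quoted in this comment, under the same assumption that file sizes are roughly constant across steps and initialisation times.

```shell
# Bytes for one day's files in a single step folder (figure quoted above).
bytes_per_step=441439085
# Steps 000 to 240 in three-hour increments.
steps=$(( 240 / 3 + 1 ))
# Initialisation times: 00 and 12.
inits=2

# Multiply per-step bytes by the number of steps and initialisation times.
estimate_daily_bytes() {
  echo $(( $1 * $2 * $3 ))
}

estimate_daily_bytes "$bytes_per_step" "$steps" "$inits"  # prints 71513131770
```

That is about 71.5 GB in decimal units, which rounds to the ~70 figure given here.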

This can be verified via

$ cd /mnt/storage_c/GPDS/dd.weather.gc.ca/model_gem_global/15km/grib2/lat_lon
$ ls -alR | grep '20230709' | awk '{ sum += $5 } END{ print sum }'

This prints the total size, in bytes, of all the files in the various sub-folders corresponding to 9 July 2023 (it takes a while to run!). The result is 66582081111, or about 66.6 GB.

(The 8th July returns 66603571331, so the daily totals seem reasonably constant.)
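
An alternative sketch that avoids parsing `ls` output is shown below. It assumes GNU `find` (needed for `-printf`) and, like the `grep` above, assumes the date string appears in each file name. The function name `sum_bytes_for_date` is hypothetical and not part of any tool mentioned in this thread.

```shell
# Sum the sizes (in bytes) of all regular files under directory $1
# whose names contain the date string $2.
# Assumes GNU find: -printf '%s\n' prints each file's size in bytes.
sum_bytes_for_date() {
  find "$1" -type f -name "*$2*" -printf '%s\n' |
    awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}

# Example, using the path from the comment above:
# sum_bytes_for_date /mnt/storage_c/GPDS/dd.weather.gc.ca/model_gem_global/15km/grib2/lat_lon 20230709
```

Unlike the `ls -alR | grep` pipeline, this counts only regular files, so any directory entries whose names happen to match the date are excluded from the total.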

jacobbieker commented 1 year ago

Thanks for all that! Yeah, makes sense to me. I guess it's just a bit surprising how little it is, but great!

jacobbieker commented 1 year ago

More possible sources for observations, if we wanted them: https://synopticdata.com/mesonet-api and https://madis.ncep.noaa.gov/mesonet_providers.shtml

devsjc commented 11 months ago

ICON Implemented with https://github.com/openclimatefix/nwp-consumer/pull/61

devsjc commented 11 months ago

Huggingface implemented with https://github.com/openclimatefix/nwp-consumer/pull/49

devsjc commented 10 months ago

ICON updated with https://github.com/openclimatefix/nwp-consumer/pull/67

jacobbieker commented 9 months ago

Canada is implemented in #76

jacobbieker commented 9 months ago

GFS is implemented in #78

jacobbieker commented 9 months ago

Meteo-France Global and EU are in #80

jacobbieker commented 9 months ago

ERA5 is also now (mostly) available in Zarr from Google Cloud in WeatherBench 2 and arco-era5, so that shouldn't really need to be done.

jacobbieker commented 9 months ago

One other thing to start archiving might be the ICON ensemble predictions (EPS).

jacobbieker commented 9 months ago

There are new parameters available in ICON and ICON-EU which might be good to archive: https://www.dwd.de/DE/leistungen/opendata/neuigkeiten/opendata_november2023_2.html

jacobbieker commented 9 months ago

Also, ICON-ART, the aerosol forecast, will be available in the middle of the year