Open devsjc opened 1 year ago
@jacobbieker going to estimate storage costs here. Please correct me if I've got anything wrong!
We are planning on using Icon and ECMWF, everything else is a nice to have.
Type | Source | Scope | Regularity | Local size / day |
---|---|---|---|---|
Weather | Icon | EU | Daily to huggingface | 0 |
Weather | Icon | Global | Daily to huggingface? | 0 |
Weather | Meteo France | EU | Daily | 24Gb |
Weather | Meteo France | Global | Daily | 18Gb |
Weather | Meteo France | France | Every 3 days | 35Gb |
Weather | Meteo France | France HD | Every 3 days | 9.9Gb |
Weather | Canada | Global | Every Day | 74Gb |
Aerosol | SILAM | Global | Yes | 1.3Gb |
This is approximately 130Gb per day.
Leonardo's `storage_c` has 36Tb available, which gives us ~270 days before we run out of space.
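For anyone wanting to check the arithmetic, here is a rough sketch of how the ~130Gb and ~270 day figures can be reproduced. It assumes the two "Every 3 days" rows are sizes per download (so they contribute a third of their size per day) and that 1Tb = 1000Gb; neither assumption is stated in the table, so treat it as one plausible reading.

```python
# Sizes in Gb and download interval in days, copied from the table above.
# Assumption: an "Every 3 days" size is per download, so it is divided by 3.
sources = {
    "Icon EU": (0.0, 1),
    "Icon Global": (0.0, 1),
    "Meteo France EU": (24.0, 1),
    "Meteo France Global": (18.0, 1),
    "Meteo France France": (35.0, 3),
    "Meteo France France HD": (9.9, 3),
    "Canada Global": (74.0, 1),
    "SILAM Global": (1.3, 1),
}

CAPACITY_GB = 36 * 1000  # assumption: 1Tb = 1000Gb

# Average daily volume across all sources, then days until the disk is full.
daily_total_gb = sum(size / interval for size, interval in sources.values())
days_until_full = CAPACITY_GB / daily_total_gb

print(f"{daily_total_gb:.1f} Gb/day, {days_until_full:.0f} days")
# prints: 132.3 Gb/day, 272 days
```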
Working for Canada Global:
Two folders in `/mnt/storage_c/GPDS/dd.weather.gc.ca/model_gem_global/15km/grib2/lat_lon`, one for each initialisation time in HH format: `00` and `12`.
Each of these folders has 81 sub-folders, one for each time step spanning `000` to `240` in three-hour increments.
Then, those folders contain multiple days' worth of grib files for several different parameters at the parent folders' time step and initialisation time.
One day's worth of files in a step folder sums to 441439085 bytes, which is a little under half a gig. Assuming the sizes do not vary significantly between times, steps, or days, multiplying this by 81 for each time step, then by two for each initialisation time, gives a daily size of ~70Gb.
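The estimate above works out as follows. This is only a sketch of the multiplication described, taking 1Gb as 10^9 bytes (an assumption) and using the same "sizes are roughly constant" simplification.

```python
# Size of one day's files in a single step folder, as measured above.
BYTES_PER_STEP_PER_DAY = 441_439_085

# Time steps 000 to 240 in three-hour increments, and the two init times.
steps = range(0, 241, 3)
init_times = ("00", "12")

# Assumes every step folder and init time holds about the same volume.
estimate_bytes = BYTES_PER_STEP_PER_DAY * len(steps) * len(init_times)

print(len(steps), estimate_bytes, f"{estimate_bytes / 1e9:.1f} Gb")
# prints: 81 71513131770 71.5 Gb
```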
This can be verified via
$ cd /mnt/storage_c/GPDS/dd.weather.gc.ca/model_gem_global/15km/grib2/lat_lon
$ ls -alR | grep '20230709' | awk '{ sum += $5 } END{ print sum }'
This sums the sizes of all the files in the various sub-folders corresponding to the 9th July 2023 (takes a while to run!). It prints 66582081111, or ~66Gb.
(The 8th July returns 66603571331, so they seem reasonably constant)
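If a scriptable version of that check is useful, something like the following Python sketch should give a comparable number. It is not byte-for-byte equivalent to the `ls | grep | awk` pipeline: it assumes the date string appears in the file name (the pipeline matches anywhere in the `ls` line), it only counts files, and `os.path.getsize` follows symlinks. The function name `total_bytes_for_date` is made up for illustration.

```python
import os


def total_bytes_for_date(root: str, date: str) -> int:
    """Sum the sizes of all files under root whose name contains date."""
    total = 0
    # Walk every init-time and step sub-folder below root.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Assumption: the date (e.g. "20230709") is part of the file name.
            if date in name:
                total += os.path.getsize(os.path.join(dirpath, name))
    return total
```

Calling `total_bytes_for_date("/mnt/storage_c/GPDS/dd.weather.gc.ca/model_gem_global/15km/grib2/lat_lon", "20230709")` would then walk the same tree as the shell command, and is likely to be similarly slow.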
Thanks for all that! Yeah, makes sense to me. I guess it's just a bit surprising how little it is, but great!
More possible sources, for observations if we wanted it: https://synopticdata.com/mesonet-api https://madis.ncep.noaa.gov/mesonet_providers.shtml
ICON Implemented with https://github.com/openclimatefix/nwp-consumer/pull/61
Huggingface implemented with https://github.com/openclimatefix/nwp-consumer/pull/49
ICON updated with https://github.com/openclimatefix/nwp-consumer/pull/67
Canada is implemented in #76
GFS is implemented in #78
Meteo-France Global and EU is in #80
ERA5 is also now (mostly) available in Zarr from Google Cloud in WeatherBench 2 and arco-era5, so that shouldn't really need to be done.
One other thing to start archiving might be the ICON ensemble predictions (EPS).
There are new parameters available in ICON and ICON-EU which might be good to archive: https://www.dwd.de/DE/leistungen/opendata/neuigkeiten/opendata_november2023_2.html
Also, ICON-ART, the aerosol forecast, will be available in the middle of the year.
Implement new sources:
WEATHER FORECASTS
Implement new sinks:
Sources with stars do not have archives, so would have to be run as continuous downloads. Do Icon first.