pangeo-data / WeatherBench

A benchmark dataset for data-driven weather forecasting
MIT License

CMIP missing features #32

Closed: berniwal closed this issue 3 years ago

berniwal commented 3 years ago

Thank you very much for providing such a nice dataset!

I am currently working with it and wondered how exactly you did the pretraining on CMIP data for "Data-driven medium-range weather prediction with a Resnet pretrained on climate simulations: A new model for WeatherBench".

Since the CMIP data is clearly missing some features that are present in the ERA5 dataset, I wondered how you extracted them from the CMIP data, or what you replaced them with if they were not present. For the target variables in particular (T850, Z500, T2M), I assume you extracted them from the dataset; however, for T2M I'm unsure whether pressure level 1000 is the correct choice.

From your code, it seems you chose to replace missing variables with all ones and to take the constants from ERA5. For the target variables, however, I could not figure out how or where you extracted them.
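For illustration only, the placeholder approach guessed at above (filling a variable that is absent from CMIP with all ones so the input keeps the same channel layout as for ERA5) could be sketched as follows. The function name `stack_channels`, the variable keys, and the toy grid shape are all hypothetical and are not taken from the WeatherBench code.

```python
import numpy as np


def stack_channels(fields, wanted, shape):
    """Stack the wanted variables into one array, channel-last.

    Hypothetical helper: any variable missing from `fields` is replaced
    by an all-ones array of the given shape, so the channel count stays
    the same regardless of which variables the dataset provides.
    """
    channels = []
    for name in wanted:
        if name in fields:
            channels.append(np.asarray(fields[name], dtype=np.float32))
        else:
            # Assumed placeholder for a variable the dataset lacks.
            channels.append(np.ones(shape, dtype=np.float32))
    return np.stack(channels, axis=-1)


# Toy (lat, lon) grid; real grids are larger.
shape = (4, 8)
# Pretend the CMIP-like dataset only provides two of three wanted variables.
fields = {"z_500": np.full(shape, 2.0), "t_850": np.zeros(shape)}
wanted = ["z_500", "t_850", "tcc"]  # "tcc" is absent from `fields`

x = stack_channels(fields, wanted, shape)
```

Here `x` has one channel per wanted variable, and the channel for the missing variable contains only ones.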

Are there other things to consider when pretraining on CMIP or is it otherwise equivalent to the ERA5 training?

Thank you very much for your time!

raspstephan commented 3 years ago

Hi, thanks for looking into this :)

You can see my download and preprocessing scripts in my working fork of this directory: https://github.com/raspstephan/WeatherBench/tree/master/snakemake_configs/MPI-ESM

I specifically chose a CMIP run that had all the variables I needed, MPI-ESM. T2M for example is present in the CMIP dataset.

The CMIP runs have 6h output intervals, while the ERA5 data has 1h. This is not a problem, but one has to be careful when using several input time steps. The CMIP data also only comes on 7 levels, so I just chose those.
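To make the time-step caveat concrete, here is a minimal sketch of converting input lags given in hours into index offsets along the time axis, so that the same physical lags are used for both datasets. The function name `lag_to_index_offsets` and the example lags are assumptions for illustration, not part of the actual training code.

```python
def lag_to_index_offsets(lags_hours, output_interval_hours):
    """Convert lags in hours to index offsets along the time axis.

    Hypothetical helper: raises if a lag cannot be represented at the
    dataset's output interval (e.g. a 3h lag on 6-hourly data).
    """
    offsets = []
    for lag in lags_hours:
        if lag % output_interval_hours != 0:
            raise ValueError(
                f"lag of {lag}h is not a multiple of the "
                f"{output_interval_hours}h output interval"
            )
        offsets.append(lag // output_interval_hours)
    return offsets


# Example: three input time steps at 0h, 6h and 12h before the forecast start.
lags = [0, 6, 12]
era5_offsets = lag_to_index_offsets(lags, output_interval_hours=1)
cmip_offsets = lag_to_index_offsets(lags, output_interval_hours=6)
```

With hourly data the three inputs are 6 indices apart, while with 6-hourly data they are adjacent, so reusing the hourly index offsets on the CMIP data would silently change the physical lags.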

Hopefully this answers your question. Let me know if not!

mjwillson commented 3 years ago

Hi @raspstephan, I was also confused by this, as these extra fields are present neither in the regridded CMIP dataset you released at https://mediatum.ub.tum.de/1524895 nor in the list of CMIP fields under 'Download the data' in your README.

There I only see the following fields:

The following are missing, although you appear to have SnakeMake configs to download and regrid them in your forked repository here: https://github.com/raspstephan/WeatherBench/tree/master/snakemake_configs/MPI-ESM:

Would it be possible to add your regridded versions of these fields to the released dataset?

Also, do you have a version of total_cloud_cover for CMIP anywhere? If not, what did you use instead (all ones)?

Thanks!

mjwillson commented 3 years ago

Noting that the source data for precipitation_flux and toa_incident_solar_radiation no longer appears to be available (see #39). I'm hoping you'll be able to release your regridded versions of these fields, as it seems we have no other way to access them.