mlangguth89 / downscaling-cgan

Clean, easier to use version of the downscaling cGAN
MIT License

Readers for CERRA, ERA5 and IMERG data #2

Closed: mlangguth89 closed this issue 1 month ago

mlangguth89 commented 3 months ago

Add functions to the data preprocessing (which creates the TFRecords files that are streamed during training) to read and process the ERA5 input data and the CERRA target data. As both datasets are available as monthly GRIB files, the data processing will also be changed. So far, an iterator is used to write the dataset into TFRecords, where each sample involves its own I/O process (i.e. opening the file, reading the data, closing it). This produces considerable I/O overhead that can be avoided with the monthly files, so the related write_data function as well as the DataGenerator will be adapted accordingly.
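
The difference between the two I/O patterns can be sketched as follows. This is a minimal, framework-free illustration and not the repository's write_data or DataGenerator: `open_month` and `write_sample` are hypothetical stand-ins for reading one monthly GRIB file and serializing one sample (e.g. into a TFRecords file). The point is only that the per-month variant opens each file once instead of once per sample.

```python
from collections import defaultdict
from datetime import datetime
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical signatures: a loader returning all samples of one month keyed by
# timestamp, and a writer that serializes one sample (e.g. into a TFRecords file).
MonthLoader = Callable[[int, int], Dict[datetime, object]]
SampleWriter = Callable[[datetime, object], None]


def write_per_sample(times: Iterable[datetime], open_month: MonthLoader,
                     write_sample: SampleWriter) -> None:
    """Old pattern: every sample triggers its own open/read/close cycle."""
    for t in times:
        month_data = open_month(t.year, t.month)  # file read again for each sample
        write_sample(t, month_data[t])


def write_per_month(times: Iterable[datetime], open_month: MonthLoader,
                    write_sample: SampleWriter) -> None:
    """New pattern: group timestamps by month and read each monthly file once."""
    by_month: Dict[Tuple[int, int], List[datetime]] = defaultdict(list)
    for t in times:
        by_month[(t.year, t.month)].append(t)
    for (year, month), month_times in sorted(by_month.items()):
        month_data = open_month(year, month)  # one read per monthly file
        for t in month_times:
            write_sample(t, month_data[t])


if __name__ == "__main__":
    # Toy data: hourly samples on one day in January and one day in February.
    times = [datetime(2020, m, 1, h) for m in (1, 2) for h in range(24)]
    opened: List[Tuple[int, int]] = []

    def fake_open_month(year: int, month: int) -> Dict[datetime, object]:
        opened.append((year, month))  # record every simulated file access
        return {t: t.hour for t in times if (t.year, t.month) == (year, month)}

    written: List[datetime] = []
    write_per_sample(times, fake_open_month, lambda t, x: written.append(t))
    print("per-sample opens:", len(opened))  # 48
    opened.clear()
    written.clear()
    write_per_month(times, fake_open_month, lambda t, x: written.append(t))
    print("per-month opens:", len(opened))  # 2
```

Both functions write the same 48 samples; only the number of file accesses differs (48 versus 2 in this toy case).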

mlangguth89 commented 2 months ago

The predictor variables for downscaling (e.g. CAPE, convective precipitation) are not used in AtmoRep and therefore had to be downloaded via the CDS API. For convenience, the data were downloaded in netCDF format and are available under /p/scratch/atmo-rep/data/era5/new_structure/. The load_era5_monthly method is, however, designed to handle both file formats (GRIB and netCDF).
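
A loader that accepts both file formats could dispatch on the file extension. The sketch below is an assumption about how such a dispatch might look, not the actual load_era5_monthly implementation. It relies on xarray's `open_dataset` with the `cfgrib` engine for GRIB files (which requires the cfgrib package) and the `netcdf4` engine for netCDF files; the helper names `select_engine` and `open_monthly_file` and the suffix list are hypothetical.

```python
from pathlib import Path
from typing import Union

# Hypothetical mapping from file suffix to the xarray backend engine.
_ENGINES = {
    ".grib": "cfgrib",
    ".grb": "cfgrib",
    ".grib2": "cfgrib",
    ".nc": "netcdf4",
    ".nc4": "netcdf4",
}


def select_engine(path: Union[str, Path]) -> str:
    """Return the xarray engine for a monthly ERA5 file based on its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return _ENGINES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file format: {suffix!r} ({path})") from None


def open_monthly_file(path: Union[str, Path]):
    """Open a monthly GRIB or netCDF file as an xarray.Dataset (sketch)."""
    import xarray as xr  # imported lazily so select_engine works without xarray

    return xr.open_dataset(path, engine=select_engine(path))
```

Keeping the format decision in one small function means the rest of the preprocessing can stay agnostic of whether a given month was delivered as GRIB or netCDF.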

mlangguth89 commented 2 months ago

Closer inspection of the script and methods in the dsrnngan/data directory reveals several issues, e.g.

mlangguth89 commented 1 month ago

Training data have been successfully preprocessed over the last few weeks, which should (hopefully) enable training:

/p/scratch/atmo-rep/data/downscaling/downscaling_tfrecords/training_data/0aad51a8f3848213

Integration of the validation dataset is still open and will probably be realized via TFRecords again for efficiency (without patching the data). However, the corresponding adaptations will be made in a separate issue branch.
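
To make the distinction between patched training samples and unpatched validation fields concrete, here is a small numpy sketch of cutting one spatially aligned input/target patch pair. The function name, the patch size, and the assumption that the target grid is an integer `factor` finer than the input grid are hypothetical; the repository's actual patching logic may differ. For validation without patching, the full `(x, y)` fields would simply be used as-is.

```python
import numpy as np


def random_patch_pair(x: np.ndarray, y: np.ndarray, patch_size: int,
                      factor: int, rng: np.random.Generator):
    """Cut one spatially aligned (input, target) patch pair.

    x: coarse input of shape (H, W, C); y: fine target of shape (H*factor, W*factor).
    """
    h, w = x.shape[:2]
    if y.shape[:2] != (h * factor, w * factor):
        raise ValueError("target grid must be `factor` times finer than the input grid")
    if patch_size > min(h, w):
        raise ValueError("patch is larger than the domain")
    # Random top-left corner on the coarse grid.
    i = int(rng.integers(0, h - patch_size + 1))
    j = int(rng.integers(0, w - patch_size + 1))
    x_patch = x[i:i + patch_size, j:j + patch_size]
    # The same region on the fine grid, scaled by the refinement factor.
    y_patch = y[i * factor:(i + patch_size) * factor,
                j * factor:(j + patch_size) * factor]
    return x_patch, y_patch


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.zeros((20, 30, 4))   # e.g. 4 predictor channels on the coarse grid
    y = np.zeros((80, 120))     # target field on a 4x finer grid
    xp, yp = random_patch_pair(x, y, patch_size=8, factor=4, rng=rng)
    print(xp.shape, yp.shape)   # (8, 8, 4) (32, 32)
```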