The predictor variables for downscaling (e.g. CAPE, convective precipitation, etc.) are not used in AtmoRep and therefore had to be downloaded with the CDS API. For convenience, the data has been downloaded in netCDF-format and is available under /p/scratch/atmo-rep/data/era5/new_structure/. The `load_era5_monthly`-method is, however, designed to handle both file formats (grib and netCDF).
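Since the method itself is not shown here, the following is only a minimal sketch of how such a loader can dispatch on the file format with xarray; the signature, file-naming pattern, and variable handling are assumptions, only the name `load_era5_monthly` comes from this issue.

```python
from pathlib import Path

import xarray as xr


def load_era5_monthly(data_dir: str, year: int, month: int,
                      variables: list[str]) -> xr.Dataset:
    """Load one month of ERA5 data from either a netCDF or a grib file.

    Assumes files are named era5_<year>-<month>.<ext>; adapt the
    pattern to the actual layout under new_structure/.
    """
    stem = Path(data_dir) / f"era5_{year}-{month:02d}"
    nc_file, grib_file = stem.with_suffix(".nc"), stem.with_suffix(".grb")

    if nc_file.exists():
        ds = xr.open_dataset(nc_file)                     # netCDF, default engine
    elif grib_file.exists():
        ds = xr.open_dataset(grib_file, engine="cfgrib")  # grib via cfgrib
    else:
        raise FileNotFoundError(f"No monthly file found for {stem}")

    return ds[variables]
```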
Closer inspection of the script and methods under the `dsrnngan/data`-directory reveals several issues, e.g.:

- in `write_data`, `np.digitize` misses an index-shift (see the sketch below the list),
- the fraction of rainy grid points is calculated on the full domain here (see above).
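For reference, the off-by-one behaviour of `np.digitize` looks as follows; the bin edges below are made up for illustration and are not the ones used in `write_data`.

```python
import numpy as np

# np.digitize returns 1-based bin positions (0 means "below the first
# edge"), so using its output directly as a 0-based class index is off
# by one.
bin_edges = np.array([0.0, 0.1, 1.0, 5.0, 20.0])   # example precip bins (mm)
values = np.array([0.05, 0.5, 3.0, 50.0])

raw = np.digitize(values, bin_edges)               # -> [1, 2, 3, 5] (1-based)
classes = np.clip(raw - 1, 0, len(bin_edges) - 2)  # -> [0, 1, 2, 3] (0-based)
```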
Training data has been successfully preprocessed over the past few weeks, which should (hopefully) enable training:
/p/scratch/atmo-rep/data/downscaling/downscaling_tfrecords/training_data/0aad51a8f3848213
Integration of the validation dataset is still open and will probably be realized via TFRecords again for efficiency (with no patching of the data). However, the corresponding adaptations will be performed in a separate issue-branch.
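A minimal sketch of how the validation data could then be streamed with `tf.data`, assuming each record stores the serialized input/target tensors of one full (unpatched) domain; the feature names, filename, and record layout are assumptions about the eventual format, not the actual dsrnngan code.

```python
import tensorflow as tf

# Assumed record layout: two serialized tensors per example.
feature_spec = {
    "inputs": tf.io.FixedLenFeature([], tf.string),
    "target": tf.io.FixedLenFeature([], tf.string),
}

def _parse(example_proto):
    parsed = tf.io.parse_single_example(example_proto, feature_spec)
    x = tf.io.parse_tensor(parsed["inputs"], out_type=tf.float32)
    y = tf.io.parse_tensor(parsed["target"], out_type=tf.float32)
    return x, y

val_ds = (tf.data.TFRecordDataset("validation.tfrecords")  # placeholder name
          .map(_parse, num_parallel_calls=tf.data.AUTOTUNE)
          .batch(1))
```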
Add functions to the data preprocessing (that is used to create the TFRecords files that are streamed during training) to read/process the ERA5 input data and the CERRA target data. As both datasets are available in monthly grib-files, the data processing will also be changed. So far, an iterator is used to write the dataset into TFRecords, where each sample involves an I/O-process (i.e. opening the file, getting the data, closing it). This produces a lot of I/O-overhead that can be avoided with the monthly files; thus, the related `write_data`-function as well as the `DataGenerator` will be adapted accordingly (see the sketch below).
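A minimal sketch of the adapted `write_data` under these assumptions: each monthly grib file is opened once, and all of its samples are streamed into the TFRecord file, instead of one open/read/close cycle per sample. File names, variable handling, and the helper are illustrative, not the actual dsrnngan code.

```python
import numpy as np
import tensorflow as tf
import xarray as xr


def _serialize_sample(inputs: np.ndarray, target: np.ndarray) -> bytes:
    """Pack one (input, target) pair into a tf.train.Example."""
    feature = {
        "inputs": tf.train.Feature(
            bytes_list=tf.train.BytesList(
                value=[tf.io.serialize_tensor(inputs).numpy()])),
        "target": tf.train.Feature(
            bytes_list=tf.train.BytesList(
                value=[tf.io.serialize_tensor(target).numpy()])),
    }
    return tf.train.Example(
        features=tf.train.Features(feature=feature)).SerializeToString()


def write_data(era5_file: str, cerra_file: str, out_file: str) -> None:
    # One open per monthly file instead of one I/O cycle per sample.
    era5 = xr.open_dataset(era5_file, engine="cfgrib")
    cerra = xr.open_dataset(cerra_file, engine="cfgrib")

    with tf.io.TFRecordWriter(out_file) as writer:
        for t in range(era5.sizes["time"]):
            x = era5.isel(time=t).to_array().values.astype(np.float32)
            y = cerra.isel(time=t).to_array().values.astype(np.float32)
            writer.write(_serialize_sample(x, y))
```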