mlangguth89 / downscaling_benchmark


Corrupted wind downscaling dataset #28

Closed mlangguth89 closed 1 month ago

mlangguth89 commented 1 month ago

The normalization parameters deduced in #23, currently available under /p/scratch/deepacf/maelstrom/maelstrom_data/ap5/downscaling_benchmark_dataset/benchmark_wind/dataset/norm.json, reveal issues with the preprocessed wind downscaling dataset. In particular, the mean of the ws100m_tar data is negative (-19.50 m/s), which is impossible for a wind speed, whereas blh_in (6710.69 m) and ws100m_in (8931.78 m/s) exhibit unrealistically large values. It is therefore assumed that at least some of the data files are corrupted. The dataset must be double-checked and corrected.
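A check of this kind can be automated. The following is a minimal sketch, assuming the means have already been read from norm.json into a flat mapping of variable name to mean (the actual layout of norm.json is not shown in this issue). The function name and the plausibility bounds are hypothetical and chosen only for illustration.

```python
# Hypothetical plausibility bounds for the *mean* of each variable.
# These limits are assumptions for illustration, not values from the project.
PLAUSIBLE_MEAN = {
    "ws100m_tar": (0.0, 50.0),  # wind speed (m/s) cannot be negative
    "ws100m_in": (0.0, 50.0),   # wind speed (m/s)
    "blh_in": (0.0, 5000.0),    # boundary layer height (m)
}


def find_implausible_means(means, bounds=PLAUSIBLE_MEAN):
    """Return {variable: mean} for every mean outside its plausible range.

    Variables without an entry in `bounds` are skipped. NaN means are
    flagged as well, because every comparison with NaN is False.
    """
    flagged = {}
    for name, value in means.items():
        if name not in bounds:
            continue
        lower, upper = bounds[name]
        if not (lower <= value <= upper):
            flagged[name] = value
    return flagged


# Means as reported in this issue.
reported = {"ws100m_tar": -19.50, "blh_in": 6710.69, "ws100m_in": 8931.78}

for name, value in sorted(find_implausible_means(reported).items()):
    print(f"{name}: mean {value} is outside the plausible range")
```

With the bounds assumed here, all three reported means are flagged.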

mlangguth89 commented 1 month ago

It is found that the files contain NaN values for some months for which missing data in COSMO REA6 is known. This affects (at least) the files for the following months:

1996-05: 48 timesteps, from 1996-05-04 01 UTC (step: 73) to 1996-05-06 00 UTC (step: 120)

2005-10: 1 timestep, 2005-10-13 19 UTC (step: 307)

2005-11: 1 timestep, 2005-11-23 00 UTC (step: 528)

Note that the missing data point for 2005-10 differs from the missing data point in the T2m dataset! Further investigation is required.
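The reported steps are consistent with a zero-based hourly index counted from 00 UTC on the first day of the month (for example, 12 days and 19 hours after 2005-10-01 00 UTC is step 307). Under that inferred convention, a rough sketch for locating NaN timesteps and converting them to timestamps could look as follows. The helper names and the synthetic series are hypothetical; real data would be read from the monthly files.

```python
import math
from datetime import datetime, timedelta


def nan_step_ranges(series):
    """Group the indices of NaN entries into inclusive (first, last) ranges."""
    ranges = []
    for step, value in enumerate(series):
        if not math.isnan(value):
            continue
        if ranges and ranges[-1][1] == step - 1:
            # Extend the current run of consecutive NaN steps.
            ranges[-1] = (ranges[-1][0], step)
        else:
            ranges.append((step, step))
    return ranges


def step_to_time(year, month, step):
    """Map a zero-based hourly step to a timestamp (assumed convention)."""
    return datetime(year, month, 1) + timedelta(hours=step)


# Synthetic stand-in for one variable at one grid point in May 1996
# (31 days * 24 h = 744 hourly steps), with the gap reported above.
series = [1.0] * 744
for step in range(73, 121):
    series[step] = float("nan")

for first, last in nan_step_ranges(series):
    count = last - first + 1
    print(
        f"{count} timesteps: {step_to_time(1996, 5, first):%Y-%m-%d %H} UTC "
        f"(step {first}) to {step_to_time(1996, 5, last):%Y-%m-%d %H} UTC (step {last})"
    )
```

Grouping consecutive steps keeps the report short when a gap spans many hours, as in the 1996-05 case.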

mlangguth89 commented 1 month ago

It is furthermore found that the raw ERA5 data files after 1997 are corrupted (located here: /p/scratch/deepacf/maelstrom/maelstrom_data/ap5/downscaling_benchmark_dataset/rawdata/era5/wind/orig). This can readily be seen from the attributes of the originally obtained data files, where the add_offset and scale_factor attributes are identical for the variables blh, msl, u100 and v100 [screenshot]. Since these variables have very different value ranges, their packing parameters should differ. Thus, the raw data must be downloaded again.
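This symptom can be tested programmatically once the packing attributes have been extracted from a file. The sketch below assumes they are available as a plain dictionary per variable; reading them from the NetCDF files is left out, and the numeric values are invented for illustration (they are not taken from the actual files). The function name is hypothetical.

```python
from collections import defaultdict


def variables_sharing_packing(attrs):
    """Return groups of variables with identical (scale_factor, add_offset).

    `attrs` maps a variable name to a dict holding its packing attributes.
    Only groups with more than one variable are returned.
    """
    groups = defaultdict(list)
    for name, var_attrs in attrs.items():
        key = (var_attrs["scale_factor"], var_attrs["add_offset"])
        groups[key].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Invented example values: every variable carries the same packing parameters,
# which mirrors the symptom described above.
suspicious_file = {
    "blh": {"scale_factor": 0.001, "add_offset": 30.0},
    "msl": {"scale_factor": 0.001, "add_offset": 30.0},
    "u100": {"scale_factor": 0.001, "add_offset": 30.0},
    "v100": {"scale_factor": 0.001, "add_offset": 30.0},
}

for group in variables_sharing_packing(suspicious_file):
    print("identical scale_factor/add_offset for:", ", ".join(group))
```

A file whose variables each carry their own packing parameters yields an empty list, so the check could be run over all raw files to decide which ones need to be downloaded again.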

mlangguth89 commented 1 month ago

Complete list of missing data points in COSMO REA6 data:

mlangguth89 commented 1 month ago

The newly created dataset seems to be fine; awaiting results from @epavel1.