Closed mlangguth89 closed 3 months ago
The files contain NaN values for some months in which the COSMO REA6 data is known to have gaps. This affects (at least) the files for the following months:
- 1996-05: 48 timesteps (from 1996-05-04 01 UTC (step 73) to 1996-05-06 00 UTC (step 120))
- 2005-10: 1 timestep (2005-10-13 19 UTC (step 307))
- 2005-11: 1 timestep (2005-11-23 00 UTC (step 528))

Note that the missing data point for 2005-10 differs from the missing data point in the T2m dataset! Further investigation is required.
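The step indices and timestamps listed above can be cross-checked with a small helper. This is only a sketch: the function name `step_to_utc` is hypothetical, and the indexing convention (hourly data, with step n lying n hours after 00 UTC on the first day of the month) is an assumption inferred from the reported step/time pairs, not read from the files.

```python
from datetime import datetime, timedelta, timezone


def step_to_utc(year: int, month: int, step: int) -> datetime:
    """Map a 1-based hourly step index in a monthly file to its UTC time.

    Assumption (inferred from the reported step/time pairs): step n lies
    n hours after 00 UTC on the first day of the month.
    """
    if step < 1:
        raise ValueError("step indices are assumed to start at 1")
    month_start = datetime(year, month, 1, tzinfo=timezone.utc)
    return month_start + timedelta(hours=step)


# Reported NaN ranges as (year, month, first step, last step).
reported = [(1996, 5, 73, 120), (2005, 10, 307, 307), (2005, 11, 528, 528)]

for year, month, first, last in reported:
    start = step_to_utc(year, month, first)
    end = step_to_utc(year, month, last)
    n_steps = last - first + 1
    print(f"{year}-{month:02d}: {n_steps} step(s), "
          f"{start:%Y-%m-%d %H} UTC to {end:%Y-%m-%d %H} UTC")
```

Under this assumption, all three reported ranges reproduce the timestamps given above, which suggests the step numbers and times are at least internally consistent.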
It is furthermore found that the raw ERA5 data files after 1997 are corrupted (located here: `/p/scratch/deepacf/maelstrom/maelstrom_data/ap5/downscaling_benchmark_dataset/rawdata/era5/wind/orig`). This can readily be seen from the attributes of the originally obtained data files, where the `add_offset` and `scale_factor` attributes are the same for the variables `blh`, `msl`, `u100` and `v100`.
Thus, the raw data must be downloaded again.
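The packing check described above can be sketched as follows. Packing parameters are typically derived from each variable's own value range, so identical pairs across variables with very different ranges are a warning sign. The helper `find_shared_packing` is hypothetical, and the attribute values in the example are made up for illustration; in practice the attributes would be read from the file headers with whatever netCDF tooling is at hand.

```python
from collections import defaultdict


def find_shared_packing(attrs_by_var):
    """Group variables whose (scale_factor, add_offset) pairs are identical.

    attrs_by_var maps a variable name to a dict of its file attributes.
    Variables lacking either attribute are skipped.
    """
    groups = defaultdict(list)
    for name, attrs in attrs_by_var.items():
        key = (attrs.get("scale_factor"), attrs.get("add_offset"))
        if None in key:
            continue
        groups[key].append(name)
    # Only groups with more than one member are suspicious.
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Made-up attribute values, for illustration only.
example_attrs = {
    "blh": {"scale_factor": 0.5, "add_offset": 100.0},
    "msl": {"scale_factor": 0.5, "add_offset": 100.0},
    "u100": {"scale_factor": 0.5, "add_offset": 100.0},
    "v100": {"scale_factor": 0.5, "add_offset": 100.0},
    "other_var": {"scale_factor": 0.001, "add_offset": 3.0},  # hypothetical
}

print(find_shared_packing(example_attrs))
```

An empty result does not prove a file is intact; the check only catches the specific symptom reported here.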
Complete list of missing data points in COSMO REA6 data:
The newly created dataset seems to be fine; awaiting results from @epavel1.
The normalization parameters deduced in #23, currently available under the following path:
`/p/scratch/deepacf/maelstrom/maelstrom_data/ap5/downscaling_benchmark_dataset/benchmark_wind/dataset/norm.json`
reveal an issue with the preprocessed wind downscaling dataset. In particular, the mean of the `ws100m_tar` data is negative (-19.50 m/s), whereas `blh_in` (6710.69 m) and `ws100m_in` (8931.78 m/s) exhibit unrealistically large values. Thus, it is assumed that at least parts of the data files are corrupted. The dataset must therefore be double-checked and corrected.
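A plausibility check of this kind could be automated along the following lines. This is a minimal sketch: the helper `implausible_means` is hypothetical, the means are the values quoted above, and the upper bounds are illustrative assumptions rather than agreed thresholds. Only the lower bound of 0 for wind speed follows directly from its definition. The layout of `norm.json` is not assumed here, so the values are passed in as a plain dictionary.

```python
def implausible_means(means, bounds):
    """Return the variables whose mean lies outside its plausible range.

    means maps a variable name to its mean; bounds maps a variable name
    to an inclusive (low, high) range. Variables without bounds are skipped.
    """
    flagged = {}
    for name, mean in means.items():
        if name not in bounds:
            continue
        low, high = bounds[name]
        # A NaN mean fails this comparison and is therefore flagged as well.
        if not (low <= mean <= high):
            flagged[name] = mean
    return flagged


# Means as reported in this issue.
means = {"ws100m_tar": -19.50, "blh_in": 6710.69, "ws100m_in": 8931.78}

# Illustrative bounds (assumptions): wind speed in m/s, boundary layer height in m.
bounds = {
    "ws100m_tar": (0.0, 50.0),
    "blh_in": (0.0, 5000.0),
    "ws100m_in": (0.0, 50.0),
}

for name, mean in implausible_means(means, bounds).items():
    print(f"{name}: mean {mean} outside {bounds[name]}")
```

With these bounds all three variables are flagged, matching the observations above.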