Error caused by differences in ERA5 Data Between CDS and CDS-Beta

linjonathan / tropical_cyclone_risk

A Physics-Based, Tropical Cyclone Downscaling Model

MIT License

Error caused by differences in ERA5 Data Between CDS and CDS-Beta #7

Open yurgaohku opened 2 months ago

yurgaohku commented 2 months ago

Hi Lin,

The ERA5 CDS dataset is scheduled to be replaced by the CDS-Beta dataset starting from 26 September 2024. More information can be found on the ECMWF forum (https://forum.ecmwf.int/t/differences-in-era5-pressure-level-data-between-cds-and-cds-beta/5014). When I try to run run.py with the new CDS-Beta data, I get the following error:

Saving model output to /home/yurgao/tc/tropical_cyclone_risk-main/data/era5/test/
Generating land masks...
Computing monthly mean and variance of environmental wind...
Traceback (most recent call last):
  File "/home/yurgao/.conda/envs/tc_risk/lib/python3.12/site-packages/xarray/core/dataset.py", line 1393, in _construct_dataarray
    variable = self._variables[name]
KeyError: 'time'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/yurgao/tc/tropical_cyclone_risk-main/run.py", line 15, in <module>
    compute.compute_downscaling_inputs()
  File "/home/yurgao/tc/tropical_cyclone_risk-main/util/compute.py", line 27, in compute_downscaling_inputs
    env_wind.gen_wind_mean_cov()
  File "/home/yurgao/tc/tropical_cyclone_risk-main/track/env_wind.py", line 97, in gen_wind_mean_cov
    out = dask.compute(*lazy_results)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yurgao/.conda/envs/tc_risk/lib/python3.12/site-packages/dask/base.py", line 664, in compute
    results = schedule(dsk, keys, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yurgao/tc/tropical_cyclone_risk-main/track/env_wind.py", line 130, in wnd_stat_wrapper
    dts = input.convert_to_datetime(ds_ua, ds_ua['time'].values)
                                           ~~~~~^^^^^^^^
  File "/home/yurgao/.conda/envs/tc_risk/lib/python3.12/site-packages/xarray/core/dataset.py", line 1484, in __getitem__
    return self._construct_dataarray(key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yurgao/.conda/envs/tc_risk/lib/python3.12/site-packages/xarray/core/dataset.py", line 1395, in _construct_dataarray
    _, name, variable = _get_virtual_variable(self._variables, name, self.dims)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/yurgao/.conda/envs/tc_risk/lib/python3.12/site-packages/xarray/core/dataset.py", line 196, in _get_virtual_variable
    raise KeyError(key)
KeyError: 'time'

The issue appears to come from differences in the data output between the old CDS and CDS-Beta, as described in this report: (https://forum.ecmwf.int/t/differences-in-era5-pressure-level-data-between-cds-and-cds-beta/5014). Would it be possible to adjust the existing model code to work with the new dataset?

Best, Gao

linjonathan commented 2 months ago

Hi Gao,

Yes, this is an issue since the conversion from .grib to .netcdf seems to have changed on the ERA5 server. I will be adding support for .grib files in the near future.

Jonathan

XinghanLiu2002 commented 1 week ago

Hi @linjonathan,

Thanks for your open-source model. I am experiencing the same problem and would like to know if you have resolved this issue.

Best wishes, Xinghan

linjonathan commented 6 days ago

@XinghanLiu2002 @yurgaohku @alexandrefierro

Please see this pull request for code changes. If there are no issues, I will merge it. Note, you will need to re-download the ERA5 data in the GRIB format, using the updated download_era5.py script.

XinghanLiu2002 commented 5 days ago

Thank you, Jonathan! When I downloaded the ERA5 data with the updated download_era5.py, the target files were downloaded to the data/era5 directory, but the script still printed the following strange output:

[Forum announcement](https://forum.ecmwf.int/t/final-validated-era5-product-to-differ-from-era5t-in-july-2024/6685)
for details and watch it for further updates on this.
2024-11-28 10:07:16,018 INFO Request ID is f1e2f3e0-604d-49d5-a7a1-e8d670e5641e
2024-11-28 10:07:16,305 INFO status has been updated to accepted
2024-11-28 10:07:25,390 INFO status has been updated to successful
Error downloading the data...

I then ran python3 run.py WP (I only need TCs in the WP basin) and got the following error:

Saving model output to /work/home/xinghan/tc/tc_risk/data/era5/test/
Generating land masks...
Preprocessing .grib files...
Traceback (most recent call last):
  File "/work/home/xinghan/tc/tc_risk/run.py", line 15, in <module>
    compute.compute_downscaling_inputs()
  File "/work/home/xinghan/tc/tc_risk/util/compute.py", line 27, in compute_downscaling_inputs
    input.preprocess_grib()
  File "/work/home/xinghan/tc/tc_risk/util/input.py", line 23, in preprocess_grib
    ds = xr.open_dataset(fn)
         ^^^^^^^^^^^^^^^^^^^
  File "/work/home/xinghan/miniconda3/envs/tc_risk/lib/python3.11/site-packages/xarray/backends/api.py", line 547, in open_dataset
    engine = plugins.guess_engine(filename_or_obj)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work/home/xinghan/miniconda3/envs/tc_risk/lib/python3.11/site-packages/xarray/backends/plugins.py", line 197, in guess_engine
    raise ValueError(error_msg)
ValueError: did not find a match in any of xarray's currently installed IO backends ['netcdf4', 'scipy']. Consider explicitly selecting one of the installed engines via the ``engine`` parameter, or installing additional IO dependencies, see:
https://docs.xarray.dev/en/stable/getting-started-guide/installing.html
https://docs.xarray.dev/en/stable/user-guide/io.html

It seems that something is wrong with the Python packages, so I checked the versions of the two backends listed in the error:

netcdf4                   1.6.2           py311h0e679e6_0    defaults
scipy                     1.14.1          py311h08b1b3b_0    defaults

Do we need to modify environment.yml to match the updated scripts you mentioned?

linjonathan commented 4 days ago

Doesn't your output say "Error downloading the data..."?

XinghanLiu2002 commented 4 days ago

Yes, but the data did get downloaded.
(base) [xinghan@login03 era5]$ find -type f -name "*.grib"
./2021/era5_t_monthly_2021.grib
./2021/era5_q_monthly_2021.grib
./2021/era5_v_daily_2021.grib
./2021/era5_sst_monthly_2021.grib
./2021/era5_sp_monthly_2021.grib
./2021/era5_u_daily_2021.grib
./2016/era5_sst_monthly_2016.grib
./2016/era5_u_daily_2016.grib
./2016/era5_sp_monthly_2016.grib
./2016/era5_t_monthly_2016.grib
./2016/era5_q_monthly_2016.grib
./2016/era5_v_daily_2016.grib
./2019/era5_sst_monthly_2019.grib
./2019/era5_sp_monthly_2019.grib
./2019/era5_t_monthly_2019.grib
./2019/era5_q_monthly_2019.grib
./2019/era5_u_daily_2019.grib
./2019/era5_v_daily_2019.grib
./2018/era5_sst_monthly_2018.grib
./2018/era5_sp_monthly_2018.grib
./2018/era5_u_daily_2018.grib
./2018/era5_t_monthly_2018.grib
./2018/era5_q_monthly_2018.grib
./2018/era5_v_daily_2018.grib
./2020/era5_t_monthly_2020.grib
./2020/era5_q_monthly_2020.grib
./2020/era5_u_daily_2020.grib
./2020/era5_sst_monthly_2020.grib
./2020/era5_sp_monthly_2020.grib
./2020/era5_v_daily_2020.grib
./2017/era5_t_monthly_2017.grib
./2017/era5_q_monthly_2017.grib
./2017/era5_u_daily_2017.grib
./2017/era5_sst_monthly_2017.grib
./2017/era5_sp_monthly_2017.grib
./2017/era5_v_daily_2017.grib

The ERA5 website shows that the download completed, so the message "Error downloading the data..." is puzzling. I also tried downloading all the data locally and uploading it to the cluster before running python3 run.py WP. Either way, I still end up with the error above.
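One way to rule out an incomplete download is to check the files on disk directly. The sketch below assumes the layout visible in the listing above (one directory per year, monthly files for t, q, sst and sp, daily files for u and v). The helper name find_missing_grib and the choice to count zero-byte files as missing are illustrative assumptions, not part of the repository.

```python
from pathlib import Path

# Variable/frequency pairs taken from the file names in the listing above.
EXPECTED = [
    ("t", "monthly"), ("q", "monthly"), ("sst", "monthly"),
    ("sp", "monthly"), ("u", "daily"), ("v", "daily"),
]


def find_missing_grib(base_dir, years):
    """Return expected GRIB paths that are absent or empty (hypothetical helper)."""
    missing = []
    for year in years:
        for var, freq in EXPECTED:
            path = Path(base_dir) / str(year) / f"era5_{var}_{freq}_{year}.grib"
            # Assumption: a zero-byte file may indicate an interrupted transfer,
            # so it is reported together with files that do not exist.
            if not path.is_file() or path.stat().st_size == 0:
                missing.append(path)
    return missing
```

For example, find_missing_grib("data/era5", range(2016, 2022)) returns an empty list when all 36 files from the listing are present and non-empty. It says nothing about whether the contents are valid GRIB messages.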

alexandrefierro commented 4 days ago

Hi Jonathan:

I was lucky enough to retrieve all the data I needed before this nomenclature change from ECMWF, so there are no issues on my end. A naive idea: would it be simpler to just rename the variable valid_time to time using NetCDF tools such as CDO or NCO?

NCO: ncrename -v valid_time,time input.nc output.nc
CDO: cdo chname,valid_time,time input.nc output.nc
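The same rename could also be done at load time in Python. This is a minimal sketch, assuming the only difference is that the time coordinate is called valid_time instead of time; the helper name normalize_time_name is hypothetical, and whether the converted files differ in other ways is not established here.

```python
def normalize_time_name(ds, old="valid_time", new="time"):
    """Rename `old` to `new` if the dataset only has `old` (hypothetical helper).

    Works with any object that supports `in` and `.rename(dict)`,
    which includes xarray.Dataset.
    """
    if new in ds or old not in ds:
        # Either already in the expected layout, or there is nothing to rename.
        return ds
    return ds.rename({old: new})


# Intended use with xarray (not executed here):
#   import xarray as xr
#   ds = normalize_time_name(xr.open_dataset("input.nc"))
```

Because the function returns the dataset unchanged when time is already present, it can be applied to files from both the old and the new CDS without a separate check.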

Cheers,

linjonathan commented 3 days ago

Ah, yes, you probably need to install cfgrib. I can make those changes to the environment.yml file. You can install it quickly with:

conda install -n ENV_NAME eccodes cfgrib

You can check commits d5cfcf8 and ee17415 for the changes to the conda environment.
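To confirm that the active environment can see the GRIB dependencies before running run.py, a short check like the one below may help. It is a sketch using only the standard library; the helper name missing_modules is hypothetical, and it assumes the importable module names match the package names cfgrib and eccodes. It only tests whether the modules can be found, not whether the underlying ecCodes library loads correctly.

```python
import importlib.util


def missing_modules(names=("cfgrib", "eccodes")):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing GRIB dependencies:", ", ".join(missing))
    else:
        print("cfgrib and eccodes are importable.")
```

If nothing is reported missing and xarray still cannot guess the backend, passing engine="cfgrib" to xr.open_dataset selects it explicitly.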

linjonathan commented 3 days ago

This isn't actually an issue with the code; the GRIB-to-NetCDF conversion script on the CDS server now changes the time variable. I think GRIB is the better format anyway, which is why I am adding GRIB support rather than writing a workaround for this issue. Most climate models still use NetCDF output, however.

XinghanLiu2002 commented 2 days ago

(base) D:\tc\tropical_cyclone_risk-grib_update>conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - eccodes
  - cfgrib

I think environment.yml should include the following entries for eccodes and cfgrib:

channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.8
  - pip
  - pip:
    - eccodes
    - cfgrib

alexandrefierro commented 1 day ago

Hi Jonathan:

Do you know if this issue would also affect the CHAZ model from Chia-Ying Lee at Columbia University? I emailed them a few days ago but have not yet received a concrete response.

Could you also elaborate, with a quick example, on how exactly the time array changed with the CDS GRIB-to-NetCDF conversion? That would show whether it could be fixed with a simple NCO or CDO command. Thank you!