roocs / clisops

Climate Simulation Operations
https://clisops.readthedocs.io/en/latest/
Other
21 stars 9 forks source link

Adding tests to warrant ATLAS dataset support #319

Closed sol1105 closed 6 months ago

sol1105 commented 7 months ago

Pull Request Checklist:

What kind of change does this PR introduce?:

This PR adds tests to warrant support of ATLAS datasets. See also https://github.com/roocs/roocs-utils/pull/113 The environment changes are preliminary and will be reverted after the next roocs_utils release.

Does this PR introduce a breaking change?:

No.

Other information:

Zeitsperre commented 7 months ago

@sol1105 The new black release is causing some problems. I'll open a PR!

sol1105 commented 7 months ago

@Zeitsperre I think I got it, thanks :)

sol1105 commented 7 months ago

@cehbrecht @Zeitsperre It seems I cannot set the libnetcdf / netcdf-c version for the pypi tests? For some reason it installs 4.9.3-development which has a different behaviour with regards to the erroneous deflate options in the ATLAS datasets (#317). This version is not available for the conda tests (here 4.9.2 is getting installed). Do you have an idea how to go ahead?

Zeitsperre commented 7 months ago

@sol1105

libnetcdf is not a Python package; It's a system-level library, so you can only install it via apt, i.e. $ apt install libnetcdf-dev. Anaconda offers a handful of non-Python packages (C, C++, Rust, etc.)in its repositories, libnetcdf being one of them.

Zeitsperre commented 7 months ago

@sol1105

Is this a case of the system not having libnetcdf or zlib installed? In that case, you could add it to the job that does the testing by adding a new step:

- name: Install libnetcdf and zlib
  run: |
    apt-get update
    apt-get upgrade
    apt-get install libnetcdf-dev zlib1g-dev
sol1105 commented 7 months ago

@Zeitsperre Thanks for your reply. I do not know what the actual reason for the problem is. I added a few test datasets of the IPCC ATLAS datasets incl. tests. The problem is that the ATLAS CMIP5/6 and ATLAS CORDEX datasets include string variables with illegal filters. The netcdf-c library had some breaking changes from 4.9.0 onwards and likely again in 4.9.3 in how to deal with such illegal filters. My intention was to simply remove these filters for the affected variables.

In the conda tests, everything works as expected. In the pypi+tox tests however, the affected string variables suddenly have a different data type (eg. 'dtype': dtype('<U35') instead of 'dtype': <class 'str'>). And the encoding dicitionaries of these variables are empty, apart from the dtype specification. This is why the tests fail. I could not reproduce this locally even with libnetcdf 4.9.3-development and latest xarray installed, as in the pypi+tox tests. It looks to me as if in some library this filter problem was addressed in that certain way, but I cannot identify which library does it.

I guess I will just set up a fix for the cases in that the filters are part of the encoding dictionary and not anymore try to make sense of what happens in the background.

coveralls commented 7 months ago

Pull Request Test Coverage Report for Build 7832704313


Changes Missing Coverage Covered Lines Changed/Added Lines %
clisops/ops/base_operation.py 26 27 96.3%
<!-- Total: 26 27 96.3% -->
Totals Coverage Status
Change from base Build 7613866827: 0.3%
Covered Lines: 1759
Relevant Lines: 2422

💛 - Coveralls
sol1105 commented 7 months ago

@Zeitsperre I could now reproduce the issue locally after installing the latest xarray via pip and not from source. I set up two fixes that we require for the ATLAS datasets. Since the ATLAS datasets are provided with compression level 9, one of the fixes caps the compression level at 1 to reduce the write time of the requests (by factor 5 at least in some tests, eg. from 10min to 2min, while causing no significant changes in the output file size). I do not know if that cap can be considered a breaking change, I would say no. But in case you are not happy that these fixes become part of the general clisops, the fixes can also be part of rook.

If all is fine this can be merged once the connected roocs_utils PR is part of a release.

cehbrecht commented 6 months ago

@sol1105 do you like to update the requirements for roocs-utils >= 0.6.7 ?