pik-copan / pyunicorn

Unified Complex Network and Recurrence Analysis Toolbox
http://pik-potsdam.de/~donges/pyunicorn/

Printing `ClimateData` object throws `KeyError` from `h5netcdf` #210

Closed fkuehlein closed 9 months ago

fkuehlein commented 11 months ago

Discovered this when running the tutorial_ClimateNetworks.ipynb notebook; see the full output below. This is probably not a problem with pyunicorn itself, right? Could someone confirm that this happens for them too, to make sure it's not the result of a corrupted conda environment on my side?


When running the cell where the ClimateData is loaded,

#  Print some information on the data set
print(data)

raises the following error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /opt/miniconda3/envs/pyunicorn/lib/python3.10/site-packages/h5netcdf/legacyapi.py:67, in HasAttributesMixin.__getattr__(self, name)
     66 try:
---> 67     return self.attrs[name]
     68 except KeyError:

File /opt/miniconda3/envs/pyunicorn/lib/python3.10/site-packages/h5netcdf/attrs.py:32, in Attributes.__getitem__(self, key)
     31 if self._h5py.__name__ == "h5py":
---> 32     attr = self._h5attrs.get_id(key)
     33 else:

File /opt/miniconda3/envs/pyunicorn/lib/python3.10/site-packages/h5py/_hl/attrs.py:94, in AttributeManager.get_id(self, name)
     92 """Get a low-level AttrID object for the named attribute.
     93 """
---> 94 return h5a.open(self._id, self._e(name))

File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()

File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()

File h5py/h5a.pyx:80, in h5py.h5a.open()

KeyError: "Can't open attribute (can't locate attribute in name index)"

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Cell In[34], line 7
      1 data = climate.ClimateData.Load(
      2     file_name=DATA_FILENAME, observable_name=OBSERVABLE_NAME,
      3     data_source=DATA_SOURCE, file_type=FILE_TYPE,
      4     window=WINDOW, time_cycle=TIME_CYCLE)
      6 #  Print some information on the data set
----> 7 print(data)

File ~/Desktop/23_H2_PIK/pyunicorn/src/pyunicorn/climate/climate_data.py:103, in ClimateData.__str__(self)
     99 def __str__(self):
    100     """
    101     Returns a string representation.
    102     """
--> 103     return 'ClimateData:\n' + Data.__str__(self)

File pyunicorn/src/pyunicorn/core/data.py:113, in Data.__str__(self)
    111 """Return a string representation of the object."""
    112 if self.file_name:
--> 113     self.print_data_info()
    115 return (f"Data: {self.grid.N} grid points, "
    116         f"{self.grid.n_grid_points} measurements.\n"
    117         f"Geographical boundaries:\n{self.grid.print_boundaries()}")

File pyunicorn/src/pyunicorn/core/data.py:390, in Data.print_data_info(self)
    388 # Open netCDF4 file
    389 f = Dataset(self.file_name, "r")
--> 390 print("File format:", f.file_format)
    391 print("Global attributes:")
    392 for name in f.ncattrs():

File /opt/miniconda3/envs/pyunicorn/lib/python3.10/site-packages/h5netcdf/legacyapi.py:69, in HasAttributesMixin.__getattr__(self, name)
     67     return self.attrs[name]
     68 except KeyError:
---> 69     raise AttributeError(
     70         f"NetCDF: attribute {type(self).__name__}:{name} not found"
     71     )

AttributeError: NetCDF: attribute Dataset:file_format not found

ntfrgl commented 11 months ago

This specific error goes back to cd8ee00 in the context of #160, which made an untested assumption that netCDF4.Dataset and h5netcdf.legacyapi.Dataset are sufficiently compatible. In fact, the former provides more legacy functionality, such as the Dataset.file_format attribute, which is used here only to print basic metadata.
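
To illustrate the mechanism visible in the traceback, here is a minimal sketch that uses only the standard library. The class `HasAttributesSketch` is a hypothetical stand-in modelled on the `__getattr__` lines quoted from `h5netcdf/legacyapi.py` above; it is not the library's actual class, and the `title` attribute is invented for the demo.

```python
class HasAttributesSketch:
    """Hypothetical stand-in for the attribute lookup shown in the traceback."""

    def __init__(self, attrs=None):
        # Stored as a regular instance attribute, so __getattr__ is not
        # triggered when reading self.attrs.
        self.attrs = dict(attrs or {})

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self.attrs[name]
        except KeyError:
            # Raising inside the except block chains the KeyError, which
            # produces "During handling of the above exception, ...".
            raise AttributeError(
                f"NetCDF: attribute {type(self).__name__}:{name} not found"
            )


class Dataset(HasAttributesSketch):
    """Illustrative subclass so the message reads 'Dataset:...'."""


f = Dataset({"title": "demo"})  # 'title' is an invented global attribute

try:
    f.file_format  # not stored as an attribute, so the lookup fails
except AttributeError as err:
    message = str(err)
    context = err.__context__  # the original KeyError

print(message)
```

Under these assumptions, any name that is not stored as a file attribute, including `file_format`, surfaces as an `AttributeError` with a chained `KeyError`, which matches the two-part traceback in the report.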

The broader question is about which API version should be supported going forward. Let's move that discussion, which has been awaiting input from @jdonges, from #12 to here. The easiest solution would be to switch the strict dependency to netCDF4, which had been an optional dependency before the commit above. Alternatively, we could stick with h5netcdf and remove or replace the legacy API usage. The primary advantage of h5netcdf is a smaller installation footprint, because it relies on fewer C libraries.
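
As a rough sketch of the second option (replacing the legacy API usage), the metadata printout could fall back to a placeholder when `file_format` is unavailable. The helper `describe_file_format` and the `SimpleNamespace` stand-ins below are hypothetical and exist only for illustration; they are not part of pyunicorn, netCDF4, or h5netcdf.

```python
from types import SimpleNamespace


def describe_file_format(dataset, default="unknown"):
    """Return dataset.file_format if present, else a placeholder.

    getattr() with a default swallows the AttributeError that a missing
    attribute would otherwise raise.
    """
    return getattr(dataset, "file_format", default)


# Stand-ins for the two backends (assumptions for this demo only):
# one object exposes file_format, the other does not.
netcdf4_like = SimpleNamespace(file_format="NETCDF4")
h5netcdf_like = SimpleNamespace()

print("File format:", describe_file_format(netcdf4_like))
print("File format:", describe_file_format(h5netcdf_like))
```

Since the attribute is only used to print basic metadata, a fallback like this would keep `print(data)` working with either backend, at the cost of less informative output under h5netcdf.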

fkuehlein commented 11 months ago

Alright, I will then make sure to avoid running into this in the tutorial for now.