pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.59k stars 1.08k forks

xarray can open a nc file with open_dataset, but fails to load this nc file with load #9608

Open onion5376 opened 3 hours ago

onion5376 commented 3 hours ago

What is your issue?

Recently, I downloaded chla data from the Copernicus Marine Service and tried to regrid it with xarray. Unfortunately, the data always fails in the load phase. I have checked that the variables in the test dataset can be plotted normally. I don't know what is going wrong here. Any advice is appreciated. The test code:

import xarray as xr
ds = xr.open_dataset("chla201601.nc")
ds.load()

Test dataset: chla201601.zip

Error information:


```python-traceback
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[3], line 1
----> 1 ds.load()

File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/dataset.py:880, in Dataset.load(self, **kwargs)
    878 for k, v in self.variables.items():
    879     if k not in lazy_data:
--> 880         v.load()
    882 return self

File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/variable.py:981, in Variable.load(self, **kwargs)
    964 def load(self, **kwargs):
    965     """Manually trigger loading of this variable's data from disk or a
    966     remote source into memory and return this variable.
    967
   (...)
    979     dask.array.compute
    980     """
--> 981     self._data = to_duck_array(self._data, **kwargs)
    982     return self

File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/namedarray/pycompat.py:134, in to_duck_array(data, **kwargs)
    131     return loaded_data
    133 if isinstance(data, ExplicitlyIndexed):
--> 134     return data.get_duck_array()  # type: ignore[no-untyped-call, no-any-return]
    135 elif is_duck_array(data):
    136     return data

File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/indexing.py:837, in MemoryCachedArray.get_duck_array(self)
    836 def get_duck_array(self):
--> 837     self._ensure_cached()
    838     return self.array.get_duck_array()

File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/indexing.py:831, in MemoryCachedArray._ensure_cached(self)
    830 def _ensure_cached(self):
--> 831     self.array = as_indexable(self.array.get_duck_array())

File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/indexing.py:788, in CopyOnWriteArray.get_duck_array(self)
    787 def get_duck_array(self):
--> 788     return self.array.get_duck_array()

File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/indexing.py:651, in LazilyIndexedArray.get_duck_array(self)
    647     array = apply_indexer(self.array, self.key)
    648 else:
    649     # If the array is not an ExplicitlyIndexedNDArrayMixin,
    650     # it may wrap a BackendArray so use its __getitem__
--> 651     array = self.array[self.key]
    653 # self.array[self.key] is now a numpy array when
    654 # self.array is a BackendArray subclass
    655 # and self.key is BasicIndexer((slice(None, None, None),))
    656 # so we need the explicit check for ExplicitlyIndexed
    657 if isinstance(array, ExplicitlyIndexed):

File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:100, in NetCDF4ArrayWrapper.__getitem__(self, key)
     99 def __getitem__(self, key):
--> 100     return indexing.explicit_indexing_adapter(
    101         key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
    102     )

File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/indexing.py:1015, in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
    993 """Support explicit indexing by delegating to a raw indexing method.
    994
    995 Outer and/or vectorized indexers are supported by indexing a second time
   (...)
   1012 Indexing result, in the form of a duck numpy-array.
   1013 """
   1014 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
-> 1015 result = raw_indexing_method(raw_key.tuple)
   1016 if numpy_indices.tuple:
   1017     # index the loaded np.ndarray
   1018     indexable = NumpyIndexingAdapter(result)

File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:113, in NetCDF4ArrayWrapper._getitem(self, key)
    111     with self.datastore.lock:
    112         original_array = self.get_array(needs_lock=False)
--> 113         array = getitem(original_array, key)
    114 except IndexError:
    115     # Catch IndexError in netCDF4 and return a more informative
    116     # error message. This is most often called when an unsorted
    117     # indexer is used before the data is loaded from disk.
    118     msg = (
    119         "The indexing operation you are attempting to perform "
    120         "is not valid on netCDF4.Variable object. Try loading "
    121         "your data into memory first by calling .load()."
    122     )

File src/netCDF4/_netCDF4.pyx:4981, in netCDF4._netCDF4.Variable.__getitem__()

File src/netCDF4/_netCDF4.pyx:5953, in netCDF4._netCDF4.Variable._get()

File src/netCDF4/_netCDF4.pyx:2113, in netCDF4._netCDF4._ensure_nc_success()

RuntimeError: NetCDF: HDF error
```
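The traceback shows `Dataset.load()` calling `v.load()` on each variable in turn, so one way to narrow the problem down might be to load the variables one at a time and record which ones raise. The following is only a sketch and has not been run against chla201601.nc; the helper names `find_unreadable_variables` and `check_file` are made up for illustration.

```python
def find_unreadable_variables(ds):
    """Try to load each variable on its own and collect the ones that fail.

    `ds` is expected to be an xarray.Dataset, but any object exposing a
    `.variables` mapping whose values have a `.load()` method will work.
    """
    failures = {}
    for name, var in ds.variables.items():
        try:
            var.load()
        except Exception as err:  # e.g. RuntimeError: NetCDF: HDF error
            failures[name] = f"{type(err).__name__}: {err}"
    return failures


def check_file(path):
    """Open `path` with xarray and report which variables cannot be loaded."""
    import xarray as xr  # imported here so the helper above has no hard dependency

    with xr.open_dataset(path) as ds:
        return find_unreadable_variables(ds)
```

Calling `check_file("chla201601.nc")` would return a dict mapping each failing variable name to its error message; an empty dict would suggest that every variable loads on its own and the failure depends on something else.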

Main package information:

xarray 2024.9.0, numpy 2.0.2, netCDF4 1.7.1, h5netcdf 1.4.0, python 3.12.7
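Since h5netcdf is installed alongside netCDF4, a further check could be to load the same file through each backend via the `engine` argument of `xr.open_dataset` and compare the outcomes. This is a sketch under that assumption, not a confirmed fix; `try_engines` and its `opener` parameter are hypothetical names introduced here.

```python
def try_engines(path, engines=("netcdf4", "h5netcdf"), opener=None):
    """Attempt a full load of `path` with each engine and report the outcome.

    `opener` defaults to xarray.open_dataset; it is a parameter only so the
    function can be exercised without a real file.
    """
    if opener is None:
        import xarray as xr

        opener = xr.open_dataset

    results = {}
    for engine in engines:
        try:
            # The dataset is closed again when the with-block exits.
            with opener(path, engine=engine) as ds:
                ds.load()
            results[engine] = "ok"
        except Exception as err:
            results[engine] = f"{type(err).__name__}: {err}"
    return results
```

If `try_engines("chla201601.nc")` fails for both engines, the file itself may be damaged (for example by an incomplete download); if only one engine fails, that would point more toward the library stack.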

RAM information:

| | total | used | free | shared | buff/cache | available |
|---|---|---|---|---|---|---|
| Mem: | 8.3Gi | 2.1Gi | 5.3Gi | 45Mi | 1.2Gi | 6.2Gi |
| Swap: | 3.9Gi | 0B | 3.9Gi | | | |

welcome[bot] commented 3 hours ago

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!