Closed onion5376 closed 4 weeks ago
Hi @onion5376 !
Sadly, I was not able to reproduce the bug. Your code runs ok on my machine. I have:
xesmf 0.8.7
xarray 2023.8.0
numpy 1.24.4
netCDF4 1.6.4
h5netcdf 1.2.0
This is a wild guess, but if you can't update your environment, you could try opening the netCDF file with another backend to see if the problem persists:
ds = xr.open_dataset("chla201601.nc", engine='h5netcdf')
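To compare backends side by side, a small helper along these lines could open and fully load the file once per engine and record what happens. This is only a sketch: `try_engines` and its `opener` parameter are hypothetical names introduced here, and the engine names assume that both the netCDF4 and h5netcdf packages are installed.

```python
def try_engines(path, engines=("netcdf4", "h5netcdf"), opener=None):
    """Open and fully load `path` once per engine; return {engine: outcome}."""
    if opener is None:
        # Imported lazily so the helper itself can be exercised without xarray.
        import xarray as xr
        opener = xr.open_dataset
    results = {}
    for engine in engines:
        try:
            # Opening is lazy, so the read error only appears on load().
            with opener(path, engine=engine) as ds:
                ds.load()
            results[engine] = "ok"
        except Exception as err:  # keep going so every engine gets reported
            results[engine] = f"{type(err).__name__}: {err}"
    return results


# Example call (file name taken from the thread):
# print(try_engines("chla201601.nc"))
```

If every engine fails at `load()`, the backend is probably not the cause and the file or the storage it sits on is the next thing to check.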
Hi @aulemahal. (1) The error above occurs when executing the last line of code (dr_out = Regrd(ds)). Following your suggestion, I opened the file with ds = xr.open_dataset("chla201601.nc", engine='h5netcdf'), and it raises a different error:
> ---------------------------------------------------------------------------
> OSError Traceback (most recent call last)
> Cell In[7], line 1
> ----> 1 dr_out = Regrd(ds)
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xesmf/frontend.py:548, in BaseRegridder.__call__(self, indata, keep_attrs, skipna, na_thres, output_chunks)
> 540 return self.regrid_dataarray(
> 541 indata,
> 542 keep_attrs=keep_attrs,
> (...)
> 545 output_chunks=output_chunks,
> 546 )
> 547 elif isinstance(indata, xr.Dataset):
> --> 548 return self.regrid_dataset(
> 549 indata,
> 550 keep_attrs=keep_attrs,
> 551 skipna=skipna,
> 552 na_thres=na_thres,
> 553 output_chunks=output_chunks,
> 554 )
> 555 else:
> 556 raise TypeError('input must be numpy array, dask array, xarray DataArray or Dataset!')
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xesmf/frontend.py:687, in BaseRegridder.regrid_dataset(self, ds_in, keep_attrs, skipna, na_thres, output_chunks)
> 680 non_regriddable = [
> 681 name
> 682 for name, data in ds_in.data_vars.items()
> 683 if not set(input_horiz_dims).issubset(data.dims)
> 684 ]
> 685 ds_in = ds_in.drop_vars(non_regriddable)
> --> 687 ds_out = xr.apply_ufunc(
> 688 self.regrid_array,
> 689 ds_in,
> 690 self.weights,
> 691 kwargs=kwargs,
> 692 input_core_dims=[input_horiz_dims, ('out_dim', 'in_dim')],
> 693 output_core_dims=[temp_horiz_dims],
> 694 dask='allowed',
> 695 keep_attrs=keep_attrs,
> 696 )
> 698 return self._format_xroutput(ds_out, temp_horiz_dims)
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/computation.py:1265, in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, on_missing_core_dim, *args)
> 1263 # feed datasets apply_variable_ufunc through apply_dataset_vfunc
> 1264 elif any(is_dict_like(a) for a in args):
> -> 1265 return apply_dataset_vfunc(
> 1266 variables_vfunc,
> 1267 *args,
> 1268 signature=signature,
> 1269 join=join,
> 1270 exclude_dims=exclude_dims,
> 1271 dataset_join=dataset_join,
> 1272 fill_value=dataset_fill_value,
> 1273 keep_attrs=keep_attrs,
> 1274 on_missing_core_dim=on_missing_core_dim,
> 1275 )
> 1276 # feed DataArray apply_variable_ufunc through apply_dataarray_vfunc
> 1277 elif any(isinstance(a, DataArray) for a in args):
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/computation.py:536, in apply_dataset_vfunc(func, signature, join, dataset_join, fill_value, exclude_dims, keep_attrs, on_missing_core_dim, *args)
> 531 list_of_coords, list_of_indexes = build_output_coords_and_indexes(
> 532 args, signature, exclude_dims, combine_attrs=keep_attrs
> 533 )
> 534 args = tuple(getattr(arg, "data_vars", arg) for arg in args)
> --> 536 result_vars = apply_dict_of_variables_vfunc(
> 537 func,
> 538 *args,
> 539 signature=signature,
> 540 join=dataset_join,
> 541 fill_value=fill_value,
> 542 on_missing_core_dim=on_missing_core_dim,
> 543 )
> 545 out: Dataset | tuple[Dataset, ...]
> 546 if signature.num_outputs > 1:
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/computation.py:460, in apply_dict_of_variables_vfunc(func, signature, join, fill_value, on_missing_core_dim, *args)
> 458 core_dim_present = _check_core_dims(signature, variable_args, name)
> 459 if core_dim_present is True:
> --> 460 result_vars[name] = func(*variable_args)
> 461 else:
> 462 if on_missing_core_dim == "raise":
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/computation.py:742, in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, vectorize, keep_attrs, dask_gufunc_kwargs, *args)
> 735 broadcast_dims = tuple(
> 736 dim for dim in dim_sizes if dim not in signature.all_core_dims
> 737 )
> 738 output_dims = [broadcast_dims + out for out in signature.output_core_dims]
> 740 input_data = [
> 741 (
> --> 742 broadcast_compat_data(arg, broadcast_dims, core_dims)
> 743 if isinstance(arg, Variable)
> 744 else arg
> 745 )
> 746 for arg, core_dims in zip(args, signature.input_core_dims, strict=True)
> 747 ]
> 749 if any(is_chunked_array(array) for array in input_data):
> 750 if dask == "forbidden":
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/computation.py:663, in broadcast_compat_data(variable, broadcast_dims, core_dims)
> 658 def broadcast_compat_data(
> 659 variable: Variable,
> 660 broadcast_dims: tuple[Hashable, ...],
> 661 core_dims: tuple[Hashable, ...],
> 662 ) -> Any:
> --> 663 data = variable.data
> 665 old_dims = variable.dims
> 666 new_dims = broadcast_dims + core_dims
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/variable.py:451, in Variable.data(self)
> 449 return self._data
> 450 elif isinstance(self._data, indexing.ExplicitlyIndexed):
> --> 451 return self._data.get_duck_array()
> 452 else:
> 453 return self.values
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/indexing.py:837, in MemoryCachedArray.get_duck_array(self)
> 836 def get_duck_array(self):
> --> 837 self._ensure_cached()
> 838 return self.array.get_duck_array()
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/indexing.py:831, in MemoryCachedArray._ensure_cached(self)
> 830 def _ensure_cached(self):
> --> 831 self.array = as_indexable(self.array.get_duck_array())
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/indexing.py:788, in CopyOnWriteArray.get_duck_array(self)
> 787 def get_duck_array(self):
> --> 788 return self.array.get_duck_array()
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/indexing.py:651, in LazilyIndexedArray.get_duck_array(self)
> 647 array = apply_indexer(self.array, self.key)
> 648 else:
> 649 # If the array is not an ExplicitlyIndexedNDArrayMixin,
> 650 # it may wrap a BackendArray so use its __getitem__
> --> 651 array = self.array[self.key]
> 653 # self.array[self.key] is now a numpy array when
> 654 # self.array is a BackendArray subclass
> 655 # and self.key is BasicIndexer((slice(None, None, None),))
> 656 # so we need the explicit check for ExplicitlyIndexed
> 657 if isinstance(array, ExplicitlyIndexed):
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/backends/h5netcdf_.py:51, in H5NetCDFArrayWrapper.__getitem__(self, key)
> 50 def __getitem__(self, key):
> ---> 51 return indexing.explicit_indexing_adapter(
> 52 key, self.shape, indexing.IndexingSupport.OUTER_1VECTOR, self._getitem
> 53 )
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/indexing.py:1015, in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
> 993 """Support explicit indexing by delegating to a raw indexing method.
> 994
> 995 Outer and/or vectorized indexers are supported by indexing a second time
> (...)
> 1012 Indexing result, in the form of a duck numpy-array.
> 1013 """
> 1014 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
> -> 1015 result = raw_indexing_method(raw_key.tuple)
> 1016 if numpy_indices.tuple:
> 1017 # index the loaded np.ndarray
> 1018 indexable = NumpyIndexingAdapter(result)
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/backends/h5netcdf_.py:58, in H5NetCDFArrayWrapper._getitem(self, key)
> 56 with self.datastore.lock:
> 57 array = self.get_array(needs_lock=False)
> ---> 58 return array[key]
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/h5netcdf/core.py:555, in BaseVariable.__getitem__(self, key)
> 553 return h5ds[key].view(view)
> 554 else:
> --> 555 return h5ds[key]
>
> File h5py/_objects.pyx:54, in h5py._objects.with_phil.wrapper()
>
> File h5py/_objects.pyx:55, in h5py._objects.with_phil.wrapper()
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/h5py/_hl/dataset.py:758, in Dataset.__getitem__(self, args, new_dtype)
> 756 if self._fast_read_ok and (new_dtype is None):
> 757 try:
> --> 758 return self._fast_reader.read(args)
> 759 except TypeError:
> 760 pass # Fall back to Python read pathway below
>
> File h5py/_selector.pyx:376, in h5py._selector.Reader.read()
>
> OSError: [Errno 14] Can't synchronously read data (file read failed: time = Wed Oct 9 23:26:41 2024
> , filename = '/media/sf_F_DRIVE/try1/201601.nc', file descriptor = 59, errno = 14, error message = 'Bad address', buf = 0x55cb69267410, total read size = 24641280, bytes this sub-read = 24641280, bytes actually read = 18446744073709551615, offset = 0)
(2) I don't get this error when I replace the data with a smaller file (just 2 days, small_data.zip) and run my script. The data used above covers one month. So it seems to me that this error is not related to the engine parameter of the open_dataset function, but to the size of the data.
Main package information:
xesmf 0.8.7
xarray 2024.9.0
numpy 2.0.2
netCDF4 1.7.1
h5netcdf 1.4.0
python 3.12.7
I've been stuck on this problem for days. Could you give me some advice?
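One detail in the OSError above: `bytes actually read = 18446744073709551615` equals 2^64 - 1, which is how a return value of -1 looks when printed as an unsigned 64-bit integer, so the operating-system read itself failed (errno 14, 'Bad address'). A way to check whether the file can be read at all, independently of HDF5, netCDF4 and xarray, is to read it with plain Python. The sketch below is an illustration only; `read_whole_file` and `file_reads_cleanly` are hypothetical helper names, and the default chunk size is simply chosen to be close to the 24641280-byte read in the error message.

```python
import os


def read_whole_file(path, chunk_size=24 * 1024 * 1024):
    """Read `path` sequentially with plain OS reads; return the bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    return total


def file_reads_cleanly(path):
    """True if the whole file can be read and its length matches the size on disk."""
    return read_whole_file(path) == os.path.getsize(path)


# Example call (path taken from the error message above):
# print(file_reads_cleanly("/media/sf_F_DRIVE/try1/201601.nc"))
```

If this raises an OSError as well, the problem lies with the storage or mount the file sits on rather than with the Python packages. Copying the file to a local disk and repeating the test would then be a reasonable next step.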
I'm really sorry, I just created an environment with the versions you gave in the previous comment, and it passes. I still can't reproduce the issue with the code and file given in the top comment.
The errors you have look like the file is corrupted. If you simply do ds.load() before any xesmf calls, does the error also happen?
As for the size, I wouldn't think a memory problem would give this error. And running only your snippet on the test data on my machine used a maximum of 600 MB of RAM. I would be surprised if this were too much for your machine.
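If `ds.load()` does fail, loading each variable on its own can show whether the whole file is unreadable or only part of it. The following is a minimal sketch under that idea; `find_unreadable_variables` is a hypothetical helper, and it only catches the two exception types that appear in this thread.

```python
def find_unreadable_variables(ds):
    """Try to load every data variable separately; return {name: error text}."""
    failures = {}
    for name in ds.data_vars:
        try:
            ds[name].load()
        except (OSError, RuntimeError) as err:
            # OSError came from h5netcdf/h5py, RuntimeError from netCDF4
            failures[name] = f"{type(err).__name__}: {err}"
    return failures


# Example call (requires xarray and the file from the thread):
# import xarray as xr
# ds = xr.open_dataset("chla201601.nc")
# print(find_unreadable_variables(ds))
```

An empty result would mean every variable loads individually, which would point away from a corrupted variable and towards something that depends on how much is read at once.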
Thanks @aulemahal. (1) The test script:
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import numpy as np
import xarray as xr
import xesmf as xe
ds = xr.open_dataset("chla201601.nc")
ds.load()
> RuntimeError Traceback (most recent call last)
> Cell In[5], line 1
> ----> 1 ds.load()
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/dataset.py:880, in Dataset.load(self, **kwargs)
> 878 for k, v in self.variables.items():
> 879 if k not in lazy_data:
> --> 880 v.load()
> 882 return self
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/variable.py:981, in Variable.load(self, **kwargs)
> 964 def load(self, **kwargs):
> 965 """Manually trigger loading of this variable's data from disk or a
> 966 remote source into memory and return this variable.
> 967
> (...)
> 979 dask.array.compute
> 980 """
> --> 981 self._data = to_duck_array(self._data, **kwargs)
> 982 return self
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/namedarray/pycompat.py:134, in to_duck_array(data, **kwargs)
> 131 return loaded_data
> 133 if isinstance(data, ExplicitlyIndexed):
> --> 134 return data.get_duck_array() # type: ignore[no-untyped-call, no-any-return]
> 135 elif is_duck_array(data):
> 136 return data
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/indexing.py:837, in MemoryCachedArray.get_duck_array(self)
> 836 def get_duck_array(self):
> --> 837 self._ensure_cached()
> 838 return self.array.get_duck_array()
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/indexing.py:831, in MemoryCachedArray._ensure_cached(self)
> 830 def _ensure_cached(self):
> --> 831 self.array = as_indexable(self.array.get_duck_array())
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/indexing.py:788, in CopyOnWriteArray.get_duck_array(self)
> 787 def get_duck_array(self):
> --> 788 return self.array.get_duck_array()
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/indexing.py:651, in LazilyIndexedArray.get_duck_array(self)
> 647 array = apply_indexer(self.array, self.key)
> 648 else:
> 649 # If the array is not an ExplicitlyIndexedNDArrayMixin,
> 650 # it may wrap a BackendArray so use its __getitem__
> --> 651 array = self.array[self.key]
> 653 # self.array[self.key] is now a numpy array when
> 654 # self.array is a BackendArray subclass
> 655 # and self.key is BasicIndexer((slice(None, None, None),))
> 656 # so we need the explicit check for ExplicitlyIndexed
> 657 if isinstance(array, ExplicitlyIndexed):
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:100, in NetCDF4ArrayWrapper.__getitem__(self, key)
> 99 def __getitem__(self, key):
> --> 100 return indexing.explicit_indexing_adapter(
> 101 key, self.shape, indexing.IndexingSupport.OUTER, self._getitem
> 102 )
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/core/indexing.py:1015, in explicit_indexing_adapter(key, shape, indexing_support, raw_indexing_method)
> 993 """Support explicit indexing by delegating to a raw indexing method.
> 994
> 995 Outer and/or vectorized indexers are supported by indexing a second time
> (...)
> 1012 Indexing result, in the form of a duck numpy-array.
> 1013 """
> 1014 raw_key, numpy_indices = decompose_indexer(key, shape, indexing_support)
> -> 1015 result = raw_indexing_method(raw_key.tuple)
> 1016 if numpy_indices.tuple:
> 1017 # index the loaded np.ndarray
> 1018 indexable = NumpyIndexingAdapter(result)
>
> File /usr/miniforge3/envs/xesmf_env/lib/python3.12/site-packages/xarray/backends/netCDF4_.py:113, in NetCDF4ArrayWrapper._getitem(self, key)
> 111 with self.datastore.lock:
> 112 original_array = self.get_array(needs_lock=False)
> --> 113 array = getitem(original_array, key)
> 114 except IndexError:
> 115 # Catch IndexError in netCDF4 and return a more informative
> 116 # error message. This is most often called when an unsorted
> 117 # indexer is used before the data is loaded from disk.
> 118 msg = (
> 119 "The indexing operation you are attempting to perform "
> 120 "is not valid on netCDF4.Variable object. Try loading "
> 121 "your data into memory first by calling .load()."
> 122 )
>
> File src/netCDF4/_netCDF4.pyx:4981, in netCDF4._netCDF4.Variable.__getitem__()
>
> File src/netCDF4/_netCDF4.pyx:5953, in netCDF4._netCDF4.Variable._get()
>
> File src/netCDF4/_netCDF4.pyx:2113, in netCDF4._netCDF4._ensure_nc_success()
>
> RuntimeError: NetCDF: HDF error
(2) The memory of the virtual machine should be sufficient to process this data. The RAM information is shown here:
(xesmf_env) [root@localhost onion5376]# free -h
               total        used        free      shared  buff/cache   available
Mem:           8.3Gi       5.0Gi       2.0Gi        44Mi       1.7Gi       3.3Gi
Swap:          3.9Gi          0B       3.9Gi
(3) As for whether the file is corrupted, I made a simple plot of this data, and everything looks fine for all variables (CHL, CHL_uncertainty and flags). The code:
import xarray as xr
import matplotlib.pyplot as plt
ds = xr.open_dataset("chla201601.nc")
g = xr.plot.FacetGrid(ds, col='time', col_wrap=3)
g.map(plt.pcolormesh, 'longitude', 'latitude', 'CHL',vmin=0.000001,vmax=8)
plt.tight_layout()
plt.show()
If ds.load() crashes, but not the plot of CHL, I'm pretty sure that means the corrupted or unreadable part is elsewhere. Do you need to keep CHL_uncertainty and flags?
Try to change the last line of the regridding script to this to regrid only CHL and forget about the other two variables.
dr_out = Regrd(ds[['CHL']]) # Regrid only CHL
Also, I think your last comment clearly shows that the error does not come from xESMF. I'll close this issue for now. I suggest you contact the people who provided you with this dataset and show them the error you get when you do ds.load(), maybe including your package list (pip list in a terminal).
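As a shorter alternative to a full `pip list`, the versions of just the packages involved here can be collected from Python with the standard library's `importlib.metadata`. This is a sketch only; `report_versions` is a hypothetical helper, and the default package list is taken from the versions quoted earlier in the thread, with h5py added because it appears in the traceback.

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages=("xesmf", "xarray", "numpy", "netCDF4", "h5netcdf", "h5py")):
    """Return {package: installed version, or 'not installed'}."""
    report = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = "not installed"
    return report


# Print one "name version" pair per line, ready to paste into a bug report.
for name, ver in report_versions().items():
    print(f"{name} {ver}")
```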
Finally, simply for your information: you can use GitHub text formatting to make code show as monospaced text. For Python code, for example, you can put ```python3 on the line before your code, and ``` on the line after.
Thanks aulemahal. I have tested the regridding script with only CHL, and it gives the same errors. I'll think of some more solutions.
That's weird! I don't get how you can plot the entire variable, but not load it... Good luck!
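One possible explanation, offered here only as a guess, is that the faceted plot reads the data in smaller pieces than `ds.load()`, which requests a whole variable in a single read. To test that idea, the variable can be loaded one time step at a time. In the sketch below, `find_unreadable_steps` is a hypothetical helper, and the `time` dimension name is taken from the plotting code earlier in the thread.

```python
def find_unreadable_steps(da, dim="time"):
    """Load `da` one index along `dim` at a time; return [(index, error text)]."""
    bad = []
    for i in range(da.sizes[dim]):
        try:
            # Each slice is a much smaller read than the full variable.
            da.isel({dim: i}).load()
        except (OSError, RuntimeError) as err:
            bad.append((i, f"{type(err).__name__}: {err}"))
    return bad


# Example call (requires xarray and the file from the thread):
# import xarray as xr
# ds = xr.open_dataset("chla201601.nc")
# print(find_unreadable_steps(ds["CHL"]))
```

If every step loads while the full `ds.load()` still fails, that would suggest the size of a single read matters more than the content of the file.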
I have installed xesmf in a new, clean environment under CentOS 9, following the installation docs (https://xesmf.readthedocs.io/en/latest/installation.html). This is the data used in the script (chla201601.zip).
The error is as follows: