pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

'ScipyArrayWrapper' object has no attribute 'oindex' #8909

Open ocraft opened 8 months ago

ocraft commented 8 months ago

What happened?

Exception 'ScipyArrayWrapper' object has no attribute 'oindex' is raised when trying to save a dataset to a netCDF file after selecting a subset from a dataset previously loaded from another netCDF file.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

import xarray as xr

ds = xr.Dataset()
ds['A'] = xr.DataArray([[1, 'a'], [2, 'b']], dims=['x', 'y'])
ds.to_netcdf('test.nc')
ds2 = xr.open_dataset('test.nc')
ds2.sel(y=[1]).to_netcdf('test.nc')

MVCE confirmation

Relevant log output

File ~/Workspace/phd/.venv/lib/python3.10/site-packages/xarray/core/indexing.py:342, in IndexCallable.__getitem__(self, key)
    341 def __getitem__(self, key: Any) -> Any:
--> 342     return self.getter(key)

File ~/Workspace/phd/.venv/lib/python3.10/site-packages/xarray/coding/variables.py:72, in _ElementwiseFunctionArray._oindex_get(self, key)
     71 def _oindex_get(self, key):
---> 72     return type(self)(self.array.oindex[key], self.func, self.dtype)

File ~/Workspace/phd/.venv/lib/python3.10/site-packages/xarray/core/indexing.py:342, in IndexCallable.__getitem__(self, key)
    341 def __getitem__(self, key: Any) -> Any:
--> 342     return self.getter(key)

File ~/Workspace/phd/.venv/lib/python3.10/site-packages/xarray/coding/strings.py:256, in StackedBytesArray._oindex_get(self, key)
    255 def _oindex_get(self, key):
--> 256     return _numpy_char_to_bytes(self.array.oindex[key])

AttributeError: 'ScipyArrayWrapper' object has no attribute 'oindex'

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.8 | packaged by conda-forge | (main, Nov 22 2022, 08:26:04) [GCC 10.4.0]
python-bits: 64
OS: Linux
OS-release: 6.5.0-26-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: None

xarray: 2024.3.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.13.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: 3.10.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: 1.3.8
dask: 2024.4.0
distributed: 2024.4.0
matplotlib: 3.8.4
cartopy: None
seaborn: 0.13.2
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: None
sparse: None
flox: 0.9.6
numpy_groupies: 0.10.2
setuptools: 63.2.0
pip: 24.0
conda: None
pytest: 8.1.1
mypy: None
IPython: 8.23.0
sphinx: None

welcome[bot] commented 8 months ago

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

ocraft commented 8 months ago

It's important to note that the example works on xarray==2024.2.0; the problem appears in 2024.3.0.

dcherian commented 8 months ago

Yes sorry about that.

@andersy005 we should probably roll back the changes to coding/*.py and bundle them in the backends feature branch
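
For context, the failing dispatch is visible in the traceback: with the 2024.3.0 indexing refactor, outer indexers are routed to the wrapped array's .oindex accessor, which ScipyArrayWrapper does not expose. A rough paraphrase of the branch shown in the traceback (not a verbatim copy of xarray.core.indexing.apply_indexer; OuterIndexer and VectorizedIndexer are xarray-internal classes):

from xarray.core.indexing import OuterIndexer, VectorizedIndexer

def apply_indexer(indexable, indexer):
    # Paraphrased from xarray/core/indexing.py as it appears in the traceback:
    # vectorized keys go to .vindex, outer keys go to .oindex, anything else
    # falls back to plain __getitem__.
    if isinstance(indexer, VectorizedIndexer):
        return indexable.vindex[indexer]
    elif isinstance(indexer, OuterIndexer):
        return indexable.oindex[indexer]  # AttributeError for ScipyArrayWrapper
    else:
        return indexable[indexer]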

dcherian commented 8 months ago

FWIW, I can't reproduce this even when forcing it to write a netCDF3 file with engine="scipy".
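
For reference, a sketch of that attempt (file names are illustrative; whether it reproduces seems to depend on the environment, as the next comment shows):

import xarray as xr

ds = xr.Dataset()
ds["A"] = xr.DataArray([[1, "a"], [2, "b"]], dims=["x", "y"])
# The scipy backend only writes netCDF3 files.
ds.to_netcdf("test.nc", engine="scipy")

ds2 = xr.open_dataset("test.nc", engine="scipy")
ds2.sel(y=[1]).to_netcdf("test2.nc", engine="scipy")  # reported to raise on 2024.3.0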

andersy005 commented 7 months ago

I was able to reproduce the issue in a fresh environment:

mamba create -n test 'python=3.12' xarray scipy ipython distributed 
In [6]: ds2 = xr.open_dataset('/tmp/test.nc')

In [7]: ds2.sel(y=[1])
Out[7]: 
<xarray.Dataset> Size: 16B
Dimensions:  (x: 2, y: 1)
Dimensions without coordinates: x, y
Data variables:
    A        (x, y) object 16B ...

In [8]: ds2.sel(y=[1]).to_netcdf('/tmp/ttest.nc')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[8], line 1
----> 1 ds2.sel(y=[1]).to_netcdf('/tmp/ttest.nc')

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/core/dataset.py:2298, in Dataset.to_netcdf(self, path, mode, format, group, engine, encoding, unlimited_dims, compute, invalid_netcdf)
   2295     encoding = {}
   2296 from xarray.backends.api import to_netcdf
-> 2298 return to_netcdf(  # type: ignore  # mypy cannot resolve the overloads:(
   2299     self,
   2300     path,
   2301     mode=mode,
   2302     format=format,
   2303     group=group,
   2304     engine=engine,
   2305     encoding=encoding,
   2306     unlimited_dims=unlimited_dims,
   2307     compute=compute,
   2308     multifile=False,
   2309     invalid_netcdf=invalid_netcdf,
   2310 )

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/backends/api.py:1339, in to_netcdf(dataset, path_or_file, mode, format, group, engine, encoding, unlimited_dims, compute, multifile, invalid_netcdf)
   1334 # TODO: figure out how to refactor this logic (here and in save_mfdataset)
   1335 # to avoid this mess of conditionals
   1336 try:
   1337     # TODO: allow this work (setting up the file for writing array data)
   1338     # to be parallelized with dask
-> 1339     dump_to_store(
   1340         dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
   1341     )
   1342     if autoclose:
   1343         store.close()

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/backends/api.py:1386, in dump_to_store(dataset, store, writer, encoder, encoding, unlimited_dims)
   1383 if encoder:
   1384     variables, attrs = encoder(variables, attrs)
-> 1386 store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/backends/common.py:393, in AbstractWritableDataStore.store(self, variables, attributes, check_encoding_set, writer, unlimited_dims)
    390 if writer is None:
    391     writer = ArrayWriter()
--> 393 variables, attributes = self.encode(variables, attributes)
    395 self.set_attributes(attributes)
    396 self.set_dimensions(variables, unlimited_dims=unlimited_dims)

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/backends/common.py:482, in WritableCFDataStore.encode(self, variables, attributes)
    479 def encode(self, variables, attributes):
    480     # All NetCDF files get CF encoded by default, without this attempting
    481     # to write times, for example, would fail.
--> 482     variables, attributes = cf_encoder(variables, attributes)
    483     variables = {k: self.encode_variable(v) for k, v in variables.items()}
    484     attributes = {k: self.encode_attribute(v) for k, v in attributes.items()}

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/conventions.py:795, in cf_encoder(variables, attributes)
    792 # add encoding for time bounds variables if present.
    793 _update_bounds_encoding(variables)
--> 795 new_vars = {k: encode_cf_variable(v, name=k) for k, v in variables.items()}
    797 # Remove attrs from bounds variables (issue #2921)
    798 for var in new_vars.values():

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/conventions.py:196, in encode_cf_variable(var, needs_copy, name)
    183 ensure_not_multiindex(var, name=name)
    185 for coder in [
    186     times.CFDatetimeCoder(),
    187     times.CFTimedeltaCoder(),
   (...)
    194     variables.BooleanCoder(),
    195 ]:
--> 196     var = coder.encode(var, name=name)
    198 # TODO(kmuehlbauer): check if ensure_dtype_not_object can be moved to backends:
    199 var = ensure_dtype_not_object(var, name=name)

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/coding/times.py:972, in CFDatetimeCoder.encode(self, variable, name)
    970 def encode(self, variable: Variable, name: T_Name = None) -> Variable:
    971     if np.issubdtype(
--> 972         variable.data.dtype, np.datetime64
    973     ) or contains_cftime_datetimes(variable):
    974         dims, data, attrs, encoding = unpack_for_encoding(variable)
    976         units = encoding.pop("units", None)

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/core/variable.py:433, in Variable.data(self)
    431     return self._data
    432 elif isinstance(self._data, indexing.ExplicitlyIndexed):
--> 433     return self._data.get_duck_array()
    434 else:
    435     return self.values

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/core/indexing.py:809, in MemoryCachedArray.get_duck_array(self)
    808 def get_duck_array(self):
--> 809     self._ensure_cached()
    810     return self.array.get_duck_array()

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/core/indexing.py:803, in MemoryCachedArray._ensure_cached(self)
    802 def _ensure_cached(self):
--> 803     self.array = as_indexable(self.array.get_duck_array())

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/core/indexing.py:760, in CopyOnWriteArray.get_duck_array(self)
    759 def get_duck_array(self):
--> 760     return self.array.get_duck_array()

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/core/indexing.py:619, in LazilyIndexedArray.get_duck_array(self)
    617 def get_duck_array(self):
    618     if isinstance(self.array, ExplicitlyIndexedNDArrayMixin):
--> 619         array = apply_indexer(self.array, self.key)
    620     else:
    621         # If the array is not an ExplicitlyIndexedNDArrayMixin,
    622         # it may wrap a BackendArray so use its __getitem__
    623         array = self.array[self.key]

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/core/indexing.py:1000, in apply_indexer(indexable, indexer)
    998     return indexable.vindex[indexer]
    999 elif isinstance(indexer, OuterIndexer):
-> 1000     return indexable.oindex[indexer]
   1001 else:
   1002     return indexable[indexer]

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/core/indexing.py:342, in IndexCallable.__getitem__(self, key)
    341 def __getitem__(self, key: Any) -> Any:
--> 342     return self.getter(key)

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/coding/variables.py:72, in _ElementwiseFunctionArray._oindex_get(self, key)
     71 def _oindex_get(self, key):
---> 72     return type(self)(self.array.oindex[key], self.func, self.dtype)

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/core/indexing.py:342, in IndexCallable.__getitem__(self, key)
    341 def __getitem__(self, key: Any) -> Any:
--> 342     return self.getter(key)

File ~/mambaforge/envs/test/lib/python3.12/site-packages/xarray/coding/strings.py:256, in StackedBytesArray._oindex_get(self, key)
    255 def _oindex_get(self, key):
--> 256     return _numpy_char_to_bytes(self.array.oindex[key])

AttributeError: 'ScipyArrayWrapper' object has no attribute 'oindex'

In [9]: xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:54:21) [Clang 16.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.3.0
pandas: 2.2.1
numpy: 1.26.4
scipy: 1.13.0
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: 2024.4.1
distributed: 2024.4.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.3.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 69.2.0
pip: 24.0
conda: None
pytest: None
mypy: None
IPython: 8.22.2
sphinx: None
FiND-Tao commented 6 months ago

I tried "pip install xarray==0.20.1 scipy==1.7.1" and it removed the error.

hunterboerner commented 4 months ago

Having the same error over here.

dcherian commented 4 months ago

In the meantime, installing netCDF4 should make things work. I'll work on fixing this.
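
A sketch of that workaround (it assumes netCDF4 is installed, e.g. pip install netCDF4; file names are illustrative):

import xarray as xr

# With netCDF4 available, select it explicitly so the scipy backend
# (and its ScipyArrayWrapper) is never involved.
ds2 = xr.open_dataset("test.nc", engine="netcdf4")
ds2.sel(y=[1]).to_netcdf("test2.nc", engine="netcdf4")

Once installed, the netcdf4 engine is also typically picked by default for local netCDF files, so simply having it in the environment is usually enough.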

gmaze commented 1 month ago

Note that I can reproduce this error even with scipy 1.14.1 and netCDF4 1.7.1 in the environment: https://github.com/euroargodev/argopy/issues/390