pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

`Dataset.to_array()` throws `IndexError` for empty datasets #7872

Open sehoffmann opened 1 year ago

sehoffmann commented 1 year ago

What happened?

>>> import xarray as xr
>>> xr.__version__
'2023.4.2'
>>> xr.Dataset().to_array()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/qb/work2/goswami0/gkd021/conda/envs/wb/lib/python3.10/site-packages/xarray/core/dataset.py", line 6114, in to_array
    data = duck_array_ops.stack([b.data for b in broadcast_vars], axis=0)
  File "/mnt/qb/work2/goswami0/gkd021/conda/envs/wb/lib/python3.10/site-packages/xarray/core/duck_array_ops.py", line 326, in stack
    xp = get_array_namespace(arrays[0])
IndexError: list index out of range

What did you expect to happen?

In my opinion, the most reasonable way to handle this would be to return an empty (i.e. default-constructed) xr.DataArray:

>>> xr.DataArray()
<xarray.DataArray ()>
array(nan)
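
Until this is handled upstream, the expected behavior can be approximated on the caller's side. The following is only a sketch of that idea: `to_array_or_empty` is a hypothetical helper name, not part of xarray, and it assumes that a default-constructed `xr.DataArray()` is an acceptable stand-in for an empty dataset.

```python
import xarray as xr


def to_array_or_empty(ds: xr.Dataset, dim: str = "variable") -> xr.DataArray:
    """Hypothetical helper: Dataset.to_array() that tolerates empty datasets."""
    if len(ds.data_vars) == 0:
        # Nothing to stack: fall back to a default-constructed DataArray
        # (assumption: this is the desired result for an empty dataset).
        return xr.DataArray()
    # Non-empty datasets go through the regular code path.
    return ds.to_array(dim=dim)


empty_result = to_array_or_empty(xr.Dataset())
stacked = to_array_or_empty(xr.Dataset({"a": ("x", [1, 2]), "b": ("x", [3, 4])}))
```

The guard checks `ds.data_vars` rather than `ds.variables`, since `to_array()` only stacks data variables; a dataset that holds coordinates but no data variables would presumably run into the same error.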

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.76.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: 4.9.1
xarray: 2023.4.2
pandas: 1.5.2
numpy: 1.23.5
scipy: 1.9.3
netCDF4: 1.6.3
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: 2.14.2
cftime: 1.6.2
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: 1.3.5
dask: 2022.7.0
distributed: 2022.7.0
matplotlib: 3.6.2
cartopy: 0.21.1
seaborn: None
numbagg: None
fsspec: 2022.11.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 65.6.3
pip: 22.3.1
conda: None
pytest: 7.1.2
mypy: None
IPython: 8.8.0
sphinx: 5.0.2

welcome[bot] commented 1 year ago

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

benbovy commented 1 year ago

Hmm, I'm not sure this is related to the explicit indexes refactor.

v2022.10.0 (after the refactor) raised a slightly more meaningful error message:

>>> xr.Dataset().to_array()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 xarray.Dataset().to_array()

File ~/Git/github/benbovy/xarray/xarray/core/dataset.py:6079, in Dataset.to_array(self, dim, name)
   6077 data_vars = [self.variables[k] for k in self.data_vars]
   6078 broadcast_vars = broadcast_variables(*data_vars)
-> 6079 data = duck_array_ops.stack([b.data for b in broadcast_vars], axis=0)
   6081 dims = (dim,) + broadcast_vars[0].dims
   6082 variable = Variable(dims, data, self.attrs, fastpath=True)

File ~/Git/github/benbovy/xarray/xarray/core/duck_array_ops.py:287, in stack(arrays, axis)
    285 def stack(arrays, axis=0):
    286     """stack() with better dtype promotion rules."""
--> 287     return _stack(as_shared_dtype(arrays), axis=axis)

File ~/Git/github/benbovy/xarray/xarray/core/duck_array_ops.py:187, in as_shared_dtype(scalars_or_arrays)
    182     arrays = [asarray(x) for x in scalars_or_arrays]
    183 # Pass arrays directly instead of dtypes to result_type so scalars
    184 # get handled properly.
    185 # Note that result_type() safely gets the dtype from dask arrays without
    186 # evaluating them.
--> 187 out_type = dtypes.result_type(*arrays)
    188 return [x.astype(out_type, copy=False) for x in arrays]

File ~/Git/github/benbovy/xarray/xarray/core/dtypes.py:183, in result_type(*arrays_and_dtypes)
    178     if any(issubclass(t, left) for t in types) and any(
    179         issubclass(t, right) for t in types
    180     ):
    181         return np.dtype(object)
--> 183 return np.result_type(*arrays_and_dtypes)

File <__array_function__ internals>:200, in result_type(*args, **kwargs)

ValueError: at least one array or dtype is required
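
Both tracebacks appear to share the same root cause: with no data variables, the list passed to `duck_array_ops.stack` is empty. The sketch below reproduces the two failing operations in isolation, without xarray; the variable names are illustrative only.

```python
import numpy as np

# What to_array() ends up stacking when the dataset has no data variables.
arrays = []

# 2023.4.2 path: stack() indexes the first array to pick an array namespace.
try:
    arrays[0]
    index_error_raised = False
except IndexError:
    index_error_raised = True

# 2022.10.0 path: dtype promotion is called with no arguments at all.
try:
    np.result_type(*arrays)
    value_error_raised = False
except ValueError:
    value_error_raised = True

print(index_error_raised, value_error_raised)
```

Only the first failing call differs between the two versions, which is why the exception type changed while the behavior for empty datasets did not.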