pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Bug when padding coordinates with NaNs #6431

Open TomNicholas opened 2 years ago

TomNicholas commented 2 years ago

What happened?

da = xr.DataArray(np.arange(2), dims='x')
da.pad({'x': (0, 1)}, 'constant', constant_values=np.NAN)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [12], in <cell line: 1>()
----> 1 da.pad({'x': 1}, 'constant', constant_values=np.NAN)

File ~/Documents/Work/Code/xarray/xarray/core/dataarray.py:4158, in DataArray.pad(self, pad_width, mode, stat_length, constant_values, end_values, reflect_type, **pad_width_kwargs)
   4000 def pad(
   4001     self,
   4002     pad_width: Mapping[Any, int | tuple[int, int]] | None = None,
   (...)
   4012     **pad_width_kwargs: Any,
   4013 ) -> DataArray:
   4014     """Pad this array along one or more dimensions.
   4015 
   4016     .. warning::
   (...)
   4156         z        (x) float64 nan 100.0 200.0 nan
   4157     """
-> 4158     ds = self._to_temp_dataset().pad(
   4159         pad_width=pad_width,
   4160         mode=mode,
   4161         stat_length=stat_length,
   4162         constant_values=constant_values,
   4163         end_values=end_values,
   4164         reflect_type=reflect_type,
   4165         **pad_width_kwargs,
   4166     )
   4167     return self._from_temp_dataset(ds)

File ~/Documents/Work/Code/xarray/xarray/core/dataset.py:7368, in Dataset.pad(self, pad_width, mode, stat_length, constant_values, end_values, reflect_type, **pad_width_kwargs)
   7366     variables[name] = var
   7367 elif name in self.data_vars:
-> 7368     variables[name] = var.pad(
   7369         pad_width=var_pad_width,
   7370         mode=mode,
   7371         stat_length=stat_length,
   7372         constant_values=constant_values,
   7373         end_values=end_values,
   7374         reflect_type=reflect_type,
   7375     )
   7376 else:
   7377     variables[name] = var.pad(
   7378         pad_width=var_pad_width,
   7379         mode=coord_pad_mode,
   7380         **coord_pad_options,  # type: ignore[arg-type]
   7381     )

File ~/Documents/Work/Code/xarray/xarray/core/variable.py:1360, in Variable.pad(self, pad_width, mode, stat_length, constant_values, end_values, reflect_type, **pad_width_kwargs)
   1357 if reflect_type is not None:
   1358     pad_option_kwargs["reflect_type"] = reflect_type  # type: ignore[assignment]
-> 1360 array = np.pad(  # type: ignore[call-overload]
   1361     self.data.astype(dtype, copy=False),
   1362     pad_width_by_index,
   1363     mode=mode,
   1364     **pad_option_kwargs,
   1365 )
   1367 return type(self)(self.dims, array)

File <__array_function__ internals>:5, in pad(*args, **kwargs)

File ~/miniconda3/envs/py39/lib/python3.9/site-packages/numpy/lib/arraypad.py:803, in pad(array, pad_width, mode, **kwargs)
    801     for axis, width_pair, value_pair in zip(axes, pad_width, values):
    802         roi = _view_roi(padded, original_area_slice, axis)
--> 803         _set_pad_area(roi, axis, width_pair, value_pair)
    805 elif mode == "empty":
    806     pass  # Do nothing as _pad_simple already returned the correct result

File ~/miniconda3/envs/py39/lib/python3.9/site-packages/numpy/lib/arraypad.py:147, in _set_pad_area(padded, axis, width_pair, value_pair)
    130 """
    131 Set empty-padded area in given dimension.
    132 
   (...)
    144     broadcastable to the shape of `arr`.
    145 """
    146 left_slice = _slice_at_axis(slice(None, width_pair[0]), axis)
--> 147 padded[left_slice] = value_pair[0]
    149 right_slice = _slice_at_axis(
    150     slice(padded.shape[axis] - width_pair[1], None), axis)
    151 padded[right_slice] = value_pair[1]

ValueError: cannot convert float NaN to integer
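The failing call at the bottom of the traceback can be reproduced with NumPy alone. The snippet below is a minimal sketch, assuming a small integer array stands in for the DataArray's underlying data; the variable names are illustrative and not taken from xarray.

```python
import numpy as np

# Integer data, standing in for the DataArray's underlying values.
data = np.arange(2)

# Padding an integer array with NaN fails: NaN has no integer representation.
error_message = None
try:
    np.pad(data, (0, 1), mode="constant", constant_values=np.nan)
except ValueError as err:
    error_message = str(err)
print(error_message)

# Promoting to a float dtype first lets the same call succeed.
padded = np.pad(data.astype(float), (0, 1), mode="constant", constant_values=np.nan)
print(padded)
```

This suggests the array's dtype is never promoted before np.pad is called when constant_values is passed explicitly.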

What did you expect to happen?

It should have successfully padded with a NaN, same as it does if you don't specify constant_values:

In [14]: da.pad({'x': (0, 1)}, 'constant')
Out[14]: 
<xarray.DataArray (x: 3)>
array([ 0.,  1., nan])
Dimensions without coordinates: x

Minimal Complete Verifiable Example

No response

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.11.0-7620-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1

xarray: 0.20.3.dev4+gdbc02d4e
pandas: 1.4.0
numpy: 1.21.4
scipy: 1.7.3
netCDF4: 1.5.8
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.10.3
cftime: 1.5.1.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.01.1
distributed: 2022.01.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2022.01.0
cupy: None
pint: None
sparse: None
setuptools: 59.6.0
pip: 21.3.1
conda: 4.11.0
pytest: 6.2.5
IPython: 8.2.0
sphinx: 4.4.0

TomNicholas commented 2 years ago

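The problem appears to be caused by a bug in our dtypes module. The current padding code assumes that the following comparison would evaluate to True, but it does not:

from xarray.core import dtypes

In [20]: dtypes.NA is np.NAN
Out[20]: False

Because the check fails, an explicit NaN fill value does not trigger the promotion to a float dtype that the default fill value gets, so NumPy is asked to write NaN into an integer array.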
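Until that check is fixed, one possible workaround is to promote the array to a floating-point dtype yourself before padding. This is only a sketch and not a confirmed fix from the maintainers; it assumes the data can safely be converted to float.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(2), dims="x")

# Cast to float before padding so the NaN fill value is representable.
padded = da.astype(float).pad({"x": (0, 1)}, mode="constant", constant_values=np.nan)
print(padded.values)
```

The cast changes the dtype of the whole array, which is also what happens when constant_values is left unspecified.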

husainridwan commented 1 year ago

@TomNicholas, I believe the pad() method does not consider any coordinates and only pads the data along the dimensions it contains. That's why the padding leads to a new data array that has the same dimension name as the original one but no coordinates.

We can set the coordinates explicitly using the coords attribute of the DataArray after padding. Check this example:

import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(2), dims='x')
padded_da = da.pad({'x': (0, 1)}, 'constant')
padded_da.coords['x'] = np.arange(padded_da.shape[0])
print(padded_da)

<xarray.DataArray (x: 3)>
array([ 0.,  1., nan])
Coordinates:
  * x        (x) int64 0 1 2

Hopefully this helps!