pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Pandas Integer Type Doesn't Convert in Dataset #9742

Open edwardreed81 opened 2 weeks ago

edwardreed81 commented 2 weeks ago

What happened?

I converted a pandas DataFrame containing a column of dtype pandas.Int64Dtype() into an xarray Dataset. The data variable is not converted to an xarray-compatible type; it stays wrapped in a PandasExtensionArray:

Data variables:
    0        (dim_0) Int64 27B <class 'xarray.core.extension_array.PandasExte...

Additionally, this causes an exception if the Dataset is pickled and subsequently loaded:

RecursionError: maximum recursion depth exceeded

What did you expect to happen?

The data variable should end up with dtype int64, and the Dataset should survive a pickle round trip.

Minimal Complete Verifiable Example

import pandas as pd
import xarray as xr
import pickle

df = pd.DataFrame([1, 2, 3], dtype=pd.Int64Dtype())
ds = xr.Dataset(df)
dsdump = pickle.dumps(ds)
pickle.loads(dsdump)

Relevant log output

---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
Cell In[1], line 8
      6 ds = xr.Dataset(df)
      7 dsdump = pickle.dumps(ds)
----> 8 pickle.loads(dsdump)

File ~/metis-dev/.venv/lib/python3.12/site-packages/xarray/core/extension_array.py:112, in PandasExtensionArray.__getattr__(self, attr)
    111 def __getattr__(self, attr: str) -> object:
--> 112     return getattr(self.array, attr)

File ~/metis-dev/.venv/lib/python3.12/site-packages/xarray/core/extension_array.py:112, in PandasExtensionArray.__getattr__(self, attr)
    111 def __getattr__(self, attr: str) -> object:
--> 112     return getattr(self.array, attr)

    [... skipping similar frames: PandasExtensionArray.__getattr__ at line 112 (2974 times)]

File ~/metis-dev/.venv/lib/python3.12/site-packages/xarray/core/extension_array.py:112, in PandasExtensionArray.__getattr__(self, attr)
    111 def __getattr__(self, attr: str) -> object:
--> 112     return getattr(self.array, attr)

RecursionError: maximum recursion depth exceeded
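
The repeated frames suggest a well-known Python pitfall rather than anything specific to pandas: when unpickling, pickle creates the object without calling __init__ and then looks up __setstate__ on it. If __getattr__ forwards every unknown name to self.array, and self.array has not been restored yet, the lookup of array itself lands back in __getattr__ and never terminates. The sketch below reproduces that mechanism with a hypothetical Wrapper class (not xarray's actual code) and shows one common guard; whether this matches the eventual fix in xarray is an assumption.

```python
import pickle


class Wrapper:
    """Hypothetical stand-in for a class that forwards unknown attributes."""

    def __init__(self, array):
        self.array = array

    def __getattr__(self, attr):
        # Only called when normal lookup fails. During unpickling the
        # instance __dict__ is still empty, so `self.array` re-enters
        # __getattr__ and recurses without end.
        return getattr(self.array, attr)


class SafeWrapper(Wrapper):
    """Same forwarding, but refuses to forward the lookup of `array` itself."""

    def __getattr__(self, attr):
        if attr == "array":
            # Breaks the cycle: an unset `array` is reported as missing.
            raise AttributeError(attr)
        return getattr(self.array, attr)


def roundtrip(obj):
    """Pickle and unpickle an object in memory."""
    return pickle.loads(pickle.dumps(obj))


try:
    roundtrip(Wrapper([1, 2, 3]))
    wrapper_failed = False
except RecursionError:
    wrapper_failed = True

safe_copy = roundtrip(SafeWrapper([1, 2, 3]))
print(wrapper_failed, safe_copy.array)
```

The guard works because pickle treats an AttributeError for __setstate__ as "not defined" and falls back to restoring the instance __dict__ directly.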

Anything else we need to know?

This looks like a regression: Xarray 2024.9.0 does not exhibit this behavior, while 2024.10.0 (the version in the environment below) does.
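
A possible workaround until this is fixed, offered as an untested-by-maintainers sketch: cast the nullable column to a plain NumPy dtype before building the Dataset, so no extension array is involved. This assumes the column has no missing values; astype("int64") raises if any pd.NA is present.

```python
import pickle

import pandas as pd
import xarray as xr

df = pd.DataFrame([1, 2, 3], dtype=pd.Int64Dtype())

# Assumption: no pd.NA in the column, so a plain int64 cast is lossless.
plain = df.astype("int64")

ds = xr.Dataset(plain)
restored = pickle.loads(pickle.dumps(ds))

print(ds[0].dtype)
print(restored.identical(ds))
```

If the column can contain missing values, casting to float64 (with NaN for missing entries) is the usual alternative, at the cost of losing the integer dtype.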

Environment

INSTALLED VERSIONS
------------------
commit: None
python: 3.12.7 (main, Oct 1 2024, 11:15:50) [GCC 14.2.1 20240910]
python-bits: 64
OS: Linux
OS-release: 6.6.32-1-lts
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None

xarray: 2024.10.0
pandas: 2.2.3
numpy: 1.26.4
scipy: 1.14.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: 2.18.3
cftime: None
nc_time_axis: None
iris: None
bottleneck: 1.4.2
dask: 2024.10.0
distributed: None
matplotlib: 3.9.2
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.3.0
pip: 23.3.1
conda: None
pytest: None
mypy: None
IPython: 8.29.0
sphinx: None

welcome[bot] commented 2 weeks ago

Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!

max-sixty commented 2 weeks ago

Confirmed as a bug