Open shoyer opened 2 hours ago
Here's the error message from pandas's `TestDataFrameToXArray.test_to_xarray_index_types[string]`:
AssertionError: Attributes of DataFrame.iloc[:, 5] (column name="f") are different
Attribute "dtype" are different
[left]: CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=False, categories_dtype=object)
[right]: object
self = <pandas.tests.generic.test_to_xarray.TestDataFrameToXArray object at 0x13d4fa7cbe90>
index_flat = Index(['pandas_0', 'pandas_1', 'pandas_2', 'pandas_3', 'pandas_4', 'pandas_5',
'pandas_6', 'pandas_7', 'pandas_...pandas_93', 'pandas_94', 'pandas_95',
'pandas_96', 'pandas_97', 'pandas_98', 'pandas_99'],
dtype='object')
df = bar a b c d e f g h
foo ....0 True c 2013-01-03 2013-01-03 00:00:00-05:00
pandas_3 d 4 6 7.0 False d 2013-01-04 2013-01-04 00:00:00-05:00
using_infer_string = False
    def test_to_xarray_index_types(self, index_flat, df, using_infer_string):
        index = index_flat
        # MultiIndex is tested in test_to_xarray_with_multiindex
        if len(index) == 0:
            pytest.skip("Test doesn't make sense for empty index")
        from xarray import Dataset
        df.index = index[:4]
        df.index.name = "foo"
        df.columns.name = "bar"
        result = df.to_xarray()
        assert result.sizes["foo"] == 4
        assert len(result.coords) == 1
        assert len(result.data_vars) == 8
        tm.assert_almost_equal(list(result.coords.keys()), ["foo"])
        assert isinstance(result, Dataset)
        # idempotency
        # datetimes w/tz are preserved
        # column names are lost
        expected = df.copy()
        expected["f"] = expected["f"].astype(
            object if not using_infer_string else "string[pyarrow_numpy]"
        )
        expected.columns.name = None
>       tm.assert_frame_equal(result.to_dataframe(), expected)
E AssertionError: Attributes of DataFrame.iloc[:, 5] (column name="f") are different
E
E Attribute "dtype" are different
E [left]: CategoricalDtype(categories=['a', 'b', 'c', 'd'], ordered=False, categories_dtype=object)
E [right]: object
tests/generic/test_to_xarray.py:58: AssertionError
cc @ilan-gold
It appears that #9520 may have broken some upstream pandas tests, specifically testing round-trips with various index types: https://github.com/pandas-dev/pandas/blob/e78ebd3f845c086af1d71c0604701ec49df97228/pandas/tests/generic/test_to_xarray.py#L32
Here's a minimal test case:
I'm not sure if this is a pandas or xarray issue, but it's one or the other!
(My guess is that most of these tests in pandas should probably live in xarray instead, given that we implement all the conversion logic.)
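For illustration only, here is a pandas-only sketch of the dtype mismatch that the assertion reports. It is not the minimal test case mentioned above and does not call xarray at all: `left` is a hypothetical stand-in for `result.to_dataframe()` with column `f` still categorical, and `right` mirrors how the pandas test builds `expected` by casting `f` to `object`. The helper name `dtype_mismatch` is made up for this example.

```python
import pandas as pd
import pandas.testing as tm

# Hypothetical stand-in for result.to_dataframe(): column "f" stays categorical.
left = pd.DataFrame({"f": pd.Categorical(list("abcd"))})

# Stand-in for the test's `expected`: the same values, cast to object dtype.
right = left.copy()
right["f"] = right["f"].astype(object)


def dtype_mismatch(a, b):
    """Return the assertion message if the frames differ, else None."""
    try:
        tm.assert_frame_equal(a, b)
    except AssertionError as exc:
        return str(exc)
    return None


message = dtype_mismatch(left, right)
print(message)
```

The values in both frames are identical; only the dtype of `f` differs, which is enough for `assert_frame_equal` to fail with its default `check_dtype=True`.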
Originally posted by @shoyer in https://github.com/pydata/xarray/issues/9520#issuecomment-2386077534