v6d-io / v6d

vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
https://v6d.io
Apache License 2.0

Metadata of DataFrame from Vineyard is different from the original one #1930

Closed: luweizheng closed this issue 4 days ago

luweizheng commented 1 week ago

Describe your problem

NumPy arrays work fine. For a DataFrame, the data returned by vineyard is the same as the original, but the metadata is different.

import pandas as pd
import vineyard

# Connect to the running vineyard instance.
client = vineyard.connect()

df = pd.DataFrame({'u': [0, 0, 1, 2, 2, 3],
                   'v': [1, 2, 3, 3, 4, 4],
                   'weight': [1.5, 3.2, 4.7, 0.3, 0.8, 2.5]})

# Round-trip the DataFrame through vineyard.
object_id = client.put(df)
shared_dataframe = client.get(object_id)

# Raises AssertionError with pandas 2.2.2 and vineyard 0.23.2.
pd.testing.assert_frame_equal(df, shared_dataframe)

Here is the output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "***/envs/cudf/lib/python3.10/site-packages/pandas/_testing/asserters.py", line 1279, in assert_frame_equal
    assert_series_equal(
  File "***/envs/cudf/lib/python3.10/site-packages/pandas/_testing/asserters.py", line 997, in assert_series_equal
    assert_numpy_array_equal(
  File "***/envs/cudf/lib/python3.10/site-packages/pandas/_testing/asserters.py", line 652, in assert_numpy_array_equal
    assert_class_equal(left, right, obj=obj)
  File "***/envs/cudf/lib/python3.10/site-packages/pandas/_testing/asserters.py", line 383, in assert_class_equal
    raise_assert_detail(obj, msg, repr_class(left), repr_class(right))
  File "***/envs/cudf/lib/python3.10/site-packages/pandas/_testing/asserters.py", line 614, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: DataFrame.iloc[:, 0] (column name="u") are different

DataFrame.iloc[:, 0] (column name="u") classes are different
[left]:  ndarray
[right]: ndarray
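
A note on the message above: it reports that the classes differ, yet both sides print as "ndarray". One reading, which this thread does not confirm, is that the array on the right is an instance of a subclass of numpy.ndarray whose class name is also "ndarray", so a check on the exact type fails even though the values match. The sketch below reproduces that situation with NumPy alone; the subclass defined here is a hypothetical stand-in, not vineyard's actual class.

```python
import numpy as np


class ndarray(np.ndarray):
    """Hypothetical subclass that happens to share NumPy's class name."""


plain = np.array([0, 0, 1, 2, 2, 3])
wrapped = plain.view(ndarray)  # same buffer, different Python type

# The values are identical ...
same_values = bool(np.array_equal(plain, wrapped))

# ... but an exact type comparison fails, and printing only the class
# name cannot tell the two types apart.
same_class = type(plain) is type(wrapped)
names = (type(plain).__name__, type(wrapped).__name__)

print(same_values, same_class, names)
```

If this reading applies, checking type(shared_dataframe['u'].to_numpy()) on the affected setup would show whether the column is backed by something other than a plain numpy.ndarray.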

If this is a bug report, to help us reproduce this bug, please provide the information below:

  1. Your Operating System version (uname -a): Linux n1 4.18.0-80.7.1.el8_0.x86_64 #1 SMP Sat Aug 3 15:14:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
  2. The version of vineyard you use (vineyard.__version__): 0.23.2
  3. Versions of crucial packages, such as gcc, numpy, pandas, etc.: pandas: 2.2.2
  4. Full stack of the error (if there is a crash): see above
  5. Minimized code to reproduce the error: see above

dashanji commented 1 week ago

Thanks for raising this issue, we'll check it ASAP.

dashanji commented 1 week ago

Hi @luweizheng. There might be some compatibility issues between the latest versions of pandas (>=2.2.0) and vineyard. As a workaround, you can try a lower version of pandas. I have tested pandas 2.1.4 and it works.
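
To make the suggested version boundary concrete, here is a minimal sketch of a guard that flags pandas versions in the range mentioned above (>= 2.2.0). The helper name pandas_may_be_affected is hypothetical and not part of vineyard or pandas; the boundary is taken from this comment only and is not a confirmed compatibility matrix.

```python
def pandas_may_be_affected(version: str) -> bool:
    """Return True if `version` falls in the range flagged in this thread (>= 2.2.0)."""
    # Only the major and minor components matter for the 2.2 boundary.
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (2, 2)


reported = pandas_may_be_affected("2.2.2")    # version from the report
workaround = pandas_may_be_affected("2.1.4")  # version reported to work

print(reported, workaround)
```

In a real script the argument would be pd.__version__, and a True result could be used to emit a warning before round-tripping DataFrames through vineyard.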

luweizheng commented 1 week ago

Thanks for your reply! I tested with pandas 2.1.4 and it does not have this issue.

sighingnow commented 1 week ago

@dashanji

sighingnow commented 1 week ago

@luweizheng Thanks for raising the issue!

@dashanji Please fix the compatibility issue with the latest pandas.

dashanji commented 1 week ago

> @dashanji Please fix the compatibility issue with the latest pandas.

Ok