xorbitsai / xorbits

Scalable Python DS & ML, in an API compatible & lightning fast way.
https://xorbits.readthedocs.io
Apache License 2.0
1.1k stars 67 forks

BUG: Vineyard storage backend changes the arrow type from string to large_string. #661

Open codingl2k1 opened 1 year ago

codingl2k1 commented 1 year ago

Describe the bug

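The Vineyard storage backend does not preserve Arrow types across a put/get round trip: a column stored as string comes back as large_string, so the equality check in test_put_get_arrow[vineyard] fails.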

=================================== FAILURES ===================================
_________________________ test_put_get_arrow[vineyard] _________________________

storage_context = <xorbits._mars.storage.vineyard.VineyardStorage object at 0x7fdd9a401940>

    @pytest.mark.asyncio
    @require_lib
    async def test_put_get_arrow(storage_context):
        storage = storage_context

        data = [
            pa.Table.from_pydict({"a": [1, 2, 3], "b": list("abc")}),
            pa.RecordBatch.from_pydict({"a": [1, 2, 3], "b": list("abc")}),
        ]
        for d in data:
            put_info = await storage.put(d)
            get_data = await storage.get(put_info.object_id)
>           assert d == get_data
E           assert pyarrow.Table...["a","b","c"]] == pyarrow.Table...["a","b","c"]]
E             Full diff:
E               pyarrow.Table
E               a: int64
E             - b: large_string
E             ?    ------
E             + b: string
E               ----
E               a: [[1,2,3]]
E               b: [["a","b","c"],
E               ]

xorbits/_mars/storage/tests/test_libs.py:400: AssertionError

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Your Python version
  2. The version of Xorbits you use
  3. Versions of crucial packages, such as numpy, scipy and pandas
  4. Full stack of the error.
  5. Minimized code to reproduce the error.

Expected behavior

The object returned by storage.get should compare equal to the object passed to storage.put, with column b keeping the Arrow string type.