xorbitsai / xorbits

Scalable Python DS & ML, in an API compatible & lightning fast way.
https://xorbits.readthedocs.io
Apache License 2.0

BUG: AttributeError: 'NoneType' object has no attribute 'kind' while using drop_duplicates in Xorbits with chunk_size #549

Closed. Hank0626 closed this issue 1 year ago.

Hank0626 commented 1 year ago

Describe the bug

In Xorbits, I create a DataFrame with an explicit chunk_size, add a column computed with map, and call drop_duplicates on that column. This raises AttributeError: 'NoneType' object has no attribute 'kind'. The same code runs without error when the DataFrame is created without chunk_size, or when the map result is executed before drop_duplicates is called.

To Reproduce

Environment:

  1. Python version: 3.10
  2. Xorbits version: 0.4.0
  3. numpy version: 1.23.5
  4. pandas version: 1.5.3

Code to reproduce:

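import xorbits.pandas as pd

# 10 rows, split into row chunks of at most 4 rows
df = pd.DataFrame({'col1': [1]*5 + [2]*5, 'col2': [2] * 10, 'col4': [3]*10}, chunk_size=4)
# Derived column computed element-wise with map
df['col3'] = df['col1'].map(lambda x: x + 1)
# Deduplicate on the derived column, then drop it; this sequence raises the AttributeError
res = df.drop_duplicates(subset='col3').drop(columns='col3', axis=1)
print(res)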


Expected behavior

I expect the code to run without raising an AttributeError. Chunking should be transparent: specifying chunk_size should not change the behavior or the result of map or drop_duplicates.
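As a reference for the expected result, the same operations can be run in plain pandas, whose API Xorbits mirrors. This is a minimal sketch, assuming pandas is installed (the report used 1.5.3). `axis=1` is omitted because `columns=` already selects the column axis.

```python
import pandas as pd

# Same data and operations as the repro, using plain pandas (no chunk_size)
df = pd.DataFrame({'col1': [1]*5 + [2]*5, 'col2': [2] * 10, 'col4': [3]*10})
df['col3'] = df['col1'].map(lambda x: x + 1)

# Keep the first row for each distinct col3 value, then drop the helper column
res = df.drop_duplicates(subset='col3').drop(columns='col3')
print(res)  # rows with index 0 and 5; columns col1, col2, col4
```

The chunked Xorbits DataFrame would be expected to return the same two rows.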

Additional context

The error does not occur when chunk_size is omitted or when the map result is executed before drop_duplicates is called. This suggests a problem in how Xorbits handles chunked data across these two operations.
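The following sketch shows why chunking should not change the result in principle. It implements a generic chunk-wise drop_duplicates: deduplicate each chunk, then deduplicate the concatenated partial results. This is an assumed illustration of the general technique, not Xorbits' actual implementation. The helper name `chunked_drop_duplicates` is hypothetical.

```python
import pandas as pd


def chunked_drop_duplicates(df: pd.DataFrame, subset: str, chunk_size: int) -> pd.DataFrame:
    """Hypothetical helper: chunk-wise drop_duplicates with keep='first'."""
    # Split rows into consecutive chunks of at most chunk_size rows
    chunks = [df.iloc[i:i + chunk_size] for i in range(0, len(df), chunk_size)]
    # Deduplicate within each chunk first
    partial = [chunk.drop_duplicates(subset=subset) for chunk in chunks]
    # Concatenate in original order and deduplicate again; because chunk
    # order is preserved, keep='first' picks the same rows as a global pass
    return pd.concat(partial).drop_duplicates(subset=subset)


df = pd.DataFrame({'col1': [1]*5 + [2]*5, 'col2': [2] * 10, 'col4': [3]*10})
df['col3'] = df['col1'].map(lambda x: x + 1)

chunked = chunked_drop_duplicates(df, subset='col3', chunk_size=4)
whole = df.drop_duplicates(subset='col3')
print(chunked.equals(whole))  # True
```

With chunk_size=4, the 10 rows split into chunks of 4, 4, and 2 rows. The per-chunk pass keeps rows 0, 4, 5, and 8, and the final pass reduces these to rows 0 and 5. That is the same result as deduplicating the unchunked DataFrame.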