xorbitsai / xorbits

Scalable Python DS & ML, in an API compatible & lightning fast way.
https://xorbits.readthedocs.io
Apache License 2.0

BUG: xorbits.DataFrames drop all columns that were not used in a calculation. #672

Closed: MarcelHoh closed this issue 12 months ago

MarcelHoh commented 1 year ago

Describe the bug

When running the following code:

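import pandas as pd
import numpy as np
import xorbits.pandas as xpd

# Write a two-column frame to Parquet with plain pandas
df = pd.DataFrame({'a': np.random.uniform(0, 1, 1000),
                   'b': np.random.uniform(1, 2, 1000)})

df.to_parquet('test.pq')
del df

# Read it back with xorbits and list the columns
df = xpd.read_parquet('test.pq')
print(df.keys())

# Run a computation that only touches column 'a'
a = df['a'].to_numpy()
print(a.mean())

# List the columns again: 'b' has disappeared
print(df.keys())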

I get the output:

Index(['a', 'b'], dtype='object')
0.5050958861272348
Index(['a'], dtype='object')

If I instead call (df['a'] * df['c']).to_numpy() on a frame that also has a column 'c', then 'a' and 'c' are kept but 'b' is still dropped. It looks like xorbits drops every column that is not used in a calculation from the DataFrame.

To Reproduce

To help us reproduce this bug, please provide the information below:

  1. Your Python version: 3.10.10
  2. The version of Xorbits you use: 0.5.1
  3. Versions of crucial packages, such as numpy, scipy and pandas: numpy 1.24.2, pandas 1.4.0
  4. Full stack of the error: none; no stack trace is produced because nothing crashes.
  5. Minimized code to reproduce the error: see above.

Expected behavior

All columns of the DataFrame remain accessible after the computation.
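The expected behavior can be expressed as a small check: snapshot the column labels, run a computation, and compare. This is a minimal sketch, not part of xorbits; the helper name columns_preserved is hypothetical, and it assumes df.keys() returns an iterable of column labels, as the printed Index in the report suggests for both pandas and xorbits.pandas.

```python
from typing import Any, Callable, List


def columns_preserved(df: Any, compute: Callable[[Any], Any]) -> bool:
    """Return True if running compute(df) leaves df's column labels unchanged.

    Hypothetical helper. Assumes df.keys() yields the column labels,
    which holds for pandas DataFrames.
    """
    before: List[Any] = list(df.keys())
    compute(df)  # the computation under test; its result is discarded
    after: List[Any] = list(df.keys())
    return before == after
```

With the reproducer above, columns_preserved(df, lambda d: d['a'].to_numpy().mean()) should return True; going by the output in the report, it would be expected to return False on xorbits 0.5.1.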

Additional context

None

qinxuye commented 1 year ago

Thank you for your report; it's helpful. We will try to address the issue and fix it ASAP if it is confirmed to be a bug.

aresnow1 commented 1 year ago

I've reproduced it, and it is a bug caused by the optimization module. We will fix it soon.
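The thread only says the bug comes from the optimization module. One plausible mechanism, offered purely as a hypothetical illustration and not as xorbits' actual code, is column pruning: an optimizer narrows a Parquet read to the columns a computation needs. If it narrows the shared source node in place, the DataFrame the user still holds loses the other columns too. The sketch below models this with an invented ReadParquet node and two invented helpers, prune_in_place and prune_copy.

```python
from dataclasses import dataclass, replace
from typing import List, Set


@dataclass
class ReadParquet:
    """Hypothetical source node: the file and the columns it will load."""
    path: str
    columns: List[str]


def prune_in_place(node: ReadParquet, needed: Set[str]) -> ReadParquet:
    # Buggy variant (hypothetical): narrows the shared node itself, so every
    # other holder of `node` now sees only the needed columns.
    node.columns = [c for c in node.columns if c in needed]
    return node


def prune_copy(node: ReadParquet, needed: Set[str]) -> ReadParquet:
    # Safe variant (hypothetical): builds a new, narrowed node for this
    # computation and leaves the original untouched.
    return replace(node, columns=[c for c in node.columns if c in needed])
```

With src = ReadParquet("test.pq", ["a", "b"]), prune_copy(src, {"a"}) yields a node with columns ['a'] while src keeps ['a', 'b']; calling prune_in_place(src, {"a"}) instead leaves src with only ['a'], which mirrors the symptom in the report.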

MarcelHoh commented 1 year ago

Hi, thank you for verifying so quickly! Do you have a rough idea of a timeline for a fix? I would really like to start using xorbits but this issue is preventing me from making the switch.

ChengjieLi28 commented 1 year ago


Hi @MarcelHoh, this issue will be fixed this week, ASAP.