xorbitsai / xorbits

Scalable Python DS & ML, in an API compatible & lightning fast way.
https://xorbits.readthedocs.io
Apache License 2.0

BUG: ipython data missing columns due to column pruning #681

Closed ChengjieLi28 closed 12 months ago

ChengjieLi28 commented 12 months ago
  1. Remove the chunk graph column pruning rule, since column pruning is already done on the tileable graph.
  2. In the IPython environment, some tileables may be missing column data because of the column pruning rule. Those tileables need to be re-executed when they are used again, instead of just being fetched.
  3. Fix a bug caused by the chunk graph column pruning rule, for example:
    import xorbits.pandas as pd
    df = pd.read_parquet('xxx')
    print(df['a'].mean())
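
The re-execute-versus-fetch decision in item 2 can be illustrated with a small plain-Python sketch. It is a toy model only: `ToyTileable`, `ToySession`, and `SOURCE` are hypothetical names made up for this illustration, not xorbits classes, and the logic is an assumption about the general idea rather than the code in this PR.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Hypothetical stand-in for a file on disk: column name -> values.
SOURCE: Dict[str, List[float]] = {
    "a": [1.0, 2.0, 3.0],
    "b": [10.0, 20.0, 30.0],
}


@dataclass
class ToyTileable:
    """Hypothetical lazy frame (not the xorbits tileable class)."""

    columns: List[str]  # columns the user expects to see
    cached: Optional[Dict[str, List[float]]] = None  # data kept from the last run

    def missing_columns(self) -> Set[str]:
        """Columns the user expects that the cached result does not hold."""
        if self.cached is None:
            return set(self.columns)
        return set(self.columns) - set(self.cached)


class ToySession:
    """Hypothetical session that counts how often the source is read."""

    def __init__(self) -> None:
        self.read_count = 0

    def _read(self, columns: List[str]) -> Dict[str, List[float]]:
        self.read_count += 1
        return {name: list(SOURCE[name]) for name in columns}

    def execute_pruned(self, t: ToyTileable, needed: List[str]) -> Dict[str, List[float]]:
        """Run with column pruning: only the needed columns are read and cached."""
        t.cached = self._read(needed)
        return t.cached

    def fetch(self, t: ToyTileable) -> Dict[str, List[float]]:
        """Return the full data, re-executing if pruning left columns out."""
        if t.missing_columns():
            t.cached = self._read(t.columns)
        return t.cached


session = ToySession()
df = ToyTileable(columns=["a", "b"])

# Cell 1: only column "a" is needed, so pruning skips "b".
col_a = session.execute_pruned(df, ["a"])["a"]
mean_a = sum(col_a) / len(col_a)

# Cell 2: the whole frame is used again, so a plain fetch would lack "b".
full = session.fetch(df)
```

In this toy model the second cell triggers one more read because `b` is missing from the cached result. A later `session.fetch(df)` finds every column present and returns the cached data without reading again.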

Related issue number

Fixes #672


codecov[bot] commented 12 months ago

Codecov Report

Merging #681 (0ae2ecb) into main (385750e) will increase coverage by 0.01%. The diff coverage is 81.81%.

@@            Coverage Diff             @@
##             main     #681      +/-   ##
==========================================
+ Coverage   93.49%   93.51%   +0.01%     
==========================================
  Files        1027     1025       -2     
  Lines       79426    79331      -95     
  Branches    16458    16440      -18     
==========================================
- Hits        74263    74184      -79     
+ Misses       3482     3455      -27     
- Partials     1681     1692      +11     
| Flag | Coverage Δ |
| --- | --- |
| unittests | 93.40% <81.81%> (+0.01%) :arrow_up: |

Flags with carried forward coverage won't be shown.

| Files Changed | Coverage Δ |
| --- | --- |
| ...rbits/_mars/optimization/logical/chunk/__init__.py | 100.00% <ø> (ø) |
| python/xorbits/_mars/deploy/oscar/session.py | 95.08% <81.25%> (-0.55%) :arrow_down: |
| python/xorbits/_mars/dataframe/indexing/align.py | 100.00% <100.00%> (ø) |

... and 13 files with indirect coverage changes