xorbitsai / xorbits

Scalable Python DS & ML, in an API compatible & lightning fast way.
https://xorbits.io
Apache License 2.0

BUG: fix compatibility issue of df.sort_index() and df.groupby(sort=True) #776

Closed luweizheng closed 3 weeks ago

luweizheng commented 1 month ago

What do these changes do?

sort_index

Xorbits' DataFrame.sort_index is inconsistent with pandas when axis=1 and ignore_index=True. pandas sorts along the requested axis as usual, then replaces the index (when axis=0) or the columns (when axis=1) with a RangeIndex. Here is the reference code.

This PR follows pandas: when ignore_index=True is passed, the index (axis=0) or the columns (axis=1) are replaced with a RangeIndex after sorting.
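A minimal sketch of the behaviour described above, written in plain pandas. It applies the RangeIndex replacement by hand rather than passing ignore_index=True, so it does not depend on how a particular pandas version handles axis=1. The sample frame and variable names are illustrative only and are not taken from the Xorbits code.

```python
import pandas as pd

# Illustrative frame: unsorted row labels and unsorted column names.
df = pd.DataFrame({"b": [1, 2], "a": [3, 4]}, index=[10, 5])

# axis=0: sort rows by index label, then discard the labels.
rows_sorted = df.sort_index(axis=0)
rows_sorted.index = pd.RangeIndex(len(rows_sorted))

# axis=1: sort columns by name, then discard the names.
cols_sorted = df.sort_index(axis=1)
cols_sorted.columns = pd.RangeIndex(cols_sorted.shape[1])

print(rows_sorted)
print(cols_sorted)
```

In the axis=1 case only the column labels are replaced. The row index (10, 5) is left untouched.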

groupby(sort=True).agg()

python/xorbits/_mars/dataframe/groupby/sort.py:

out_df = in_df.loc[pivots[p_index - 1] :].drop(
    index=pivots[p_index - 1], errors="ignore"
)

In the latest version, during the map phase of DataFrameGroupbySortShuffle, if in_df does not contain the index from pivots, an empty out_df is returned.
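To show what the slice-then-drop expression above selects, here is a small plain-pandas sketch. It assumes a sorted integer index, and the helper name rows_after is hypothetical. It only illustrates the pandas semantics of the expression and does not reproduce the DataFrameGroupbySortShuffle operator or the fix itself.

```python
import pandas as pd

# Illustrative partition with a sorted (monotonic) integer index.
in_df = pd.DataFrame({"v": [10, 20, 30, 40]}, index=[1, 3, 5, 7])


def rows_after(df, pivot):
    """Keep rows whose label is strictly greater than `pivot`.

    `.loc[pivot:]` is inclusive of `pivot`, so the pivot row is dropped
    afterwards; errors="ignore" makes the drop a no-op when the pivot
    label is not present in the index.
    """
    return df.loc[pivot:].drop(index=pivot, errors="ignore")


present = rows_after(in_df, 3)  # pivot label exists in the index
missing = rows_after(in_df, 4)  # pivot label absent, but inside the range
beyond = rows_after(in_df, 9)   # pivot label larger than every index label

print(list(present.index), list(missing.index), len(beyond))
```

Label slicing with a missing bound works here only because the index is sorted. On a non-monotonic index, pandas raises a KeyError for a slice bound that is not present.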

Related issue number

Check code requirements