pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

Add an alias 'stable' for 'mergesort' in sort_values #20417

Closed lakshayg closed 2 years ago

lakshayg commented 6 years ago

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({
    'x': [4, 3, 2, 1],
    'y': ['a', 'a', 'a','a']
})

df.sort_values(by='x', kind='mergesort')   # this is the only way to perform a stable-sort
df.sort_values(by='x', kind='stable')      # proposed alias for 'mergesort'

Problem description

Currently, pandas supports only one way of performing a stable sort, namely kind='mergesort'. It therefore makes sense to add an alias for mergesort, which would make code more readable.

When I see 'mergesort' written in the code, it is not explicit that this was done to make the sort stable. Therefore having a 'stable' option will fit well with the idea "explicit is better than implicit".
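The code sample above uses distinct values in 'x', so it cannot show what stability buys. A minimal sketch with ties may make the point clearer; the column name `label` and the data are made up for illustration, and only `kind='mergesort'` (which pandas already accepts) is used:

```python
import pandas as pd

# Rows 'b' and 'c' tie on x == 1; rows 'a' and 'd' tie on x == 2.
df = pd.DataFrame({
    "x": [2, 1, 1, 2],
    "label": ["a", "b", "c", "d"],
})

# A stable sort keeps tied rows in their original relative order:
# 'b' stays ahead of 'c', and 'a' stays ahead of 'd'.
result = df.sort_values(by="x", kind="mergesort")
print(result["label"].tolist())
```

A reader who sees `kind='mergesort'` here has to know that mergesort is stable to understand why it was chosen, which is the readability gap the alias is meant to close.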

I would like to give this a try if this issue makes sense and the devs decide to accept this proposal.

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.12.final.0
python-bits: 64
OS: Linux
OS-release: 4.13.0-37-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_IN
LOCALE: None.None
pandas: 0.20.3
pytest: None
pip: 9.0.2
setuptools: 38.2.3
Cython: None
numpy: 1.13.3
scipy: 0.19.1
xarray: None
IPython: 5.5.0
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: None
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
pandas_gbq: None
pandas_datareader: None

jreback commented 6 years ago

this would be a break with numpy and would introduce yet another convention which, while spelled nicely, is incompatible.

so -0 on this.

lakshayg commented 6 years ago

Makes sense. I have posted an issue on the numpy github repo https://github.com/numpy/numpy/issues/10784. Adding it to pandas should be easy if we can get it into numpy.

lakshayg commented 6 years ago

@jreback update: numpy (1.15.0) now supports kind='stable' in the sort function.

jreback commented 6 years ago

does stable map to merge sort?

lakshayg commented 6 years ago

Yes, it maps to mergesort.

jreback commented 6 years ago

ok would take a PR to do this then (just be an alias - so we have backward compat)

lakshayg commented 6 years ago

Since numpy supports kind='stable', do I need to just update the documentation or should I add something like

if kind == 'stable':
    kind = 'mergesort'

at the places where sorting methods are called? cc @jreback
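For reference, the second option could be centralized instead of repeated at every call site. This is only a sketch of the idea: `normalize_sort_kind` and `_SORT_KIND_ALIASES` are hypothetical names that do not exist in pandas, and the thread does not say which approach was eventually taken.

```python
# Hypothetical helper, not part of pandas: resolve the alias in one place
# before `kind` is handed to a sorting routine.
_SORT_KIND_ALIASES = {"stable": "mergesort"}


def normalize_sort_kind(kind: str) -> str:
    """Return the canonical sort kind, mapping 'stable' to 'mergesort'."""
    return _SORT_KIND_ALIASES.get(kind, kind)


print(normalize_sort_kind("stable"))
print(normalize_sort_kind("quicksort"))
```

With a numpy version that already accepts kind='stable', a mapping like this would only matter for keeping older numpy versions working.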

mzeitlin11 commented 3 years ago

The docs now mention this as supported, but I don't see any test coverage for it. Probably good to add a test that 'stable' is supported (and actually does a stable sort) for sort_values and sort_index. It would also be good to type the kind argument using a Literal with all sorting kind options, for consistency everywhere we use them.
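A rough sketch of what such a test might look like, assuming a pandas/numpy combination where kind='stable' is accepted. The function names and data are made up for illustration and are not taken from the pandas test suite; Python's built-in `sorted`, which is guaranteed stable, serves as the reference ordering:

```python
import pandas as pd

N = 1000
# Many ties: every key value occurs several hundred times.
KEYS = [i % 3 for i in range(N)]
# sorted() is stable, so this is the order a stable sort must produce.
EXPECTED = sorted(range(N), key=lambda i: KEYS[i])


def check_sort_values_stable():
    df = pd.DataFrame({"key": KEYS, "pos": range(N)})
    result = df.sort_values(by="key", kind="stable")
    # Tied rows must keep their original relative positions.
    assert result["pos"].tolist() == EXPECTED


def check_sort_index_stable():
    ser = pd.Series(range(N), index=KEYS)
    result = ser.sort_index(kind="stable")
    assert result.tolist() == EXPECTED


check_sort_values_stable()
check_sort_index_stable()
```

The input is kept fairly large on purpose: on very short inputs an unstable algorithm can happen to preserve tie order, so a tiny example would not tell the sort kinds apart.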

mroeschke commented 2 years ago

Looks like we have a sort_kind fixture that has mergesort and stable as well as SortKind for typing, so I think we can close this.
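For readers unfamiliar with the typing approach, here is a small standalone sketch. The `SortKind` below is a local stand-in defined only for illustration; its four values are assumed to match the kinds listed in the pandas docs, and `validate_sort_kind` is a hypothetical helper, not a pandas function.

```python
from typing import Literal, get_args

# Local stand-in for illustration; the values are an assumption based on
# the sort kinds named in this thread and in the pandas docs.
SortKind = Literal["quicksort", "mergesort", "heapsort", "stable"]


def validate_sort_kind(kind: str) -> str:
    """Raise ValueError for a kind outside the Literal's allowed values."""
    allowed = get_args(SortKind)
    if kind not in allowed:
        raise ValueError(f"kind must be one of {allowed}, got {kind!r}")
    return kind


print(validate_sort_kind("stable"))
```

Annotating `kind` with a Literal lets a static type checker flag a misspelled sort kind before the code runs, while a runtime check like the one above covers untyped callers.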