pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License

ENH: In pd.cut(), allow bins='auto' (leveraging `np.histogram_bin_edges`) #59165

Open Hari-Shankar-Karthik opened 3 days ago

Hari-Shankar-Karthik commented 3 days ago

Feature Type

Problem Description

When converting a quantitative variable into a qualitative one, pd.cut() is very handy. However, it requires the user to specify bins as either an integer or a list of bin edges. I wish it also accepted bins='auto', similar to how np.histogram does. np.histogram internally uses np.histogram_bin_edges to compute these edges. Thank you.

Expectation

Instead of writing pd.cut(df['x1'], bins=np.histogram_bin_edges(df['x1'], bins='auto')), allow writing pd.cut(df['x1'], bins='auto').
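For reference, here is a minimal runnable sketch of the workaround that exists today. The sample data is made up for illustration; only the column name 'x1' comes from the issue. One detail worth noting: with explicit edges, pd.cut uses right-closed intervals by default, so the minimum value (which equals the first edge) is left unbinned unless include_lowest=True is passed.

```python
import numpy as np
import pandas as pd

# Illustrative data only; the column name 'x1' follows the issue text.
rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200)})

# Current workaround: let NumPy choose the edges, then hand them to pd.cut.
edges = np.histogram_bin_edges(df["x1"].to_numpy(), bins="auto")

# include_lowest=True keeps the minimum value, which sits exactly on the
# first edge, inside the first interval instead of turning it into NaN.
binned = pd.cut(df["x1"], bins=edges, include_lowest=True)

print(binned.value_counts(sort=False))
```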

Additional Context

Calculation of bin edges is already done via np.histogram_bin_edges. Reference: https://numpy.org/doc/stable/reference/generated/numpy.histogram_bin_edges.html#numpy-histogram-bin-edges

Aloqeely commented 1 day ago

Thanks for the suggestion! It appears there was an effort to allow string bins in pd.cut in #23567, but that PR went stale. PRs are welcome to add support for string bins by dispatching the string to np.histogram_bin_edges.
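A rough sketch of what that dispatch might look like from user code, written as a standalone wrapper. The name cut_with_string_bins is hypothetical and not part of pandas, and this is not how #23567 or any eventual PR implements it. It assumes numeric input, drops missing values before asking NumPy for edges (np.histogram_bin_edges cannot autodetect a range that contains NaN), and defaults include_lowest to True for string bins so the minimum value is not lost.

```python
import numpy as np
import pandas as pd


def cut_with_string_bins(x, bins, **kwargs):
    """Hypothetical helper (not a pandas API): accept string bins in pd.cut.

    String values such as 'auto', 'fd' or 'sturges' are forwarded to
    np.histogram_bin_edges; anything else is passed to pd.cut unchanged.
    """
    if isinstance(bins, str):
        # Drop NaN first so NumPy can determine a finite data range.
        values = pd.Series(x).dropna().to_numpy()
        bins = np.histogram_bin_edges(values, bins=bins)
        # Assumption: keep the minimum value in the first bin by default.
        kwargs.setdefault("include_lowest", True)
    return pd.cut(x, bins=bins, **kwargs)


s = pd.Series([1.0, 2.0, 2.5, 3.0, 7.0, 8.0, np.nan])
out = cut_with_string_bins(s, "auto")
print(out)
```

Integer and list-like bins pass through untouched, so cut_with_string_bins(s, 3) behaves like pd.cut(s, 3).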

chaarvii commented 14 hours ago

Hey, would like to work on this.

chaarvii commented 14 hours ago

Take