Closed AlexeyGy closed 4 years ago
you can probably use the key= option here (happy to take a PR to show this in the docs)
-1 on adding any api
Is it okay to give an example in the docs using a third-party package like natsort? @jreback
Hi, I solved the problem using key, as @jreback said:
df = pd.DataFrame(data).sort_values(by=['Patient_ID'], key=lambda col: col.astype(str).str[3:].astype(int))
This takes Patient_ID as a string, slices it with [3:] (so the 'ID-' prefix is ignored), and converts the remainder to int for sorting.
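To see the effect of the key approach end to end, here is a minimal sketch. The sample data is hypothetical (the thread does not show the actual DataFrame), and it assumes every ID has the form 'ID-' followed by digits:

```python
import pandas as pd

# Hypothetical sample data; not the DataFrame from the original report.
data = {
    "Patient_ID": ["ID-10", "ID-2", "ID-1", "ID-21"],
    "Temperature": [37.1, 36.6, 38.2, 36.9],
}
df = pd.DataFrame(data)

# Default behaviour: strings are compared character by character.
lexicographic = df.sort_values(by="Patient_ID")

# key receives the whole column as a Series and must return
# something of the same length that pandas can sort on.
natural = df.sort_values(
    by="Patient_ID",
    key=lambda col: col.str[3:].astype(int),
)

print(lexicographic["Patient_ID"].tolist())
print(natural["Patient_ID"].tolist())
```

This only works while the prefix has a fixed length and the rest is purely numeric; IDs with other shapes would need a more general key.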
Here's a more general solution, which should work in all cases:
import numpy as np
from natsort import index_natsorted

df.sort_values(
    by="col_name",
    key=lambda col: np.argsort(index_natsorted(col)),
)
Or without the lambda:
def natural_sort(column):
    # index_natsorted gives the row positions in natural order;
    # np.argsort turns those positions into one sort rank per row.
    idx = index_natsorted(column)
    return np.argsort(idx)

df.sort_values(by="col_name", key=natural_sort)
Which needs an installation first: pip install natsort
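Why the extra np.argsort? As far as I understand it, index_natsorted returns the row positions in sorted order, while key needs one sortable value per row. Taking the argsort of a permutation gives its inverse, which is exactly the rank of each row. A small sketch with NumPy only; the values are hypothetical and the order list is a stand-in for what index_natsorted would return:

```python
import numpy as np

# Hypothetical values, all of the form 'ID-<number>'.
values = ["ID-10", "ID-1", "ID-2"]

# Stand-in for index_natsorted(values): positions of the rows in natural order.
order = sorted(range(len(values)), key=lambda i: int(values[i][3:]))

# Inverse permutation: the rank each original row gets when sorted.
ranks = np.argsort(order)

print(order)           # positions in sorted order
print(ranks.tolist())  # rank per original row
```

Here "ID-10" sits at position 0 but gets rank 2, so sorting the rows by rank puts it last.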
@erfannariman so happy to take a PR to add that as an example; it would be ok i think to add to environment.yml
with an appropriate comment (used in doc-strings) as this environment builds the docs.
@jreback PR is ready for review, weird stata test failing, will check later.
Is your feature request related to a problem?
The natural sort order is a common use case when working with real-world data. For example, consider the following DataFrame of clinical data where the body temperature of patients was measured:
will yield:
whereas we would want
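A small sketch with hypothetical visit labels (not the data from this report) shows the difference between the two orderings. It uses only the standard library, and natural_key is an illustrative helper, not a pandas or natsort function:

```python
import re

def natural_key(text):
    # Split into digit and non-digit chunks; digit chunks compare as integers.
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", text)]

# Hypothetical labels with more than one numeric part.
labels = ["P2-visit10", "P10-visit1", "P2-visit9"]

alphabetical = sorted(labels)
natural = sorted(labels, key=natural_key)

print(alphabetical)
print(natural)
```

The alphabetical result puts "P10-visit1" first because '1' sorts before '2', while the natural ordering compares 2 against 10 as numbers.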
Describe the solution you'd like
A sort_order parameter that is alphabetical by default and could be switched to natural. The implementation could follow what natsort brings: modify all values, pass them to np.argsort(), then transform them back.
API breaking implications
Since we are only adding a parameter this would not break any existing API.
Describe alternatives you've considered
Currently, one could use the natsort package. However, this seems cumbersome for such a common operation and makes it necessary to reindex the DataFrame. Stack Overflow example.