pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

df.sort() could work on the only field on a one-field dataframe, without adding fieldname to sort() #20045

Open lmocsi opened 3 days ago

lmocsi commented 3 days ago

Description

If the dataframe contains only one field, it would be a nice simplification to call df.sort() without passing the field name. In this case, Polars could take the only field there and sort by it.

So instead of:

pl.DataFrame({'myfield': [3, 6, 5, 4, 1]}).sort('myfield')

this could work:

pl.DataFrame({'myfield': [3, 6, 5, 4, 1]}).sort()

alexander-beedie commented 3 days ago

There is an implied ambiguity here: it's not clear whether omitting the sort field works because it's a one-column frame, or because omitting the sort field means "sort by all the fields". Someone who assumes the latter, and sees that this works on a one-column frame, will then be surprised that it doesn't also work on a multi-column frame (which we likely wouldn't want as a default behaviour due to the high cost of sorting very wide frames). Not sure if this is really a significant problem or not, but it needs a little consideration 🤔

lmocsi commented 2 days ago

I'd say that if the df has more than one field, Polars should raise the error it raises now, to make it clear for everyone. And of course, this should be written in the documentation. But this is not a significant issue...