Open vinhloc30796 opened 1 year ago
Just wanted to mention that fixing this would be very helpful for a use case that I have.
@sjt-motif You could convert it to a struct as a temporary workaround.
df.select(pl.col("a").sort_by(pl.col("a").list.to_struct("max_width")))
# shape: (7, 1)
# ┌───────────┐
# │ a │
# │ --- │
# │ list[i64] │
# ╞═══════════╡
# │ [] │
# │ [0] │
# │ [1] │
# │ [1, 0] │
# │ [1, 1] │
# │ [1, 2] │
# │ [2, 3, 5] │
# └───────────┘
@cmdlineluser Thanks! It works great, except the descending=True version exhibits some weird behavior:
>>> df.select(pl.col("a").sort_by(pl.col("a").list.to_struct("max_width"), descending=True))
shape: (7, 1)
┌───────────┐
│ a │
│ --- │
│ list[i64] │
╞═══════════╡
│ [] │
│ [2, 3, 5] │
│ [1] │
│ [1, 2] │
│ [1, 1] │
│ [1, 0] │
│ [0] │
└───────────┘
That's quite easy for me to work around though. Thanks!
Hm yeah, not sure why [1] comes first there; there's also no nulls_last= for .sort_by
I guess this is closer to what you want:
(df.with_columns(sort_by = pl.col("a").list.to_struct("max_width"))
.sort("sort_by", descending=True, nulls_last=True))
# shape: (7, 2)
# ┌───────────┬──────────────────┐
# │ a ┆ sort_by │
# │ --- ┆ --- │
# │ list[i64] ┆ struct[3] │
# ╞═══════════╪══════════════════╡
# │ [2, 3, 5] ┆ {2,3,5} │
# │ [1, 2] ┆ {1,2,null} │
# │ [1, 1] ┆ {1,1,null} │
# │ [1, 0] ┆ {1,0,null} │
# │ [1] ┆ {1,null,null} │
# │ [0] ┆ {0,null,null} │
# │ [] ┆ {null,null,null} │
# └───────────┴──────────────────┘
Thanks @cmdlineluser, I basically did that, but dumber: manually find the max length with agg.max(), then lambda s: list(s) + [None] * max_length, then finally explode the columns & sort by all.
Your way is wayyyy shorter lol.
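For readers who want to see the idea without Polars, the manual approach described above (pad every list to the maximum length, then sort by all positions) can be sketched in plain Python. This is only an illustration: the function name sort_lists_desc_nulls_last is hypothetical, and the ordering is chosen to reproduce the descending, nulls-last table shown earlier rather than to claim anything about how Polars sorts structs internally.

```python
def sort_lists_desc_nulls_last(rows):
    """Sort lists in descending order, treating missing positions as smallest."""
    # Width that every list is padded to (0 if there are no rows at all).
    max_len = max((len(r) for r in rows), default=0)

    def key(row):
        # Pad with None so every row has the same number of positions.
        padded = list(row) + [None] * (max_len - len(row))
        # (True, value) outranks (False, 0), so with reverse=True a real
        # value always sorts before a missing one at the same position.
        return [(v is not None, v if v is not None else 0) for v in padded]

    return sorted(rows, key=key, reverse=True)


rows = [[], [0], [1], [1, 0], [1, 1], [1, 2], [2, 3, 5]]
result = sort_lists_desc_nulls_last(rows)
print(result)
```

The (is_present, value) pair is what keeps None out of any direct comparison with an integer, which would otherwise raise a TypeError in Python 3.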
If this one gets fixed, then assert_frame_equal will also work for pl.List cols:
import polars as pl
from polars.testing import assert_frame_equal
df1 = pl.DataFrame({"A":[1,2,3,4,5], "B":["H","E","L","L","O"], "C":[[0,0],[0,0],[0,0],[0,0],[0,0]]})
df2 = df1.sort("B")
assert_frame_equal(df1, df2, check_row_order=False)
# ComputeError: cannot sort column of dtype `list[i64]`
# InvalidAssert: cannot set 'check_row_order=False' on frame with unsortable columns
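Until list columns are sortable, one possible stopgap for an order-insensitive comparison is to compare the rows as multisets instead of sorting them. The sketch below is plain Python and assumes the rows have already been extracted as tuples (for example via df.rows()); the helper names freeze and rows_equal_ignoring_order are hypothetical, and unlike assert_frame_equal it does not check dtypes or column names.

```python
from collections import Counter


def freeze(value):
    """Recursively turn lists/tuples/dicts into hashable tuples."""
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    if isinstance(value, dict):
        # Sort by key only, so values of mixed types are never compared.
        return tuple((k, freeze(value[k])) for k in sorted(value))
    return value


def rows_equal_ignoring_order(rows1, rows2):
    """True if both row collections hold the same rows, in any order."""
    return Counter(freeze(r) for r in rows1) == Counter(freeze(r) for r in rows2)


# Same data as the example above, written as row tuples (A, B, C).
rows1 = list(zip([1, 2, 3, 4, 5], ["H", "E", "L", "L", "O"], [[0, 0]] * 5))
rows2 = sorted(rows1, key=lambda r: r[1])  # reordered by column "B"
print(rows_equal_ignoring_order(rows1, rows2))
```

Counting hashable rows avoids needing any ordering on the list cells, which is the operation the error messages above complain about.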
@trinebrockhoff Even though it's related, perhaps that deserves its own issue?
That particular use-case seems like something that warrants a higher priority.
I also didn't realize at the time of my previous comment that you can pass expressions directly to .sort(), so the .with_columns() wasn't actually needed.
df.sort(
pl.col("a").list.to_struct("max_width"),
descending=True,
nulls_last=True
)
# shape: (7, 1)
# ┌───────────┐
# │ a │
# │ --- │
# │ list[i64] │
# ╞═══════════╡
# │ [2, 3, 5] │
# │ [1, 2] │
# │ [1, 1] │
# │ [1, 0] │
# │ [1] │
# │ [0] │
# │ [] │
# └───────────┘
It also doesn't seem to be supported for lists of structs:
schema = {"value": pl.List(pl.Struct({"a": pl.Int32}))}
data = {"value": [[{"a": 1}, {"a": 2}]]}
df = pl.DataFrame(data, schema=schema)
df.sort("value")
Resulting in
InvalidOperationError: `sort_with` operation not supported for dtype `list[struct[1]]`
It would be really nice if sort supported pl.List datatypes, especially since the assert_frame_equal function with check_row_order=False currently fails on them.
@maxzw Yeah, it seems that example also causes .group_by to panic.
df.group_by("value").all()
thread '' panicked at crates/polars-core/src/frame/group_by/into_groups.rs:296:52:
PanicException: called `Result::unwrap()` on an `Err` value: ComputeError(ErrString("cannot sort column of dtype `list[struct[1]]`"))
Similar issue?
polars.exceptions.InvalidOperationError: `arg_sort_multiple` operation not supported for dtype `list[str]`
This seems similar:
polars.exceptions.InvalidOperationError: cannot sort column of dtype `list[struct[7]]`
Input dataframe is like this, from .glimpse():
$ lines <list[struct[7]]> [{'description': 'Parts and Supplies', 'unitAmount': 88.33, 'quantity': 1, 'taxRateRef': {'id': 'NON'}, 'inventoryRef': {'id': '75', 'name': 'Sales-Products-Hardware'}, 'id': '1', 'amount': 88.33}], (...)
Problem description
Using df.sort() with a list[i64] column raises an error pointing to:
https://github.com/pola-rs/polars/blob/master/polars/polars-core/src/series/series_trait.rs#L390-L399
Also mentioned as a part of #7777
Thanks!
Polars version checks
[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of Polars.