pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

`is_in` operation not supported for list types #14830

Open shenker opened 8 months ago

shenker commented 8 months ago

Description

is_in was recently fixed for String-Categorical/Enum comparisons (https://github.com/pola-rs/polars/issues/14575) and list.contains was recently fixed for List(Categorical/Enum)-Categorical/Enum comparisons (https://github.com/pola-rs/polars/issues/14559), but List(Categorical)-List(Categorical) comparisons still need to be fixed for is_in.

import polars as pl

pl.Series([["a", "b"], ["b", "b"]], dtype=pl.List(pl.Categorical)).is_in(pl.Series([["b", "a"], ["c", "c"]], dtype=pl.List(pl.Categorical)))
# InvalidOperationError: `is_in` operation not supported for dtype `list[cat]`

c-peters commented 8 months ago

For completeness, this applies to list types in general, not only to categorical ones. Currently, `is_in` with a list on both sides is not supported at all:


import polars as pl

s = pl.Series([[1, 2], [2, 3]])
s2 = pl.Series([[2, 3], [4, 3]])
s.is_in(s2)
# polars.exceptions.InvalidOperationError: `is_in` operation not supported for dtype `list[i64]`

Ge0rges commented 1 week ago

Any suggested workarounds?

RayHackett commented 5 days ago

I think Expr.list.set_intersection() should have the desired effect. https://docs.pola.rs/api/python/dev/reference/expressions/api/polars.Expr.list.set_intersection.html