pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

add `list.filter()` #9189

Open mcrumiller opened 1 year ago

mcrumiller commented 1 year ago

Problem description

filter would be a useful addition to the list namespace. It could simply be syntactic sugar for list.eval(pl.element().filter(pl.element()...)).

import polars as pl

s = pl.Series([
    [1, 2, 3, 4, 5],
    [1, 3, 7, 8],
    [6, 1, 4, 5],
])

# current implementation
s.list.eval(pl.element().filter(pl.element() < 5))

# proposed syntax
s.list.filter(pl.element() < 5)

Expected output:
shape: (3,)
Series: '' [list[i64]]
[
        [1, 2, 3, 4]
        [1, 3]
        [1, 4]
]
parkma99 commented 1 year ago

#9425

orlp commented 9 months ago

This looks good to me, but I do have a concern. Currently, pl.element is defined as:

"Alias for an element being evaluated in an eval expression."

We will need to change this, and create some sort of 'list expression' concept, where list.eval and list.filter accept 'list expressions'.

DeflateAwning commented 5 months ago

Would love this syntactic sugar. I derived the pl.col(list_col_name).list.eval(pl.element().filter(~ pl.element().is_in([0, 1]))) from first principles, but would way rather have just used the applicable .list.filter(...), if it existed.

DeflateAwning commented 2 months ago

I've once again been brought here. The double use of pl.element() is a little cringe.

mkleinbort-ic commented 2 months ago

Yes, would be good

ghaffarialireza commented 1 month ago

still waiting for it