pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Using `inspect` during a `cumulative_eval` operation #9526

Open avimallu opened 1 year ago

avimallu commented 1 year ago

Research

Link to question on Stack Overflow

No response

Question about Polars

This is about an implementation detail. While trying to answer https://github.com/pola-rs/polars/issues/9517, I came across some surprising behavior of `inspect` inside `cumulative_eval`:

import polars as pl

df = pl.DataFrame({"values": [1, 2, 3, 4, 5]})
df.with_columns(pl.col("values").cumulative_eval(pl.element().last().inspect()))

Gives me the output:

shape: (0,)
Series: '' [i64]
[
]
shape: (0,)
Series: '' [i64]
[
]
shape: (0,)
Series: '' [i64]
[
]
shape: (1,)
Series: '' [i64]
[
    1
]
shape: (1,)
Series: '' [i64]
[
    2
]
shape: (1,)
Series: '' [i64]
[
    3
]
shape: (1,)
Series: '' [i64]
[
    4
]
shape: (1,)
Series: '' [i64]
[
    5
]
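For comparison, here is a minimal pure-Python model of what `cumulative_eval` is documented to do: evaluate the expression on a window that grows by one element per step, starting at one element (the default `min_periods=1`). The helpers `model_cumulative_eval` and `last` are hypothetical stand-ins written for this illustration, not Polars code. Under this model there are exactly five evaluations, each producing one value. That matches the five shape-(1,) Series above, while the three empty Series have no counterpart.

```python
from typing import Callable, List, Sequence


def model_cumulative_eval(
    values: Sequence[int], expr: Callable[[List[int]], List[int]]
) -> List[List[int]]:
    """Hypothetical model: return expr's output for each expanding window."""
    outputs = []
    for end in range(1, len(values) + 1):
        window = list(values[:end])  # window grows by one element per step
        outputs.append(expr(window))
    return outputs


def last(window: List[int]) -> List[int]:
    """Model of pl.element().last(): a length-1 result with the final element."""
    return window[-1:]  # an empty list if the window is empty


per_window = model_cumulative_eval([1, 2, 3, 4, 5], last)
print(per_window)  # [[1], [2], [3], [4], [5]]
```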

Why are there three evaluations that produce an empty Series of shape (0,)? I understand that one of them might be Polars trying to establish the return type, but why the other two?
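To make the return-type hypothesis concrete, the toy engine below probes the expression once on an empty input before the real pass. This is a hypothetical sketch, not how Polars is known to work. `toy_inspect`, `last_then_inspect` and `run` are illustrative names. Under this assumption, a single probe accounts for exactly one empty log entry, while the values the engine returns are unaffected.

```python
from typing import Callable, List

inspect_log: List[List[int]] = []


def toy_inspect(series: List[int]) -> List[int]:
    """Toy stand-in for Expr.inspect(): record the value, return it unchanged."""
    inspect_log.append(list(series))
    return series


def last_then_inspect(window: List[int]) -> List[int]:
    """Model of pl.element().last().inspect()."""
    return toy_inspect(window[-1:])


def run(values: List[int], expr: Callable[[List[int]], List[int]]) -> List[int]:
    """Hypothetical engine: one type probe on empty input, then expanding windows."""
    expr([])  # assumed type probe; its output is discarded, but toy_inspect still logs it
    result: List[int] = []
    for end in range(1, len(values) + 1):
        result.extend(expr(values[:end]))
    return result


out = run([1, 2, 3, 4, 5], last_then_inspect)
print(out)          # [1, 2, 3, 4, 5]
print(inspect_log)  # [[], [1], [2], [3], [4], [5]]
```

A single probe explains only one empty entry, so under this model the remaining two would need some other source.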

If these are an implementation detail, would it make sense to hide them from the output of `inspect`, since they are quite confusing to the user?

avimallu commented 1 year ago

@stinodego, I noticed that you tagged this as a bug; I'm not sure it is one. It does not affect the values stored in the newly created column, which are correct. It's just that `inspect()` prints additional values that don't end up in the new column.

stinodego commented 1 year ago

My thinking was that this is a bug in `inspect` that we should look at, but I admit I didn't give it a lot of thought.