Open markxwang opened 1 month ago
A similar behaviour:
The following returns an empty dataframe:
import polars as pl

pl.Series(name="data", values=[], dtype=pl.Float64).to_frame().select(
pl.col("data").cum_prod()
)
# data
# --
# f64
Adding last returns a non-empty dataframe:
pl.Series(name="data", values=[], dtype=pl.Float64).to_frame().select(
pl.col("data").cum_prod().last()
)
# data
# --
# f64
# null
Similar confusion to what I had in https://github.com/pola-rs/polars/issues/18404
I think the current behaviour kind of makes sense. If all the expressions return a single value, you get one row with those values. If at least one of them returns a column, the result has exactly as many rows as the initial dataframe (0 in your cases), and the single values are broadcast to that height.
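To make that rule concrete, here is a toy model of it in plain Python. It is only a sketch of the rule as described in this comment, not Polars' actual implementation; the names `select_height` and `broadcast` are made up for illustration, a bare value stands for an aggregate result, and a list stands for a column.

```python
def select_height(results, frame_height):
    """Height of the output under the rule described above (toy model)."""
    # Every expression produced a single value: one output row.
    if all(not isinstance(r, list) for r in results):
        return 1
    # At least one expression produced a column: keep the frame's height.
    return frame_height


def broadcast(results, frame_height):
    """Repeat single values so every output column has the same height."""
    height = select_height(results, frame_height)
    return [r if isinstance(r, list) else [r] * height for r in results]


# Empty frame, two aggregates (sum and product): one row survives.
print(broadcast([0.0, 1.0], frame_height=0))
# Empty frame, one aggregate next to a zero-length column: the aggregate
# is repeated zero times, so nothing is left.
print(broadcast([0.0, []], frame_height=0))
```

Under this model the aggregate is not lost by a special case; it is simply broadcast to a height of 0.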
Maybe one way is to have separate select functions for this as I mentioned in https://github.com/pola-rs/polars/issues/18404#issuecomment-2315687383
I mean it's not exactly the same, but still you have sizes 1,1, and 0
Checks
Reproducible example
Log output
No response
Issue description
Currently, taking the sum/product of an empty dataframe gives 0/1 respectively.
However, the result is wiped out when a "cross-row" operation such as shift/cum_sum/pct_change/diff is used alongside product/sum: the select then returns an empty dataframe.
Expected behavior
Not entirely sure what would be the expected behaviour
Installed versions