Open pozitiff4ikk opened 1 week ago
It's a little confusing at first - but I think this may be the correct behaviour?
In the frame version, the column in the filter has also been sorted.
df = pl.DataFrame({"a": [1, 3, 2, 3, 4], "b": [10, 11, 12, 13, 14]})
df.sort("a", descending=True).filter(pl.col.b == 12) # b = [14, 11, 13, 12, 10]
# shape: (1, 2)
# ┌─────┬─────┐
# │ a ┆ b │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 2 ┆ 12 │
# └─────┴─────┘
But in the expression version, it is still the original order.
df.select(
pl.col("a", "b").sort_by("a", descending=True)
.filter(pl.col.b == 12) # b = [10, 11, 12, 13, 14]
)
# shape: (1, 2)
# ┌─────┬─────┐
# │ a ┆ b │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 3 ┆ 13 │
# └─────┴─────┘
The column in the filter would also need to be sorted to be equivalent?
df.select(
pl.col("a", "b").sort_by("a", descending=True)
.filter(pl.col.b.sort_by("a", descending=True) == 12)
)
# shape: (1, 2)
# ┌─────┬─────┐
# │ a ┆ b │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 2 ┆ 12 │
# └─────┴─────┘
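To make the difference concrete, here is a minimal plain-Python sketch of the two evaluation orders. It only models what the outputs above suggest, using lists in place of columns, and is not a description of Polars internals. The variable names (`order`, `mask_frame`, `mask_expr`, `mask_fixed`) are made up for illustration.

```python
# Plain-Python model of the behaviour shown above (an assumption-based
# sketch, not Polars code): lists stand in for the columns "a" and "b".
a = [1, 3, 2, 3, 4]
b = [10, 11, 12, 13, 14]

# Row order after a stable descending sort on "a".
order = sorted(range(len(a)), key=lambda i: a[i], reverse=True)
a_sorted = [a[i] for i in order]
b_sorted = [b[i] for i in order]

# Frame version: the mask is computed on the already-sorted "b".
mask_frame = [v == 12 for v in b_sorted]
frame_rows = [(x, y) for x, y, m in zip(a_sorted, b_sorted, mask_frame) if m]

# Expression version: the mask is computed on the original "b",
# then applied positionally to the sorted values.
mask_expr = [v == 12 for v in b]
expr_rows = [(x, y) for x, y, m in zip(a_sorted, b_sorted, mask_expr) if m]

# Workaround: reorder the mask the same way before applying it.
mask_fixed = [mask_expr[i] for i in order]
fixed_rows = [(x, y) for x, y, m in zip(a_sorted, b_sorted, mask_fixed) if m]

print(frame_rows, expr_rows, fixed_rows)
```

In this model the unsorted mask picks the row at position 2 of the sorted data, which is why the expression version returns `a = 3, b = 13` instead of `a = 2, b = 12`.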
@cmdlineluser ty for your reply, it does seem to work this way. In my code I have multiple expressions like this and somehow they were working before, but after updating polars from 1.5 to 1.6/1.7 I noticed this behavior. I can't reproduce the change with a small example. I've made it work with this fix for now; maybe this behavior will be changed in the future.
On mobile so can't test myself, but try making it lazy and turning off optimizations in the collect.
@deanm0000 still getting the same result
Checks
Reproducible example
Log output
Issue description
Getting different results after applying the same operations to a DataFrame and to a column
Expected behavior
res1 and res2 should be equal
Installed versions