This is not really actionable. Can you create a minimal example that shows your case on a single operation?
Smallest toy example I got to work:
from datetime import datetime

import polars as pl

# One timestamp per second between the two endpoints.
a = pl.datetime_range(
    datetime(2020, 1, 1, 1, 1),
    datetime(2024, 7, 5, 3, 1),
    interval="1s",
    eager=True,
)
banana = (
    pl.DataFrame({"dates": a, "idx": range(len(a)), "vals": range(len(a))})
    .with_columns(
        # True on weekends (ISO weekday > 5) or when the hour is greater than 20.
        mask=pl.col("dates").dt.weekday().gt(5).or_(pl.col("dates").dt.hour().gt(20)),
        vals=pl.col("vals").cast(pl.Float64),
    )
    .with_columns(
        # idx is null where the mask is set; vals is null where it is not.
        pl.when(pl.col("mask")).then(pl.lit(None)).otherwise(pl.col("idx")).alias("idx"),
        pl.when(pl.col("mask")).then(pl.col("vals")).otherwise(pl.lit(None)).alias("vals"),
    )
)
banana.rolling(index_column="dates", period="2h").agg(pl.exclude("dates").last())
# 4.54 s ± 46.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - 0.20.6
# 5.28 s ± 73.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) - 1.0.0
Compared to my original dataset and pipeline (~2M rows / 150 columns, with rolling operations across 3 timeframes (2h, 2d, 2w) and a run-time difference of ~2.5 minutes), the difference here is very small, but reproducible.
@stinodego I am not convinced this is a regression. It might be that we do something more correct now, or it might be due to a rustc update. I want to pin it down to a single operation/commit to confirm.
If someone wants to get a bisect on this.
Interesting: I just finished a git bisect, and the first and second examples seem to slow down at different points relative to 0.20.6. The second example shows the slow behaviour after 0.20.22rc1:
f005b98c579c4a9d386518a6db25fc26d0b204ac is the first bad commit
commit f005b98c579c4a9d386518a6db25fc26d0b204ac
Author: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sat Apr 20 20:41:08 2024 +0200
build(rust): bump rustls from 0.21.10 to 0.21.11 (#15792)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Cargo.lock | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
which is the bump from 0.48.5 to 0.52.4 for the libloading dependency windows-targets.
The first example shows the slow behaviour after:
42ba1b02730da9e83c413c2ec0d86f703b4e98cc is the first bad commit
commit 42ba1b02730da9e83c413c2ec0d86f703b4e98cc
Author: Marc Garcia <garcia.marc@gmail.com>
Date: Mon Jun 24 14:54:34 2024 +0400
test(rust): Add a test for AnonymousScan options (projection and slice pushdown) (#17149)
crates/polars-lazy/src/tests/io.rs | 39 ++++++++++++++++++++++++++++++++++++++
1 file changed, 39 insertions(+)
which points to fn scan_anonymous_fn_with_options() and is likely a dud. I would still love to know what causes the explosion in wall time I am seeing for that first example ¯\_(ツ)_/¯.
Does this help with further analysis?
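Both bisect results above land on commits that look unrelated to rolling aggregations, and one possible explanation is timing noise close to the good/bad boundary. A minimal sketch of a more noise-tolerant check for `git bisect run` follows: it takes the median of several runs and returns the "skip" exit code when the measurement falls between the two baselines. The function names, the `margin` parameter, and the use of the 4.54 s and 5.28 s timings from this thread as baselines are assumptions for illustration; the baselines would need to be re-measured on the machine and build profile used for the bisect.

```python
import statistics
import time
from typing import Callable

# Exit codes understood by `git bisect run`: 0 = good, 1 = bad, 125 = skip.
GOOD, BAD, SKIP = 0, 1, 125


def median_runtime(workload: Callable[[], object], repeats: int = 5) -> float:
    """Median wall time in seconds over `repeats` calls of `workload`."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def bisect_exit_code(
    elapsed_s: float, good_s: float, bad_s: float, margin: float = 0.25
) -> int:
    """Classify a timing; anything in the ambiguous middle band is skipped."""
    gap = bad_s - good_s
    if elapsed_s <= good_s + margin * gap:
        return GOOD
    if elapsed_s >= bad_s - margin * gap:
        return BAD
    return SKIP


# Stand-in workload; in a real bisect script this would be the rolling query.
elapsed = median_runtime(lambda: sum(range(10_000)), repeats=3)
print(bisect_exit_code(elapsed, good_s=4.54, bad_s=5.28))
```

In an actual bisect script, the last line would be replaced by `sys.exit(bisect_exit_code(...))`, and each step would need Polars rebuilt at the commit under test before the timing runs.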
Reproducible example
Log output
No response
Issue description
Running the above code a couple of times, I get the following results:
The difference between 0.20.x and 1.0.0 scales with the compute intensity of the aggregations, which may help narrow down the source of the slowdown.
Expected behavior
Comparable run times for 0.20.x and 1.0.0.
Installed versions