pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

TRACKER: truncate / round perf improvements #16617

Open MarcoGorelli opened 1 month ago

MarcoGorelli commented 1 month ago

I'm starting a tracker based on https://github.com/pola-rs/polars/issues/16531

The things that need doing are:

The slowpath for truncate can probably be optimised by doing the following:

        // TODO: optimize the code below, so it does the following:
        //       - convert to naive
        //       - truncate all naively
        //       - localize, preserving the fold of the original datetime.
        //       The last step is the non-trivial one. But it should be worth it,
        //       and faster than the current approach of truncating everything
        //       as tz-aware.
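
The three steps in the TODO above can be sketched in plain Python. This is only an illustration of the idea, not the Polars implementation: the helper name `truncate_via_naive` is hypothetical, it relies on the standard-library `zoneinfo` module (Python 3.9+, with time zone data available on the system), it only supports fixed-length windows, and it does not handle the case where the truncated wall-clock time falls into a DST gap.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

EPOCH = datetime(1970, 1, 1)  # naive reference point used for bucketing


def truncate_via_naive(dt: datetime, every: timedelta) -> datetime:
    """Truncate a tz-aware datetime by working on its naive wall-clock time.

    Hypothetical helper: a sketch of the approach, not Polars code.
    """
    # Step 1: convert to naive (drop the time zone, keep the wall-clock time).
    naive = dt.replace(tzinfo=None)
    # Step 2: truncate naively with integer bucketing from the reference point.
    floored = EPOCH + ((naive - EPOCH) // every) * every
    # Step 3: localize again, reusing the fold of the original datetime so an
    # ambiguous result stays on the same side of the DST transition.
    return floored.replace(tzinfo=dt.tzinfo, fold=dt.fold)


tz = ZoneInfo("Europe/Amsterdam")
# 02:45 occurs twice on 2023-10-29; fold=1 selects the second occurrence.
second_pass = datetime(2023, 10, 29, 2, 45, tzinfo=tz, fold=1)
result = truncate_via_naive(second_pass, timedelta(minutes=30))
print(result.isoformat())
```

Without the `fold=dt.fold` argument, the result would default to `fold=0` and silently jump to the first occurrence of the ambiguous time, which is why the TODO calls the last step the non-trivial one.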

Chuck321123 commented 3 weeks ago

@MarcoGorelli In case you pick this issue up again in the future, here is an alternative method that is faster than the current solution:

import pandas as pd
import polars as pl

# One million consecutive timestamps at 1-second resolution.
# date_range already returns them sorted, so no extra sort is needed.
num_rows = 1_000_000
utc_time = pd.date_range(start='2023-01-01', periods=num_rows, freq='s')

df = pd.DataFrame({
    'UTC_Time': utc_time
})

print(df.head())

df = pl.DataFrame(df)

# Width of the truncation window (2 minutes) in nanoseconds.
every_ns = 2 * 60 * 1_000_000_000

# Method 1: the built-in truncate.
df = df.with_columns(pl.col("UTC_Time").dt.truncate("2m").alias("Method1"))

# Method 2: floor-divide the epoch value by the window width, then multiply back.
df = df.with_columns(
    pl.from_epoch(
        (pl.col("UTC_Time").dt.epoch(time_unit="ns") // every_ns) * every_ns,
        time_unit="ns",
    ).alias("Method2")
)

%timeit df.with_columns(pl.col("UTC_Time").dt.truncate("2m"))

%timeit df.with_columns(pl.from_epoch((pl.col("UTC_Time").dt.epoch(time_unit="ns") // every_ns) * every_ns, time_unit="ns"))

Console output (Method 1 first, then Method 2):

4.62 ms ± 989 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.63 ms ± 178 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This was run on the latest version, 1.0.0 alpha 1.
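
For reference, the arithmetic behind the epoch-based method can be checked in plain Python. The sketch below uses hypothetical helper names (`to_epoch_ns`, `from_epoch_ns`, `floor_epoch_ns`) and only the standard library. It assumes a fixed-length window: calendar units such as months have no constant width in nanoseconds, and for time-zone-aware data the buckets may not line up with wall-clock boundaries.

```python
from datetime import datetime, timedelta, timezone

NS_PER_SECOND = 1_000_000_000
EVERY_NS = 2 * 60 * NS_PER_SECOND  # 2-minute window, as in the benchmark above
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ns(dt: datetime) -> int:
    """Nanoseconds since the Unix epoch (hypothetical helper)."""
    delta = dt - UNIX_EPOCH
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * NS_PER_SECOND + delta.microseconds * 1_000


def from_epoch_ns(ns: int) -> datetime:
    """Inverse of to_epoch_ns, at microsecond precision (hypothetical helper)."""
    return UNIX_EPOCH + timedelta(microseconds=ns // 1_000)


def floor_epoch_ns(epoch_ns: int, every_ns: int) -> int:
    """Round down to the start of the window that contains epoch_ns."""
    # Python's // rounds toward negative infinity, so values before 1970
    # are also rounded down rather than toward zero.
    return (epoch_ns // every_ns) * every_ns


ts = datetime(2023, 1, 1, 0, 3, 59, tzinfo=timezone.utc)
truncated = from_epoch_ns(floor_epoch_ns(to_epoch_ns(ts), EVERY_NS))
print(truncated)

before_epoch = datetime(1969, 12, 31, 23, 59, 30, tzinfo=timezone.utc)
truncated_before = from_epoch_ns(floor_epoch_ns(to_epoch_ns(before_epoch), EVERY_NS))
print(truncated_before)
```

The second call is there to show that timestamps before 1970 are also rounded down, which depends on floor division rather than truncation toward zero.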

MarcoGorelli commented 3 weeks ago

Thanks - I'm seeing a much smaller difference, though:

3.16 ms ± 77.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2.61 ms ± 186 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)