[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
import pandas as pd
import polars as pl
num_rows = 1000000
utc_time = pd.date_range(start='2023-01-01', periods=num_rows, freq='ms')
# Create the DataFrame
df = pd.DataFrame({
    'UTC_Time': utc_time
})
df['UTC_Time'] = df['UTC_Time'].sort_values()
df = pl.DataFrame(df)
df = df.with_columns(pl.col("UTC_Time").truediv(1).alias("Unix1"))
df = df.with_columns(pl.col("UTC_Time").dt.epoch(time_unit="ns").alias("Unix2"))
# Display the first few rows
print(df.head())
%timeit df.with_columns(pl.col("UTC_Time").truediv(1))
%timeit df.with_columns(pl.col("UTC_Time").dt.epoch(time_unit="ns")) # Slightly faster
%timeit df.with_columns(pl.col("UTC_Time").truediv(1000))
%timeit df.with_columns(pl.col("UTC_Time").dt.epoch(time_unit="us")) # Much slower
%timeit df.with_columns(pl.col("UTC_Time").truediv(1000000))
%timeit df.with_columns(pl.col("UTC_Time").dt.epoch(time_unit="ms")) # Much slower
%timeit df.with_columns(pl.col("UTC_Time").truediv(1000000000))
%timeit df.with_columns(pl.col("UTC_Time").dt.epoch(time_unit="s")) # Even slower
Log output
16.8 µs ± 576 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
13.2 µs ± 321 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
1.65 ms ± 26.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
2.74 ms ± 36.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.65 ms ± 21.5 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
2.76 ms ± 38.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
1.67 ms ± 35 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
4.22 ms ± 204 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Issue description
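I'm not sure whether this is a bug, but it seems odd that dt.epoch(), a function specifically designed to convert datetimes to Unix time, is slower than truediv for every unit except ns. The us and ms conversions take about 2.7 ms with dt.epoch() versus about 1.65 ms with truediv, and second precision is worse still: about 4.22 ms versus about 1.67 ms.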
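For reference, the divisors passed to truediv above correspond to the dt.epoch() time units as sketched below. This is a plain-Python illustration of the arithmetic only, not Polars' implementation; the names `NS_PER_UNIT` and `epoch_from_ns` are hypothetical, and it uses floor division to keep integer results, which is an assumption about the intended output rather than what truediv returns.

```python
from datetime import datetime, timezone

# Nanoseconds per target unit; these are the divisors used in the
# truediv calls of the reproducible example (hypothetical helper table).
NS_PER_UNIT = {"ns": 1, "us": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def epoch_from_ns(epoch_ns: int, time_unit: str) -> int:
    """Convert a nanosecond Unix timestamp to the given unit (floor division)."""
    return epoch_ns // NS_PER_UNIT[time_unit]


# First timestamp of the example column: 2023-01-01 00:00:00 UTC.
start = datetime(2023, 1, 1, tzinfo=timezone.utc)
epoch_ns = int(start.timestamp()) * 1_000_000_000

for unit in NS_PER_UNIT:
    print(unit, epoch_from_ns(epoch_ns, unit))
```

Each conversion is a single integer division per value, which is why roughly equal timings across us, ms, and s would be the naive expectation.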
Expected behavior
dt.epoch() should be at least as fast as the equivalent truediv call for every time unit.
Installed versions