rapidsai / cudf

cuDF - GPU DataFrame Library
https://docs.rapids.ai/api/cudf/stable/
Apache License 2.0

[FEA] Support Polars datetime `round` expression #16226

Open beckernick opened 2 months ago

beckernick commented 2 months ago

We should eventually support query plans that include the datetime `round` expression (`dt.round`).

import polars as pl
from functools import partial
from cudf_polars.callback import execute_with_cudf
from datetime import timedelta, datetime

use_cudf = partial(execute_with_cudf, raise_on_fail=True) # for testing

start = datetime(2001, 1, 1)
stop = datetime(2001, 1, 2)
s = pl.datetime_range(
    start, stop, timedelta(minutes=165), eager=True
).alias("datetime")
df = s.to_frame().lazy()

print(df.select(pl.col("datetime").dt.round("1h")).collect())
print(df.select(pl.col("datetime").dt.round("1h")).collect(post_opt_callback=use_cudf))
shape: (9, 1)
┌─────────────────────┐
│ datetime            │
│ ---                 │
│ datetime[μs]        │
╞═════════════════════╡
│ 2001-01-01 00:00:00 │
│ 2001-01-01 03:00:00 │
│ 2001-01-01 06:00:00 │
│ 2001-01-01 08:00:00 │
│ 2001-01-01 11:00:00 │
│ 2001-01-01 14:00:00 │
│ 2001-01-01 17:00:00 │
│ 2001-01-01 19:00:00 │
│ 2001-01-01 22:00:00 │
└─────────────────────┘
---------------------------------------------------------------------------
ComputeError                              Traceback (most recent call last)
Cell In[46], line 18
     15 df = s.to_frame().lazy()
     17 print(df.select(pl.col("datetime").dt.round("1h")).collect())
---> 18 print(df.select(pl.col("datetime").dt.round("1h")).collect(post_opt_callback=use_cudf))

File /raid/nicholasb/miniconda3/envs/all_cuda-122_arch-x86_64/lib/python3.11/site-packages/polars/lazyframe/frame.py:1942, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, no_optimization, streaming, background, _eager, **_kwargs)
   1939 # Only for testing purposes atm.
   1940 callback = _kwargs.get("post_opt_callback")
-> 1942 return wrap_df(ldf.collect(callback))

ComputeError: 'cuda' conversion failed: TypeError: cannot unpack non-iterable builtins.TemporalFunction object

wence- commented 2 months ago

Easy to do, but also will want https://github.com/pola-rs/polars/pull/17518

vyasr commented 1 month ago

Just noting that the Polars PR linked above is merged.

lithomas1 commented 1 month ago

OK, this one is a bit annoying.

libcudf uses `std::chrono::round`, which does banker's rounding (i.e. when a value lies exactly halfway between two multiples of the frequency, it rounds to the even multiple).

I think we'll need libcudf to implement HALF_UP rounding for this to work well.
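To make the difference concrete, here is a minimal pure-Python sketch that applies both tie-breaking rules to the timestamps from the reproducer. The helpers `round_half_up` and `round_half_even` are hypothetical names written for illustration; they are not Polars or libcudf functions. The sketch also assumes that evenness is judged on the number of whole periods since the Unix epoch.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumption: multiples are counted from the Unix epoch


def round_half_up(ts: datetime, every: timedelta) -> datetime:
    """Round to the nearest multiple of `every`; exact ties go up."""
    q, r = divmod(ts - EPOCH, every)  # q: whole periods, r: remainder
    if 2 * r >= every:
        q += 1
    return EPOCH + q * every


def round_half_even(ts: datetime, every: timedelta) -> datetime:
    """Round to the nearest multiple of `every`; exact ties go to the even multiple."""
    q, r = divmod(ts - EPOCH, every)
    if 2 * r > every or (2 * r == every and q % 2 == 1):
        q += 1
    return EPOCH + q * every


# Same timestamps as the reproducer: every 165 minutes across one day.
start = datetime(2001, 1, 1)
times = [start + i * timedelta(minutes=165) for i in range(9)]
hour = timedelta(hours=1)

half_up = [round_half_up(t, hour) for t in times]
half_even = [round_half_even(t, hour) for t in times]

# Report only the inputs where the two rules disagree.
for t, up, even in zip(times, half_up, half_even):
    if up != even:
        print(f"{t} -> half-up {up}, half-even {even}")
# prints: 2001-01-01 16:30:00 -> half-up 2001-01-01 17:00:00, half-even 2001-01-01 16:00:00
```

With these inputs the two rules disagree only at 16:30. The other tie, 05:30, rounds to 06:00 under both rules because 06:00 is the even multiple. The half-up results match the Polars CPU output shown in the issue.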