Open beckernick opened 1 week ago
It looks like our string-to-datetime utilities throw an error. This is a fairly common step when cleaning datasets, so it'd be nice to support it:
    import polars as pl
    from functools import partial
    from cudf_polars.callback import execute_with_cudf
    import numpy as np

    use_cudf = partial(execute_with_cudf, raise_on_fail=True)

    ldf = pl.DataFrame({
        "date": ['2015-09-11', '2017-02-08', '2015-08-01', '2019-03-16', '2015-05-15'],
        "val": [1, 2, 3, 4, 5]
    }).lazy()

    print(ldf.with_columns(pl.col("date").str.to_datetime()).collect())
    print(ldf.with_columns(pl.col("date").str.to_datetime()).collect(post_opt_callback=use_cudf))

    shape: (5, 2)
    ┌─────────────────────┬─────┐
    │ date                ┆ val │
    │ ---                 ┆ --- │
    │ datetime[μs]        ┆ i64 │
    ╞═════════════════════╪═════╡
    │ 2015-09-11 00:00:00 ┆ 1   │
    │ 2017-02-08 00:00:00 ┆ 2   │
    │ 2015-08-01 00:00:00 ┆ 3   │
    │ 2019-03-16 00:00:00 ┆ 4   │
    │ 2015-05-15 00:00:00 ┆ 5   │
    └─────────────────────┴─────┘

    ---------------------------------------------------------------------------
    ComputeError                              Traceback (most recent call last)
    Cell In[141], line 14
         12 ldf.sink_parquet("test.parquet")
         13 print(ldf.with_columns(pl.col("date").str.to_datetime()).collect())
    ---> 14 print(ldf.with_columns(pl.col("date").str.to_datetime()).collect(post_opt_callback=use_cudf))

    File [/raid/nicholasb/miniconda3/envs/all_cuda-122_arch-x86_64/lib/python3.11/site-packages/polars/lazyframe/frame.py:1942](http://10.117.23.184:8882/lab/tree/raid/raid/nicholasb/miniconda3/envs/all_cuda-122_arch-x86_64/lib/python3.11/site-packages/polars/lazyframe/frame.py#line=1941), in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, slice_pushdown, comm_subplan_elim, comm_subexpr_elim, cluster_with_columns, no_optimization, streaming, background, _eager, **_kwargs)
       1939 # Only for testing purposes atm.
       1940 callback = _kwargs.get("post_opt_callback")
    -> 1942 return wrap_df(ldf.collect(callback))

    ComputeError: 'cuda' conversion failed: NotImplementedError: String function StringFunction.Strptime
Ah, the MRE does fail. I had a typo. Editing the issue to make it clear.