beckernick opened this issue 4 months ago (status: Open)
Notes:
We want to be able to profile GPU-accelerated queries with the Polars LazyFrame profiler. I imagine the API should mirror how we execute a query with `collect` (i.e. `.collect(engine="gpu")`), so it would look like:
```python
import polars as pl

lf = pl.LazyFrame(
    {
        "a": ["a", "b", "a", "b", "b", "c"],
        "b": [1, 2, 3, 4, 5, 6],
        "c": [6, 5, 4, 3, 2, 1],
    }
)
q = lf.group_by("a", maintain_order=True).agg(pl.all().sum()).sort("a")
# Proposed: profile the query `q` (not `lf`) on the GPU engine
df, df_times = q.profile(engine="gpu")
```
Currently the Polars LazyFrame profiler works like this:
I think that if we allow the profile functions in the Polars Rust layer to accept a callback (the same way we do for `collect`), we can get the timing information from step 3. We'd need to change these functions (copied from https://github.com/pola-rs/polars/tree/main/crates/polars-python/src/lazyframe):
```rust
// LazyFrame::profile (can this remain unchanged?)
pub fn profile(self) -> PolarsResult<(DataFrame, DataFrame)> {
    let (mut state, mut physical_plan, _) = self.prepare_collect(false)?;
    state.time_nodes();
    let out = physical_plan.execute(&mut state)?;
    let timer_df = state.finish_timer()?;
    Ok((out, timer_df))
}

// PyLazyFrame::profile
fn profile(
    &self,
    py: Python,
    lambda_post_opt: Option<PyObject>,
) -> PyResult<(PyDataFrame, PyDataFrame)> {
    // Follow the logic in `collect` (body still to be written)
    todo!()
}
```
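To make the proposed flow concrete, here is a minimal pure-Python toy model of what `LazyFrame::profile` does, extended with the optional post-optimization callback suggested above. Everything in it (the `Plan` list-of-steps representation, `profile`, and the `fuse_on_gpu` callback) is hypothetical and only mimics the order `prepare_collect` → `time_nodes` → `execute` → `finish_timer`; it is not Polars' real IR or API. It records `(node, start, end)` rows in microseconds, assuming the same layout as the CPU profiler's timing table.

```python
import time
from typing import Callable, Optional

# Toy plan: an ordered list of (node_name, step) pairs; each step receives
# the previous node's output. This is NOT Polars' IR, just a stand-in.
Plan = list[tuple[str, Callable[[object], object]]]
Timings = list[tuple[str, int, int]]  # (node, start_us, end_us)


def profile(
    plan: Plan,
    post_opt_callback: Optional[Callable[[Plan], Plan]] = None,
) -> tuple[object, Timings]:
    # Like prepare_collect(): the callback may rewrite the optimized plan,
    # e.g. swap a supported subtree for one GPU-executed node.
    if post_opt_callback is not None:
        plan = post_opt_callback(plan)
    # Like state.time_nodes(): start the clock.
    origin = time.perf_counter_ns()
    timings: Timings = []
    # Like physical_plan.execute(): run each node, recording start/end
    # in microseconds since timing began.
    out: object = None
    for name, step in plan:
        start = (time.perf_counter_ns() - origin) // 1_000
        out = step(out)
        end = (time.perf_counter_ns() - origin) // 1_000
        timings.append((name, start, end))
    # Like state.finish_timer(): return the result plus the timing rows.
    return out, timings


def fuse_on_gpu(plan: Plan) -> Plan:
    # Hypothetical callback: hand the whole plan to a single opaque node,
    # roughly how a GPU engine takes over a supported subtree.
    def run_all(x: object) -> object:
        for _, step in plan:
            x = step(x)
        return x

    return [("gpu_execute", run_all)]


plan: Plan = [("scan", lambda _: [1, 2, 3]), ("sum", lambda xs: sum(xs))]
cpu_out, cpu_times = profile(plan)               # nodes: scan, sum
gpu_out, gpu_times = profile(plan, fuse_on_gpu)  # nodes: gpu_execute
```

The design point this illustrates: if the callback collapses a subtree into one node, a profiler that only times physical nodes can report that subtree as a single entry, not per-operation timings.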
We'd also need to add an `engine` keyword argument to the Python `LazyFrame.profile` method:
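```python
from functools import partial

# `import_optional`, `GPUEngine`, `EngineType` and `wrap_df` come from
# polars internals, imported the same way as for `collect`.
def profile(
    self,
    *,
    ...
    engine: EngineType = "cpu",
) -> tuple[DataFrame, DataFrame]:
    ...
    # Follow the logic in `collect`
    callback = None
    # Accept both engine="gpu" and a configured GPUEngine instance
    if engine == "gpu" or isinstance(engine, GPUEngine):
        cudf_polars = import_optional(
            "cudf_polars",
            ...
        )
        if not isinstance(engine, GPUEngine):
            engine = GPUEngine()
        callback = partial(cudf_polars.execute_with_cudf, config=engine)
    df, timings = self._ldf.profile(callback)
    df, timings = wrap_df(df), wrap_df(timings)
    ...
    return df, timings
```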
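The engine-resolution step can be exercised on its own. In this sketch `GPUEngine` and `execute_with_cudf` are local stand-ins (assumptions, not the real `polars`/`cudf_polars` objects) so it runs without a GPU, and `resolve_callback` is a hypothetical helper. It tests `isinstance(engine, GPUEngine)` alongside `engine == "gpu"`: with only the string comparison, a user-supplied `GPUEngine(...)` would never reach the GPU branch.

```python
from dataclasses import dataclass
from functools import partial
from typing import Any, Callable, Optional, Union


@dataclass
class GPUEngine:
    # Stand-in for polars' GPUEngine config object; fields are illustrative.
    device: Optional[int] = None
    raise_on_fail: bool = False


def execute_with_cudf(nt: Any, *, config: GPUEngine) -> None:
    # Stand-in for cudf_polars.execute_with_cudf; does nothing here.
    return None


def resolve_callback(engine: Union[str, GPUEngine]) -> Optional[Callable[..., Any]]:
    # Accept the "gpu" shorthand as well as a pre-configured GPUEngine.
    if engine == "gpu" or isinstance(engine, GPUEngine):
        if not isinstance(engine, GPUEngine):
            engine = GPUEngine()  # default configuration
        return partial(execute_with_cudf, config=engine)
    # "cpu" (or any other engine) profiles without a callback.
    return None
```

With this shape, `resolve_callback("cpu")` yields no callback, while both `"gpu"` and `GPUEngine(device=1)` yield a `partial` bound to the engine config that would be passed down to the Rust `profile`.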
Finally, we should do a docs refresh.
We should instrument the GPU physical execution engine so that it is compatible with the built-in Polars LazyFrame profiler (or similar).
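One possible way to instrument the GPU engine is to have each executed node record rows in the same `(node, start, end)` microsecond layout that the CPU profiler produces, so the rows could be merged with the CPU timings given a shared time origin. The sketch below is a hypothetical `Timer` in plain Python, not cudf-polars code. Because GPU kernel launches are asynchronous with respect to the host, a real implementation would likely need to synchronize the device or stream before reading the clock. The `sync` hook stands in for that and defaults to a no-op here.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


class Timer:
    """Hypothetical per-node timer emitting (node, start_us, end_us) rows."""

    def __init__(self, sync: Optional[Callable[[], None]] = None) -> None:
        self._origin = time.perf_counter_ns()
        # Placeholder for a device/stream synchronize; no-op by default.
        self._sync = sync if sync is not None else (lambda: None)
        self.rows: list[tuple[str, int, int]] = []

    def _now_us(self) -> int:
        return (time.perf_counter_ns() - self._origin) // 1_000

    @contextmanager
    def node(self, name: str) -> Iterator[None]:
        self._sync()  # keep earlier queued work out of this node's time
        start = self._now_us()
        try:
            yield
        finally:
            self._sync()  # wait for this node's work before stopping the clock
            self.rows.append((name, start, self._now_us()))


# Usage: a pure-Python stand-in for the example query's group_by/sum on
# column "b" followed by the sort, each timed as its own node.
keys = ["a", "b", "a", "b", "b", "c"]
vals = [1, 2, 3, 4, 5, 6]
timer = Timer()
with timer.node("group_by"):
    sums: dict[str, int] = {}
    for k, v in zip(keys, vals):
        sums[k] = sums.get(k, 0) + v
with timer.node("sort"):
    result = sorted(sums.items())
```

Timing per node this way would give finer-grained output than timing the whole GPU-executed subtree as one entry, at the cost of a synchronization point per node.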