Open dcolinmorgan opened 1 year ago
It's not absolutely obvious that a naive implementation would lead to speed-ups.
Anyhow, an important question for this avenue to be viable: how would we do CI (continuous integration)?
Maybe a useful, more declarative question is: what might be a good path to plugging in custom handlers and limiting selection to them?
In our case, sometimes we want CPU-only, sometimes GPU-only (e.g., an end-to-end GPU pipeline), and sometimes we have no preference (e.g., we care more about quality). There are other variants of this, like local vs. remote, dask_cpu vs. dask_cudf, competing impls of the same alg, etc.
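To make the "plug in custom handlers and limit selection to them" idea concrete, here is a minimal sketch of what such a mechanism could look like. Everything in it is hypothetical: `register_handler`, `select_handler`, and the handler names are made up for illustration and are not an existing API of any library discussed here.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical registry: handler name -> callable implementing the algorithm.
_HANDLERS: Dict[str, Callable[[list], list]] = {}


def register_handler(name: str, fn: Callable[[list], list]) -> None:
    """Plug in a custom handler under a given name (illustrative only)."""
    _HANDLERS[name] = fn


def select_handler(allowed: Optional[Iterable[str]] = None) -> str:
    """Pick a handler name, optionally restricted to an allow-list.

    With no allow-list, the first registered handler wins. With an
    allow-list, the first allowed name that is registered wins, so the
    caller's ordering expresses preference.
    """
    candidates = list(_HANDLERS) if allowed is None else list(allowed)
    for name in candidates:
        if name in _HANDLERS:
            return name
    raise LookupError(f"no registered handler among {candidates!r}")


# Two stand-in handlers; real ones would wrap CPU and GPU implementations.
register_handler("cpu", lambda xs: sorted(xs))
register_handler("gpu", lambda xs: sorted(xs))  # placeholder, no GPU used here

# GPU-only pipeline: limit selection to the GPU handler.
name = select_handler(allowed=["gpu"])
result = _HANDLERS[name]([3, 1, 2])
```

Passing `allowed=["gpu"]` covers the GPU-only case, `allowed=["cpu"]` the CPU-only case, and omitting it covers the "no preference" case. A failing lookup raises instead of silently falling back, which seems safer for end-to-end GPU pipelines.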
Our use is pretty limited:
E.g.,
An `engine` flag to enable a cuml-based implementation of class functions.
Benefits of the change: GPU-based speedup.
Naive pseudocode for the new behavior (realistically much tougher to implement):
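As a rough illustration of the flag itself, the sketch below shows one way an `engine` argument could be resolved and dispatched on. This is an assumption-laden sketch, not the proposed implementation: `resolve_engine`, `Estimator`, the `"auto"`/`"cpu"`/`"cuml"` values, and the `has_cuml` override are all hypothetical names. The only real call is the standard-library `importlib.util.find_spec`, used to check whether `cuml` is installed without importing it.

```python
import importlib.util
from typing import Optional

_VALID_ENGINES = {"auto", "cpu", "cuml"}  # hypothetical flag values


def resolve_engine(engine: str = "auto", has_cuml: Optional[bool] = None) -> str:
    """Map the user-facing engine flag to a concrete backend name.

    `has_cuml` is a hypothetical override so the logic can be tested on
    machines without a GPU; by default availability is detected.
    """
    if engine not in _VALID_ENGINES:
        raise ValueError(f"engine must be one of {sorted(_VALID_ENGINES)}")
    if has_cuml is None:
        has_cuml = importlib.util.find_spec("cuml") is not None
    if engine == "auto":
        return "cuml" if has_cuml else "cpu"
    if engine == "cuml" and not has_cuml:
        raise ImportError("engine='cuml' requested but cuml is not installed")
    return engine


class Estimator:
    """Hypothetical class whose methods dispatch on the resolved engine."""

    def __init__(self, engine: str = "auto", has_cuml: Optional[bool] = None):
        self.engine = resolve_engine(engine, has_cuml)

    def fit(self, X):
        # A real version would call the cuml-based implementation here.
        if self.engine == "cuml":
            raise NotImplementedError("GPU path left out of this sketch")
        self.n_samples_ = len(X)  # stand-in for the CPU implementation
        return self


est = Estimator(engine="cpu").fit([[0.0], [1.0]])
```

Keeping resolution separate from the class makes the CPU path testable in ordinary CI, while the `cuml` path could be exercised only on GPU runners.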