divyegala opened 5 years ago
@divyegala, given the progression of the handle API, its movement to RAFT, and our documentation in general, do you feel this issue is still relevant or can it be closed now?
This discussion came up again today, as the DBSCAN Python docs indicate it's possible to use Handles/streams for concurrency. We noted several points:

- `fit` calls are blocking, which would require using something like multiple host threads

cc @divyegala
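Since each `fit` blocks its calling thread, dispatching each model's `fit` from its own host thread is the usual workaround. Below is a minimal sketch of that pattern using only the standard library; `ToyModel` and its sleep-based `fit` are placeholders, not cuML API (with cuML, each thread would also need its own handle bound to its own stream):

```python
import threading
import time

class ToyModel:
    """Placeholder for a cuML estimator; fit() blocks like cuML's does."""
    def __init__(self, name):
        self.name = name
        self.fitted = False

    def fit(self, X):
        time.sleep(0.1)          # stands in for blocking GPU work
        self.fitted = True
        return self

models = [ToyModel(f"model-{i}") for i in range(4)]
X = [[0.0, 1.0], [1.0, 0.0]]     # toy data

# One host thread per blocking fit() call.
threads = [threading.Thread(target=m.fit, args=(X,)) for m in models]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

assert all(m.fitted for m in models)
# The four 0.1 s fits overlap, so total wall time stays well under 0.4 s.
```

Note this only overlaps the host-side calls; whether the GPU work itself overlaps depends on the stream/handle issues discussed below.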
@cjnolet @dantegd revisiting this topic as it came up today. After my own experiments with trying to achieve concurrency for running multiple cuML models on a single or multiple host threads, I am capturing here what we need to do to eventually achieve full concurrency.

For a single host thread, multiple models (this basically needs the end-to-end Python API call to be asynchronous):

- `predict` class of functions of simpler algorithms like linear models

For multiple host threads, multiple models:

- `fit` and `predict` class of functions

Common to both paradigms:

- An asynchronous, stream-ordered memory allocator (`cudaMallocAsync`)

Challenges/tasks needed to be performed to achieve both the above paradigms:

- Enabling `default-stream per-thread` behavior

Reference for `cudaMemcpyAsync` behavior: https://docs.nvidia.com/cuda/cuda-runtime-api/api-sync-behavior.html#api-sync-behavior__memcpy-async
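For the first paradigm (one host thread, many models), the Python call itself has to return before the work finishes. One way to sketch that shape with only the standard library is to have the estimator submit its work to an executor and hand back a future immediately; `AsyncModel` and `fit_async` are illustrative names, not cuML API:

```python
from concurrent.futures import ThreadPoolExecutor
import time

_executor = ThreadPoolExecutor(max_workers=4)

class AsyncModel:
    """Illustrative wrapper: fit_async() returns a future immediately."""
    def __init__(self, name):
        self.name = name

    def _fit_blocking(self, X):
        time.sleep(0.1)          # stands in for GPU work on this model's stream
        return self

    def fit_async(self, X):
        # The caller's thread is not blocked; completion is observed via the future.
        return _executor.submit(self._fit_blocking, X)

X = [[0.0, 1.0], [1.0, 0.0]]
futures = [AsyncModel(f"m{i}").fit_async(X) for i in range(4)]  # returns at once
fitted = [f.result() for f in futures]   # the single host thread collects results
assert len(fitted) == 4
```

In a real implementation the asynchrony would come from enqueuing work on CUDA streams rather than from host threads, but the API shape the caller sees would be the same.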
@divyegala @cjnolet @dantegd I am wondering whether there are any updates on executing `fit` of multiple cuML models concurrently using different threads and streams. Does anyone know of a working example of this?
@leonardottl, did you find any working example of executing `fit` of multiple cuML models concurrently using different threads and streams? Or maybe even a simpler example of a distributed `fit` on a single cuML model (e.g. k-means), so that multiple concurrent `fit` tasks can be executed on the same GPU?
From what I observed, and from conversation with @cjnolet, every cuML algorithm creates a handle which uses the default stream (`NULL`; we checked up to the `cumlHandle_impl` constructor). Without knowing that greater concurrency can be achieved when running two things together just by setting different handles and streams, we lose out on a powerful feature (I know that this is covered in the Developer Guide). It would be nice to have the Handle class on docs.rapids.ai, and an example notebook demonstrating how this concurrency is achieved.
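To make the handle/stream pairing concrete, here is a structural sketch of the pattern being asked for: one handle, bound to its own non-default stream, per model. The `Stream`, `Handle`, and `Model` classes below are plain-Python stand-ins that only mimic the wiring; in cuML the handle would wrap a real CUDA stream, and leaving every handle on stream 0 (the `NULL` stream) would serialize the work:

```python
class Stream:
    """Stand-in for a CUDA stream; the default (NULL) stream is id 0."""
    def __init__(self, sid):
        self.sid = sid

class Handle:
    """Stand-in for cuML's handle: owns the stream work is enqueued on."""
    def __init__(self, stream):
        self.stream = stream

class Model:
    """Stand-in estimator that records which stream its fit() used."""
    def __init__(self, handle):
        self.handle = handle

    def fit(self, X):
        self.fit_stream = self.handle.stream.sid
        return self

X = [[0.0, 1.0]]
# One non-default stream (and handle) per model, so fits could overlap on the GPU.
models = [Model(Handle(Stream(sid))) for sid in (1, 2, 3)]
for m in models:
    m.fit(X)

assert [m.fit_stream for m in models] == [1, 2, 3]   # no model on the NULL stream
```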