Support parallel execution over multiple partitions on same worker for MNMG RF

rapidsai / cuml

cuML - RAPIDS Machine Learning Library

https://docs.rapids.ai/api/cuml/stable/

Apache License 2.0


Support parallel execution over multiple partitions on same worker for MNMG RF #1510

Open JohnZed opened 4 years ago

JohnZed commented 4 years ago

Right now, only one partition per worker is run at a time. We could "virtually concat" the partitions into an array of pointers so that a single job can take multiple input partitions, e.g. sampling from all of the sub-partitions when subsampling.

Look to tSVD and PCA for examples.
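
The "virtual concat" idea above can be sketched roughly as follows. This is only an illustration, not cuML's implementation: the function name `virtual_concat_sample` is hypothetical, and NumPy arrays in host memory stand in for the device partitions held by a worker. The partitions stay as separate buffers (a list playing the role of the array of pointers), and each sampled global row index is mapped back to a (partition, local row) pair.

```python
import numpy as np


def virtual_concat_sample(partitions, n_samples, seed=0):
    """Sample rows (with replacement) across several partitions
    without first copying them into one contiguous array.

    Hypothetical helper for illustration only.
    """
    # Row count of each partition and the global row offset where each starts.
    sizes = np.array([p.shape[0] for p in partitions])
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    total_rows = int(offsets[-1])

    # Draw global row indices as if the partitions were one array.
    rng = np.random.default_rng(seed)
    global_idx = rng.integers(0, total_rows, size=n_samples)

    # Map each global index to the partition that owns it and the row inside it.
    part_id = np.searchsorted(offsets, global_idx, side="right") - 1
    local_row = global_idx - offsets[part_id]

    # Gather the sampled rows partition by partition.
    n_cols = partitions[0].shape[1]
    out = np.empty((n_samples, n_cols), dtype=partitions[0].dtype)
    for pid, part in enumerate(partitions):
        mask = part_id == pid
        out[mask] = part[local_row[mask]]
    return out, part_id, local_row


# Three partitions of different sizes, as they might land on one worker.
parts = [
    np.arange(6).reshape(3, 2),
    np.arange(6, 10).reshape(2, 2),
    np.arange(10, 18).reshape(4, 2),
]
sample, part_id, local_row = virtual_concat_sample(parts, n_samples=20, seed=1)
```

Only the offsets table and the sampled rows are materialized here; the input partitions themselves are never concatenated.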

JohnZed commented 4 years ago

This is an API cleanup and perf improvement, not immediately critical.

cjnolet commented 4 years ago

If I understand this correctly, it should be handled by #1396, which allows it to be done in the Python layer so that we don't need custom C++ code for each algorithm to do this.