Open JohnZed opened 4 years ago
This is an API cleanup and perf improvement, not immediately critical.
If I understand this correctly, it should be handled by #1396, which allows it to be done in the Python layer so that we don't need custom C++ code for each algorithm to do this.
Right now, only one partition per worker is run at a time. We could "virtually concat" the partitions into an array of pointers so that a single job can take multiple input partitions, e.g. sampling from all of the sub-partitions when subsampling.
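To make the "virtual concat" idea concrete, here is a rough Python sketch. It is not the cuML implementation: `VirtualConcat` and `subsample` are hypothetical names, and plain Python lists stand in for the per-worker partitions. It keeps references to the partitions (the analogue of an array of pointers), records cumulative offsets, and maps a global row index to a (partition, local index) pair without copying any data.

```python
import bisect
import random
from itertools import accumulate


class VirtualConcat:
    """Hypothetical helper: view several partitions as one sequence, no copy."""

    def __init__(self, partitions):
        # Keep references only; this plays the role of the array of pointers.
        self.partitions = list(partitions)
        # offsets[i] is the global index one past the end of partition i.
        self.offsets = list(accumulate(len(p) for p in self.partitions))

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Find the first partition whose end offset is greater than idx.
        part = bisect.bisect_right(self.offsets, idx)
        start = self.offsets[part - 1] if part else 0
        return self.partitions[part][idx - start]


def subsample(partitions, k, seed=None):
    """Draw k distinct rows across all partitions on a worker (illustrative)."""
    view = VirtualConcat(partitions)
    rng = random.Random(seed)
    picks = rng.sample(range(len(view)), k)
    return [view[i] for i in picks]


if __name__ == "__main__":
    parts = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
    print(subsample(parts, 4, seed=0))
```

With this shape, a single job sees every sub-partition on the worker, so a subsample is drawn from all of them rather than from one partition at a time. A C++ version would presumably hold device pointers and row counts instead of Python lists, but that part is an assumption.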
Look to tSVD and PCA for examples.