Open zacps opened 3 years ago
Just adding a simple reference, in case it helps anyone:
`dask_ml` has a `ParallelPostFit` wrapper that does exactly this.
Edit: This wrapper clones the underlying estimator when it is instantiated. In the context of active learning that might be an issue, as the estimator is updated quite frequently.
When the number of unlabelled points is very large, it may be beneficial to copy the classifier into a number of threads/processes, query chunks of the data separately, then recombine the scores and rank them.
Query methods should take an `n_jobs` parameter which controls this behaviour.
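A minimal sketch of the proposed behaviour, assuming least-confidence uncertainty (1 minus the highest class probability) as the query score and a fitted classifier exposing a scikit-learn style `predict_proba`. The names `parallel_uncertainty_query`, `uncertainty_scores` and the toy `ThresholdClassifier` are hypothetical, not an existing API. It uses threads, so the workers share the one fitted classifier instead of copying it:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def uncertainty_scores(classifier, X):
    """Least-confidence uncertainty: 1 - max class probability per sample."""
    proba = classifier.predict_proba(X)
    return 1.0 - np.max(proba, axis=1)


def parallel_uncertainty_query(classifier, X_pool, n_instances=1, n_jobs=1):
    """Hypothetical query: score X_pool (a NumPy array) in n_jobs chunks,
    then recombine and rank. Returns pool indices, most uncertain first."""
    # Split the pool into at most n_jobs contiguous chunks of row indices.
    n_chunks = max(1, min(n_jobs, len(X_pool)))
    index_chunks = np.array_split(np.arange(len(X_pool)), n_chunks)

    # Threads share the already-fitted classifier (no copy is made);
    # predict_proba is assumed to be safe to call concurrently.
    with ThreadPoolExecutor(max_workers=n_chunks) as executor:
        chunk_scores = list(
            executor.map(lambda idx: uncertainty_scores(classifier, X_pool[idx]),
                         index_chunks)
        )

    # executor.map preserves input order, so concatenating restores pool order.
    scores = np.concatenate(chunk_scores)
    # Rank globally: a stable sort on negated scores puts the highest first.
    ranked = np.argsort(-scores, kind="stable")
    return ranked[:n_instances]


class ThresholdClassifier:
    """Hypothetical toy binary classifier: P(class 1) = sigmoid(x[0])."""

    def predict_proba(self, X):
        p1 = 1.0 / (1.0 + np.exp(-X[:, 0]))
        return np.column_stack([1.0 - p1, p1])


X_pool = np.array([[-3.0], [0.1], [2.0], [-0.5], [5.0]])
print(parallel_uncertainty_query(ThresholdClassifier(), X_pool,
                                 n_instances=2, n_jobs=2))  # prints [1 3]
```

Ranking happens only after all chunks are concatenated, so the result does not depend on `n_jobs`. Switching to `ProcessPoolExecutor` would need a module-level function instead of the lambda, and would pickle (i.e. copy) the classifier for each task, which raises the same staleness concern mentioned above for frequently updated estimators.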