modAL-python / modAL

A modular active learning framework for Python
https://modAL-python.github.io/
MIT License

Add support for parallel querying #123

Open zacps opened 3 years ago

zacps commented 3 years ago

When the number of unlabelled points is very large, it may be beneficial to copy the classifier into a number of threads or processes, query chunks of the data separately, and then recombine and rank the results.

Query methods should take an n_jobs parameter which controls this behaviour.
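A minimal sketch of what this could look like, using only the standard library. This is not modAL's API: `parallel_uncertainty_query`, `score_chunk`, `least_confident` and `toy_predict_proba` are hypothetical names, and the toy model merely stands in for a real classifier's `predict_proba`. The pool is split into `n_jobs` chunks, each chunk is scored in its own thread, and the scores are recombined and ranked.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def least_confident(probas):
    """Uncertainty of one sample: 1 minus its highest class probability."""
    return 1.0 - max(probas)


def score_chunk(predict_proba, chunk):
    """Score every sample in one chunk of the pool."""
    return [least_confident(p) for p in predict_proba(chunk)]


def parallel_uncertainty_query(predict_proba, pool, n_instances=1, n_jobs=2):
    """Return indices of the n_instances most uncertain samples in pool.

    Hypothetical helper, not part of modAL. predict_proba is any callable
    mapping a list of samples to a list of class-probability tuples.
    """
    if not pool:
        return []
    n_jobs = max(1, min(n_jobs, len(pool)))
    size = math.ceil(len(pool) / n_jobs)
    chunks = [pool[i:i + size] for i in range(0, len(pool), size)]

    # Executor.map preserves input order, so the flattened scores line up
    # with the original pool indices.
    with ThreadPoolExecutor(max_workers=n_jobs) as executor:
        per_chunk = executor.map(lambda c: score_chunk(predict_proba, c), chunks)
        scores = [s for chunk_scores in per_chunk for s in chunk_scores]

    # Recombine and rank: most uncertain first.
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_instances]


def toy_predict_proba(xs):
    """Hypothetical stand-in for a classifier: P(class 1) = x."""
    return [(1.0 - x, x) for x in xs]


pool = [0.05, 0.5, 0.9, 0.45, 0.99, 0.3]
picked = parallel_uncertainty_query(toy_predict_proba, pool, n_instances=2, n_jobs=3)
print(picked)
```

Threads only help here if the prediction call releases the GIL. A process-based variant would need a picklable scoring function instead of the lambda, and would pay the cost of copying the classifier to each worker.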

remiadon commented 3 years ago

Just adding a simple reference if that helps anyone

dask_ml has a ParallelPostFit wrapper that does exactly this

Edit: This wrapper clones the underlying estimator when it is instantiated. In the context of active learning that might be an issue, as the estimator is updated quite frequently.
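To illustrate the concern, here is a toy sketch of what happens when a wrapper copies its estimator at construction time. `CountingEstimator` and `CloningWrapper` are hypothetical classes written for this example; they are not dask_ml or modAL code and only mimic the cloning behaviour described above.

```python
import copy


class CountingEstimator:
    """Toy estimator that only remembers how many labelled samples it has seen."""

    def __init__(self):
        self.n_seen = 0

    def teach(self, X):
        self.n_seen += len(X)


class CloningWrapper:
    """Hypothetical wrapper that copies its estimator when constructed."""

    def __init__(self, estimator):
        self.estimator = copy.deepcopy(estimator)


learner = CountingEstimator()
learner.teach([1, 2, 3])

wrapper = CloningWrapper(learner)  # snapshot taken here
learner.teach([4, 5])              # active learning loop updates the original

stale = wrapper.estimator.n_seen                  # copy missed the update
fresh = CloningWrapper(learner).estimator.n_seen  # re-wrapping picks it up
print(stale, fresh)
```

Under this assumption, the wrapper would have to be rebuilt after every update to the learner, which adds a copy per query round.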