modAL-python / modAL

A modular active learning framework for Python
https://modAL-python.github.io/
MIT License

Ranked batch mode sampling - pre-compute pairwise distance to reduce running time #38

Open nishkaks opened 5 years ago

nishkaks commented 5 years ago

The following code runs once for every sample selected into the batch, recomputing the pairwise distances each time. This slows the program down significantly for larger batch sizes: https://github.com/modAL-python/modAL/blob/4029dfd4e5f68509a409d509ed706f544472bf25/modAL/batch.py#L93-L96

Instead, we can compute the pairwise distances once per batch inside ranked_batch (outside the for loop), pass only the minimum-distance array to select_instance, and assign it directly at https://github.com/modAL-python/modAL/blob/4029dfd4e5f68509a409d509ed706f544472bf25/modAL/batch.py#L96

There is a significant reduction in running time with this change.
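To illustrate the idea, here is a minimal standalone sketch. It is not the modAL code: the function names, the `_score` rule, and the incremental `np.minimum` update are my own assumptions, used only to show that the pool-to-labeled distances can be computed once before the loop and then refreshed cheaply after each pick.

```python
import numpy as np
from sklearn.metrics import pairwise_distances, pairwise_distances_argmin_min


def _score(min_dist, uncertainty, n_labeled):
    # Hypothetical scoring rule (an assumption for this sketch):
    # points far from the labeled set and with high uncertainty score higher.
    n_unlabeled = min_dist.shape[0]
    alpha = n_unlabeled / (n_unlabeled + n_labeled)
    similarity = 1.0 / (1.0 + min_dist)
    return alpha * (1.0 - similarity) + (1.0 - alpha) * uncertainty


def ranked_batch_recompute(X_labeled, X_pool, uncertainty, n_instances):
    """Baseline: recompute pool-to-labeled distances on every iteration."""
    labeled = X_labeled.copy()
    available = np.ones(X_pool.shape[0], dtype=bool)
    chosen = []
    for _ in range(n_instances):
        idx = np.flatnonzero(available)
        # Full distance computation against the growing labeled set.
        _, min_dist = pairwise_distances_argmin_min(X_pool[idx], labeled)
        scores = _score(min_dist, uncertainty[idx], labeled.shape[0])
        best = idx[np.argmax(scores)]
        chosen.append(int(best))
        available[best] = False
        labeled = np.vstack([labeled, X_pool[best]])
    return chosen


def ranked_batch_precomputed(X_labeled, X_pool, uncertainty, n_instances):
    """Compute pool-to-labeled distances once, then update them per pick."""
    # One full distance computation, outside the loop.
    _, min_dist = pairwise_distances_argmin_min(X_pool, X_labeled)
    available = np.ones(X_pool.shape[0], dtype=bool)
    n_labeled = X_labeled.shape[0]
    chosen = []
    for _ in range(n_instances):
        idx = np.flatnonzero(available)
        scores = _score(min_dist[idx], uncertainty[idx], n_labeled)
        best = idx[np.argmax(scores)]
        chosen.append(int(best))
        available[best] = False
        n_labeled += 1
        # Only the newly picked point can lower a pool point's minimum
        # distance, so one column of distances is enough to stay in sync.
        new_dist = pairwise_distances(X_pool, X_pool[best].reshape(1, -1)).ravel()
        min_dist = np.minimum(min_dist, new_dist)
    return chosen
```

In this sketch the baseline does a full pool-versus-labeled distance computation on every iteration, while the second version does it once and then only measures the pool against the single newly picked point. Whether the per-pick update is needed depends on whether selected instances should count as labeled for the rest of the batch, which is how I have assumed it here.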

@cosmic-cortex - can I contribute this code change to this repo?

cosmic-cortex commented 5 years ago

Yes, your contribution is very much welcome! This document contains a few brief contribution guidelines. Let me know if you have any questions. I am happy to help!

nishkaks commented 5 years ago

Thanks. Will submit a PR over the weekend.