scikit-learn-contrib / DESlib

A Python library for dynamic classifier and ensemble selection
BSD 3-Clause "New" or "Revised" License

Faiss batch processing #139

Closed Menelau closed 5 years ago

Menelau commented 5 years ago

Point 1: I had the same doubt when working on this pull request. Since this class was initially developed to be used only inside the DS classes, I think we can leave it like that for now and see whether we need to make this model a sklearn estimator in the future. That would involve other steps, such as inheriting from BaseEstimator and ClassifierMixin.
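For reference, here is a rough sketch of what those extra steps could look like. It is not the class from this PR: the name `KNNWrapperSketch` is hypothetical, and it uses scikit-learn's `NearestNeighbors` as a stand-in for the Faiss index so the example stays self-contained.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.neighbors import NearestNeighbors
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class KNNWrapperSketch(ClassifierMixin, BaseEstimator):
    """Hypothetical k-NN wrapper written as a scikit-learn estimator.

    Assumption: NearestNeighbors stands in for the Faiss index used in the PR.
    """

    def __init__(self, n_neighbors=5):
        # Only store constructor arguments here, so that get_params(),
        # set_params() and clone() work as scikit-learn expects.
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        X, y = check_X_y(X, y)
        # classes_ holds the sorted labels, y_ the encoded training labels.
        self.classes_, self.y_ = np.unique(y, return_inverse=True)
        self.index_ = NearestNeighbors(n_neighbors=self.n_neighbors).fit(X)
        return self

    def kneighbors(self, X):
        check_is_fitted(self, "index_")
        return self.index_.kneighbors(check_array(X))

    def predict_proba(self, X):
        _, idx = self.kneighbors(X)
        votes = self.y_[idx]  # shape: (n_queries, n_neighbors)
        # Count the neighbors voting for each class, then normalize.
        counts = np.stack(
            [(votes == c).sum(axis=1) for c in range(len(self.classes_))],
            axis=1,
        )
        return counts / self.n_neighbors

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

With this structure, `score` comes for free from `ClassifierMixin`, and the parameter handling comes from `BaseEstimator`. The mixin is listed first, which is the ordering scikit-learn recommends.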

Point 2: Yes, we need to make that clear in the documentation. However, if they have two versions (one brute-force and one approximate), maybe we should allow both versions in the library.

luizgh commented 5 years ago

OK for point 1. For the second point, it seems to me that there are several variants to choose from. For instance, this post benchmarks several alternatives for indexing a dataset with 1 billion samples: https://github.com/facebookresearch/faiss/wiki/Indexing-1G-vectors

Either way, I think we should merge this PR and create a new issue to investigate the alternatives that use approximate search.

Menelau commented 5 years ago

Agreed. I just created a new issue to track this second point: #140