Closed Menelau closed 5 years ago
point 1: I had the same doubt when working on this pull request. Since this class was initially developed to be used only inside the DS classes, I think we can leave like that for now and see if we will need to make this model a sklearn estimator in the future. That would involve other steps such as inheriting from BaseEstimator and ClassifierMixin, etc...
point 2: Yes we need to make that clear in the documentation. However, if they have two versions (one brute force and another approximate) maybe we should allow both versions in the library.
Ok for point 1. For the second point, it seems to me that there are several variants to choose. For instance, there is this post that benchmarks several alternatives for indexing a dataset with 1billion samples: https://github.com/facebookresearch/faiss/wiki/Indexing-1G-vectors
Either way, I think we should merge this PR and create a new issue to investigate the alternatives that use approximate search.
Agreed. I just create a new issue to track this second point: #140
Batch processing in FaissKNNClassifier predict/predict_proba methods (Fixes issue #103 )
Checking if the estimator was already fitted before its use.