import numpy as np
from skmatter.feature_selection import FPS
np.random.seed(0)
n_samples = 10
n_features = 15
X = np.random.rand(n_samples, n_features)
# make two features numerically zero so that their selection scores vanish
X[:, 3] = np.random.rand(n_samples) * 1e-13
X[:, 4] = np.random.rand(n_samples) * 1e-13
selector_problem = FPS(n_to_select=len(X.T)).fit(X)
print(selector_problem.selected_idx_)
print(selector_problem.get_select_distance())
print()
# this selector does not show the problem because selection stops once the score falls below score_threshold
selector = FPS(n_to_select=len(X.T), score_threshold=1e-9).fit(X)
print(selector.selected_idx_)
print(selector.get_select_distance())
Detected by @PicoCentauri
Problem
Out:
In the first selector, index 8 is selected a second time and records the wrong score. This happens because the GreedySelector base class does not filter out already selected points when choosing the next point: https://github.com/scikit-learn-contrib/scikit-matter/blob/d56ccbd4648ad90299b27cb5c23ecd3b39e4d12a/src/skmatter/_selection.py#L371 So when all scores are (numerically) zero, points that have already been selected can be selected again.
Solution
One could add selected_idx_ to the GreedySelector base class and change the argmax in the function above so that it only considers indices that have not been selected yet.
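A minimal sketch of the idea, using plain NumPy rather than the actual skmatter code. The helper name masked_argmax and the toy score array are hypothetical and only illustrate how restricting the argmax to unselected indices avoids reselection when all scores are zero; the real change would have to fit into the GreedySelector internals.

```python
import numpy as np


def masked_argmax(scores, selected_idx):
    """Return the index of the highest score among not-yet-selected entries.

    Hypothetical helper for illustration; not part of skmatter.
    """
    scores = np.asarray(scores, dtype=float)
    # True for every index that is still available for selection
    mask = np.ones(scores.shape[0], dtype=bool)
    mask[np.asarray(selected_idx, dtype=int)] = False
    if not mask.any():
        raise ValueError("all indices have already been selected")
    candidates = np.flatnonzero(mask)
    # argmax over the available candidates only, mapped back to original indices
    return int(candidates[np.argmax(scores[candidates])])


# Toy case: every score is zero and indices 0 and 2 were already selected.
scores = np.zeros(5)
selected = [0, 2]

plain = int(np.argmax(scores))            # picks index 0 again
masked = masked_argmax(scores, selected)  # picks the first unselected index

print(plain, masked)
```

With all scores equal, the plain argmax falls back to the first index even though it was already chosen, while the masked version can only return an index that is still unselected.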