scikit-learn-contrib / scikit-matter

A collection of scikit-learn compatible utilities that implement methods born out of the materials science and chemistry communities
https://scikit-matter.readthedocs.io/en/v0.2.0/
BSD 3-Clause "New" or "Revised" License

From docs it is not super clear that sample selection works analogously to feature selection #164

Open agoscinski opened 1 year ago

agoscinski commented 1 year ago

The examples even have a section called Feature and Sample Selection, but no example notebook. https://scikit-matter.readthedocs.io/en/latest/tutorials.html

victorprincipe commented 1 year ago

I'm not sure exactly what you mean by this. The API reference for Feature and Sample Selection states:

"scikit-matter contains multiple data sub-selection modules, primarily corresponding to methods derived from CUR matrix decomposition and Farthest Point Sampling. In their classical form, CUR and FPS determine a data subset that maximizes the variance (CUR) or distribution (FPS) of the features or samples. These methods can be modified to combine supervised and unsupervised learning, in a formulation denoted PCov-CUR and PCov-FPS. For further reading, refer to [Imbalzano2018] and [Cersonsky2021].

These selectors can be used for both feature and sample selection, with similar instantiations. Currently, all sub-selection methods extend GreedySelector, where at each iteration the model scores each feature or sample (without an estimator) and chooses that with the maximum score."

https://scikit-matter.readthedocs.io/en/latest/selection.html

agoscinski commented 1 year ago

This is the current tutorials page (screenshot: scikit-matter-sample-selection-tutorial-page).

I agree that it is written in the API reference, but we had a user who wasn't sure from the examples how to use sample selection. So we can improve this by changing an example or adding one.
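
As a rough illustration of what such an example could convey, here is a minimal, library-free sketch of greedy Farthest Point Sampling. It is not scikit-matter's implementation: `farthest_point_sampling` and `transpose` are hypothetical helpers, plain Euclidean distance is assumed, and the data is made up. The point is only that the same greedy routine selects samples when given the rows of `X` and features when given its columns, which is the analogy the selectors in `skmatter.sample_selection` and `skmatter.feature_selection` are meant to follow.

```python
import math


def farthest_point_sampling(points, n_to_select, initialize=0):
    """Greedily pick indices of `points` that are far from each other.

    Hypothetical helper for illustration; not part of scikit-matter.
    """
    selected = [initialize]
    # Score of each point: distance to its nearest already-selected point.
    min_dist = [math.dist(p, points[initialize]) for p in points]
    while len(selected) < n_to_select:
        # Choose the point with the maximum score.
        best = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(best)
        # Update the scores with the newly selected point.
        min_dist = [
            min(d, math.dist(p, points[best])) for d, p in zip(min_dist, points)
        ]
    return selected


def transpose(X):
    """Turn the columns (features) of X into rows. Hypothetical helper."""
    return [list(col) for col in zip(*X)]


# Toy data: 5 samples (rows) with 3 features (columns).
X = [
    [0.0, 0.0, 1.0],
    [0.1, 0.0, 1.1],
    [5.0, 0.2, 1.0],
    [5.1, 0.1, 0.9],
    [2.5, 9.0, 1.0],
]

# Sample selection: run the selector on the rows.
sample_idx = farthest_point_sampling(X, n_to_select=3)
print(sample_idx)  # [0, 4, 3]

# Feature selection: run the same selector on the columns.
feature_idx = farthest_point_sampling(transpose(X), n_to_select=2)
print(feature_idx)  # [0, 1]
```

An example notebook for the docs would presumably use the library's own selectors instead, instantiating the sample-selection and feature-selection variants side by side on the same dataset.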