mahmoodlab / CLAM

Data-efficient and weakly supervised computational pathology on whole slide images - Nature Biomedical Engineering
http://clam.mahmoodlab.org
GNU General Public License v3.0

Use feature extraction files to train a ML model #179

Closed: Tato14 closed this issue 5 months ago

Tato14 commented 1 year ago

Hi, first of all, thank you for the great tool! I am looking for a way to use the features extracted with extract_features_fp.py. Ideally, I would like to use them together with some clinical data to train a machine learning classifier (random forest, XGBoost, or something similar).

However, since the number of features extracted differs between images, I am unsure how to reduce the features in the h5 files to a common shape across all images. I was thinking of using PCA for this, but I would like to know whether you have any previous experience with this or a better idea.
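For illustration, one simple way to reach a common shape is to pool each slide's patch features into a single fixed-length vector. The sketch below is an assumption, not a method confirmed by the CLAM authors: it assumes each h5 file holds an (n_patches, feature_dim) array under the key "features", where only n_patches varies between slides, and the helper names `load_patch_features` and `pool_slide` are hypothetical. If that layout holds, PCA alone would shrink feature_dim but not remove the varying patch count, so some pooling step is needed first.

```python
import numpy as np


def load_patch_features(h5_path):
    """Read the per-patch feature matrix from one .h5 file.

    Assumes an (n_patches, feature_dim) dataset stored under the key
    "features"; adjust the key if your files differ.
    """
    import h5py  # imported here so the pooling helper works without h5py

    with h5py.File(h5_path, "r") as f:
        return f["features"][:]


def pool_slide(features):
    """Collapse (n_patches, feature_dim) into one fixed-length vector.

    Concatenates the per-dimension mean and max, so the output length is
    2 * feature_dim no matter how many patches the slide has.
    """
    features = np.asarray(features, dtype=np.float32)
    if features.ndim != 2 or features.shape[0] == 0:
        raise ValueError("expected a non-empty (n_patches, feature_dim) array")
    return np.concatenate([features.mean(axis=0), features.max(axis=0)])


# Two synthetic "slides" with different patch counts but the same feature_dim
# (1024 is only an illustrative size).
rng = np.random.default_rng(0)
slide_a = rng.normal(size=(300, 1024))
slide_b = rng.normal(size=(1200, 1024))

X = np.stack([pool_slide(slide_a), pool_slide(slide_b)])
print(X.shape)  # (2, 2048)
```

With real data, `pool_slide(load_patch_features(path))` would be called once per slide. Mean and max pooling discard spatial information and treat all patches equally, so this is a baseline rather than a replacement for a learned aggregation.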

I am also aware that you developed PORPOISE, and I am planning to give it a try, but my number of samples is rather small (~250), and this method might need more samples to generalize.

KKIverson commented 1 year ago

Same question.

fedshyvana commented 5 months ago

I think for your purposes you might want to look into methods that produce slide-level embeddings directly such as GigaSSL, HIPT, etc.
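Whichever method produces the slide-level embedding, combining it with clinical data for a classical classifier could look roughly like the sketch below. This is not taken from GigaSSL, HIPT, or CLAM: the data are synthetic stand-ins, the helper names `build_design_matrix` and `cross_validate_forest` are hypothetical, and scikit-learn's random forest is used only because the question mentions random forests.

```python
import numpy as np


def build_design_matrix(slide_embeddings, clinical):
    """Concatenate slide-level embeddings and clinical covariates column-wise.

    Both inputs must have one row per slide, in the same order.
    """
    slide_embeddings = np.asarray(slide_embeddings, dtype=np.float64)
    clinical = np.asarray(clinical, dtype=np.float64)
    if slide_embeddings.shape[0] != clinical.shape[0]:
        raise ValueError("inputs must have the same number of rows (slides)")
    return np.hstack([slide_embeddings, clinical])


def cross_validate_forest(X, y, n_splits=5, seed=0):
    """Return per-fold accuracy of a random forest (requires scikit-learn)."""
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return cross_val_score(model, X, y, cv=cv)


# Synthetic stand-ins: 250 slides, 64-dim embeddings, 3 clinical covariates.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(250, 64))
clinical = rng.normal(size=(250, 3))
labels = rng.integers(0, 2, size=250)

X = build_design_matrix(embeddings, clinical)
print(X.shape)  # (250, 67)
```

Calling `cross_validate_forest(X, labels)` returns one accuracy score per fold. With only a few hundred slides, stratified cross-validation gives a steadier estimate than a single train/test split, and categorical clinical variables would need to be encoded numerically before being passed in.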