zellerlab / siamcat

R package for Statistical Inference of Associations between Microbial Communities And host phenoType
https://siamcat.embl.de/

extracting list of plotted features #18

Closed: adamsorbie closed this issue 3 years ago

adamsorbie commented 3 years ago

Is it possible to extract a list of the features plotted by the model.interpretation.plot function? Am I correct in thinking that the plotted features are ordered by effect size? I tried extracting the feature weights dataframe and subsetting it, but I can't see anything in there that looks like an effect size and would let me do this.

jakob-wirbel commented 3 years ago

Hi @adamsorbie

Yes, you can have a look at the features selected in the machine learning workflow using the feature_weights function.

This function returns a dataframe with summary information about each feature, for example the mean or median feature weight in the machine learning model. The last column, called percentage, indicates the fraction of models in which the feature was selected (across cross-validation folds), which can be important for LASSO models, for example. In the model.interpretation.plot function, features are usually first filtered for consistency, that is, by the percentage of models in which they were selected (the parameter is called consens.thres and defaults to 0.5). The remaining features are then ordered by their median relative feature weight across models (the column name is median.rel.weight). So this column would be the closest equivalent to the effect size you are looking for, I guess.
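The filtering and ordering described above can be sketched in base R roughly as follows. This is only an illustration, not the package's internal code: the toy dataframe stands in for the output of feature_weights, the taxon names and values are made up, and both the use of >= for the threshold and the ascending sort direction are assumptions. The plot function may also apply further limits (such as a cap on the number of features shown) that are not reproduced here.

```r
# Toy stand-in for the dataframe returned by feature_weights();
# feature names and values are invented for illustration only.
weights <- data.frame(
  median.rel.weight = c(-0.30, 0.05, 0.22, -0.10, 0.18),
  percentage        = c(1.00, 0.20, 0.90, 0.60, 0.40),
  row.names         = c("taxon_A", "taxon_B", "taxon_C", "taxon_D", "taxon_E")
)

# Default consistency threshold mentioned in the thread
consens.thres <- 0.5

# Keep features selected in at least consens.thres of the models
# (assumption: the real function may use a strict > instead of >=)
selected <- weights[weights$percentage >= consens.thres, , drop = FALSE]

# Order the remaining features by median relative weight
# (assumption: ascending; the plot may display them the other way round)
selected <- selected[order(selected$median.rel.weight), , drop = FALSE]

# Names of the features that pass the filter, in sorted order
plotted_features <- rownames(selected)
print(plotted_features)
```

With a trained SIAMCAT object, the toy dataframe would be replaced by something like weights <- feature_weights(siamcat_object), where siamcat_object is a placeholder name for your own object.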

Hope this helps :)

adamsorbie commented 3 years ago

Yeah that's exactly what I needed, thanks a lot! :)