zellerlab / siamcat

R package for Statistical Inference of Associations between Microbial Communities And host phenoType
https://siamcat.embl.de/

model interpretation plot error #29

Closed gpmoran closed 1 year ago

gpmoran commented 2 years ago

Hi, I have encountered a problem with the model interpretation plot. My feature weight plot is completely one-sided, and I get the following warning:

Warning message:
In model.interpretation.select.features(feature.weights = feature.weights, :
  WARNING: restricting amount of features to be plotted to 50

However, if I run a lasso model on the data, the resulting plot shows features with both positive and negative weights. For the random forest interpretation plot I run:

model.interpretation.plot(

gnsrfinterpretation.pdf

jakob-wirbel commented 2 years ago

Hi @gpmoran, thanks for the input! There seems to be an issue with the random forest interpretation plot that results in all features being one-sided. Thanks for finding this issue!

gpmoran commented 2 years ago

Hi Jakob, Thanks for the response. Can this bug be fixed any time soon?

Gary

jakob-wirbel commented 2 years ago

I fixed some typos in the script for the model interpretation plot; I hope this took care of your problem :)

You can try out the GitHub version of the package by installing it via devtools:

devtools::install_github("zellerlab/siamcat")

gpmoran commented 2 years ago

Hi Jakob, Thank you for looking into this. I regenerated my SIAMCAT object using the GitHub version (1.13.4) and ran through my model again. The warning message about feature weights is now gone. However, my model interpretation plot (below) still shows one-sided feature weights, and the metagenomic features are not sorted by Z-score in the heatmap.

gnsrfinterpretation.pdf

jakob-wirbel commented 2 years ago

Hi @gpmoran, in short, the code worked as intended, but we could think about changing the layout of the random forest plot :D

In the interpretation plot, the features are ordered not by z-score but by model importance. For lasso/ridge/enet, this is the model coefficient, whereas for the random forest it is the Gini coefficient. Unlike the model coefficients from lasso or enet, the Gini coefficient does not encode the direction (enrichment in cases or in controls). We therefore compute post hoc in which group each feature is more enriched, using the AUROC of the feature to score it. In the plot, the direction of enrichment is shown by the colour of the feature name.
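To illustrate the post-hoc direction step described above, here is a minimal base-R sketch. It is not SIAMCAT's actual implementation: the helper `feature_auroc`, the toy abundance matrix, and the importance values are all hypothetical and exist only to show the idea. The AUROC is computed through the rank-sum identity, and a value above 0.5 is read as enrichment in cases.

```r
# Hypothetical helper (not part of SIAMCAT): AUROC of a single feature for
# separating cases from controls, via the rank-sum (Mann-Whitney) identity.
feature_auroc <- function(x, is_case) {
  r <- rank(x)                      # average ranks, so ties are handled
  n_case <- sum(is_case)
  n_ctrl <- sum(!is_case)
  (sum(r[is_case]) - n_case * (n_case + 1) / 2) / (n_case * n_ctrl)
}

# Toy data (made up): rows are features, columns are samples.
feat <- rbind(
  taxon_A = c(0.1, 0.2, 0.1, 0.3, 0.8, 0.9, 0.7, 0.6),
  taxon_B = c(0.9, 0.8, 0.7, 0.9, 0.2, 0.1, 0.3, 0.2)
)
is_case <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE)

# Unsigned importance values, standing in for a random forest's
# Gini-based score (made up, always non-negative).
importance <- c(taxon_A = 0.6, taxon_B = 0.4)

# Direction is decided separately from the importance, per feature.
auroc <- apply(feat, 1, feature_auroc, is_case = is_case)
direction <- ifelse(auroc > 0.5, "cases", "controls")

# Features stay ordered by importance; direction is an extra annotation.
result <- data.frame(importance, auroc, direction)[order(-importance), ]
print(result)
```

Because the importance is unsigned, every bar in such a plot points the same way; the enrichment direction only shows up in the separate annotation (the label colour in the interpretation plot).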

gpmoran commented 2 years ago

Hi Jakob, Thanks for the explanation. That works for me! Gary