zellerlab / siamcat

R package for Statistical Inference of Associations between Microbial Communities And host phenoType
https://siamcat.embl.de/
SIAMCAT v2.2.0 model.interpretation.plot is not working #38

Ales-ibt closed this issue 1 year ago

Ales-ibt commented 1 year ago

Hello there!

Thank you for developing SIAMCAT. I'm preparing a Comparative Metagenomics notebook for MGnify using MGnifyR, phyloseq and SIAMCAT. The first version, using SIAMCAT v1.5.1, ran perfectly fine. We have since had to switch to SIAMCAT v2.2.0 from Bioconductor to keep it compatible with our other dependencies and the R version. The problem is the model.interpretation.plot step: depending on the training method, it either generates an empty file or throws the error "Not enough features were selected for plotting!".

The Jupyter notebook and the environment we set up in a Docker image are available in our public notebooks repo.

Thanks in advance!

Alejandra

jakob-wirbel commented 1 year ago

Hi Alejandra,

Thanks for using SIAMCAT in your educational material! :) Also, kudos for making some super nice notebooks.

As far as I understand it, you already get a pretty good model with a single feature. When I check the feature weights (head(feature_weights(siamcat_obj))), I find that the lasso model selects only a single feature. The plot doesn't really work with a single feature, which is why I added this error.

Instead, you could train a different model with less strict regularization (maybe lasso_ll or randomForest), which should give you multiple features in your final plot.

Please let me know if that works :)

Cheers, Jakob
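
The check and the workaround Jakob suggests can be sketched roughly as follows. This is only an illustration under several assumptions: siamcat_obj is taken to be the object from the notebook, already normalized and with a data split; count_selected is a hypothetical helper, not part of SIAMCAT; it assumes the data frame returned by feature_weights() has a percentage column giving the share of trained models in which a feature has a non-zero weight; and the output file name and the 0.5 cutoff (mirroring the consens.thres argument of model.interpretation.plot) are placeholders.

```r
# Hypothetical helper (not part of SIAMCAT): count features with a non-zero
# weight in at least `consens_thres` of the trained models.
# Assumes `weights` has a `percentage` column, as in feature_weights() output.
count_selected <- function(weights, consens_thres = 0.5) {
  sum(weights$percentage >= consens_thres)
}

# Run the SIAMCAT part only if the package and the object are available.
if (requireNamespace("SIAMCAT", quietly = TRUE) && exists("siamcat_obj")) {
  library(SIAMCAT)

  # How many features does the current (lasso) model keep?
  print(count_selected(feature_weights(siamcat_obj)))

  # Retrain with one of the methods suggested in the reply above.
  siamcat_ll <- train.model(siamcat_obj, method = "lasso_ll")
  siamcat_ll <- make.predictions(siamcat_ll)
  siamcat_ll <- evaluate.predictions(siamcat_ll)
  print(count_selected(feature_weights(siamcat_ll)))

  # Placeholder file name; consens.thres is the plot's selection cutoff.
  model.interpretation.plot(siamcat_ll, fn.plot = "interpretation.pdf",
                            consens.thres = 0.5)
}
```

If the count stays at one or zero after retraining, the plot will most likely still refuse to draw, which points back at the data rather than at the plotting function.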

Ales-ibt commented 1 year ago

Hello Jakob,

Thank you for your reply. Unfortunately, no matter which model I used, the final step kept failing. The intriguing thing is that SIAMCAT v1 could find features while SIAMCAT v2 fails, so the improved methods are working against me :D. Now that I know there's no bug in SIAMCAT, I went back over my dataset from scratch and found some samples with extreme values in the metadata sheet. After filtering them out, I can train the model and successfully discover some features. For some reason, it looks like SIAMCAT v1 tolerates the noise better.

Thanks again for developing SIAMCAT!

Bests!
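
The thread does not say how the samples with extreme metadata were identified or removed. One possible way to screen a metadata sheet before building the SIAMCAT object is sketched below, purely as an assumption: is_extreme is a hypothetical helper that flags values outside the Tukey fences (1.5 times the interquartile range beyond the quartiles), and the toy metadata table, its depth column and the sample names are made up for illustration.

```r
# Hypothetical helper: TRUE for values outside the Tukey fences
# (Q1 - k * IQR, Q3 + k * IQR); NA values are never flagged.
is_extreme <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- k * (q[2] - q[1])
  !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
}

# Toy metadata sheet; sample names and the `depth` column are made up.
meta <- data.frame(depth = c(10, 12, 11, 13, 12, 500),
                   row.names = paste0("sample", 1:6))

# Flag a sample if any of its numeric metadata columns is extreme.
numeric_cols <- vapply(meta, is.numeric, logical(1))
extreme_mat <- vapply(meta[, numeric_cols, drop = FALSE],
                      is_extreme, logical(nrow(meta)))
flagged <- rowSums(extreme_mat) > 0

# Keep only the samples that were not flagged.
meta_clean <- meta[!flagged, , drop = FALSE]
print(rownames(meta)[flagged])
```

Flagged samples are worth inspecting by hand before dropping them, since an extreme value can be a data-entry error or a real biological signal.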