zellerlab / siamcat

R package for Statistical Inference of Associations between Microbial Communities And host phenoType
https://siamcat.embl.de/

integrate biomass into siamcat #46

Closed: tapj closed this issue 7 months ago

tapj commented 7 months ago

Hi,

I have biomass data (total cell count per sample). How can I train a model that integrates this biomass parameter?

I do not want biomass treated as a confounding factor, but used as a predictive feature.

++ Ju.

jakob-wirbel commented 7 months ago

Hi @tapj, hope all is well with you! :)

This should not be a big problem. If you have the biomass parameter in your metadata, you can use the add.meta.pred function to add it as an additional predictor to the feature matrix of bacterial abundances. It will be standardized by default, but I think you can turn this off (std.meta = FALSE). The best approach would be to add it to the feature data after the data normalization step and then train the model. If it is important for predicting the outcome, it will be selected by the Lasso or the Random Forest classifier.
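A minimal sketch of that workflow, assuming a recent SIAMCAT version and its bundled siamcat_example object, which already holds normalized features and a cross-validation split. The example metadata has no biomass column, so the 'Age' column is used purely as a stand-in (the column name itself is an assumption about the installed version); with real data, the name of the biomass column in the metadata would go in pred.names instead.

```r
library(SIAMCAT)

# Example object shipped with SIAMCAT; it already contains normalized
# features and a data split, so the predictor can be added right away.
data(siamcat_example)

# Number of features (rows) in the normalized matrix before the addition.
n_before <- nrow(get.norm_feat.matrix(siamcat_example))

# 'Age' is only a stand-in here: the example metadata has no biomass column.
# With real data, pass the name of the biomass column instead.
sc_added <- add.meta.pred(
    siamcat_example,
    pred.names = c("Age"),
    std.meta = TRUE,              # set to FALSE to skip standardization
    feature.type = "normalized"   # append to the normalized feature matrix
)

# The normalized feature matrix should now hold one additional row.
n_after <- nrow(get.norm_feat.matrix(sc_added))

# Retrain so that the added predictor can be picked up by the model.
sc_added <- train.model(sc_added, method = "lasso")
```

Comparing n_before and n_after is a quick way to confirm that the metadata variable actually ended up among the features before training.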

Let me know if this works!

tapj commented 7 months ago

Thank you, Jakob, for your fast reply. I hadn't noticed this function in siamcat.