zellerlab / siamcat

R package for Statistical Inference of Associations between Microbial Communities And host phenoType
https://siamcat.embl.de/

Question about normalization #17

Status: Closed (choon-sim closed this issue 3 years ago)

choon-sim commented 3 years ago

Hi, I have a question about how SIAMCAT does normalization.

My input is a table of relative abundances of predicted proteins. The table is a bit unusual because I am normalizing the abundances by the total number of reads rather than by the number of classified reads. As a result, the relative abundances do not sum to 1, since some reads are not classified. I want to make sure that this doesn't violate an assumption of the differential abundance calculation SIAMCAT is using: it might assume that they sum to 1, or renormalize if they don't. My question applies to both the association analysis and the machine learning normalization/prediction in SIAMCAT.
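A minimal sketch of the two normalizations for a single sample may make the difference concrete. The feature names and read counts are made up for illustration, and the snippet only reproduces the arithmetic in plain Python; it does not call SIAMCAT.

```python
# Hypothetical read counts for one sample (illustrative numbers only).
classified_counts = {"protein_A": 300, "protein_B": 100, "protein_C": 100}
unclassified_reads = 500  # reads not assigned to any predicted protein

classified_total = sum(classified_counts.values())
total_reads = classified_total + unclassified_reads

# Normalization by the number of classified reads: the column sums to 1.
rel_by_classified = {
    name: count / classified_total for name, count in classified_counts.items()
}

# Normalization by the total number of reads: the column sums to the
# classified fraction of the sample, which is below 1.
rel_by_total = {
    name: count / total_reads for name, count in classified_counts.items()
}

sum_by_classified = sum(rel_by_classified.values())
sum_by_total = sum(rel_by_total.values())

print("column sum, normalized by classified reads:", sum_by_classified)
print("column sum, normalized by total reads:", sum_by_total)
```

With these assumed counts, half of the reads are unclassified, so the second column sum equals the classified fraction of the sample rather than 1.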

jakob-wirbel commented 3 years ago

Hi Choon-Sim, it seems to me that your column sums would then be smaller than 1, since you do not include the fraction of unclassified reads in your table. Is that correct? In that case, SIAMCAT should work without a problem (including the visualization). In fact, we often remove the fraction of unclassified reads for mOTUs2 profiles as well, within the filter.features function.

So, in short, it should still work :)

You can let me know here whether you run into any problems.

Cheers, Jakob

choon-sim commented 3 years ago

Hi Jakob, yes, exactly: the column sums are smaller than 1. Great to know that it should still work. I have had no problems running it. Thanks!