jtamames / SqueezeMeta

A complete pipeline for metagenomic analysis
GNU General Public License v3.0

Should unclassified and unmapped abundances be taken into account during gene abundance normalization? #748

Closed by drbmanna 7 months ago

drbmanna commented 8 months ago

Hi,

I am currently applying CSS normalization to the KO raw abundance data. The total counts of unclassified and unmapped reads vary across samples, so the normalized results differ depending on whether those two categories are retained or excluded. Would you advise retaining or excluding them before normalization?

Thank you for your assistance. BM

fpusan commented 7 months ago

I am not fully sure myself. It may be better to ask the authors of CSS normalization, as I am not familiar with the method. My first instinct would be to keep them, since CSS normalization is meant to correct for differences in library size, and unmapped/unclassified reads are technically part of your library.
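
One way to see how much the choice matters for a given dataset is to compute the scaling factors both ways and compare them. The following is only a minimal sketch under stated assumptions: it uses a simplified form of CSS with a fixed quantile (the metagenomeSeq implementation chooses the percentile from the data), the counts are made-up toy values, and the row labels and function name are hypothetical rather than SqueezeMeta output.

```python
import numpy as np

def css_scaling_factors(counts, quantile=0.5):
    """Simplified CSS (assumption: fixed quantile, not data-driven).

    For each sample (column), sum the counts that are at or below the
    given quantile of that sample's nonzero counts.
    """
    counts = np.asarray(counts, dtype=float)
    factors = np.empty(counts.shape[1])
    for j in range(counts.shape[1]):
        col = counts[:, j]
        threshold = np.quantile(col[col > 0], quantile)
        factors[j] = col[col <= threshold].sum()
    return factors

# Toy table: rows are features, columns are two samples (made-up values).
rows = ["K00001", "K00002", "K00003", "K00004", "Unclassified", "Unmapped"]
counts = np.array([
    [10, 20],
    [30, 25],
    [5, 40],
    [50, 10],
    [400, 900],
    [300, 1500],
], dtype=float)

# Boolean mask selecting only the KO rows.
is_ko = np.array([r not in ("Unclassified", "Unmapped") for r in rows])

factors_all = css_scaling_factors(counts)         # special rows retained
factors_ko = css_scaling_factors(counts[is_ko])   # special rows excluded

# Normalized KO abundances under each choice (scaled to 1000).
norm_keep = counts[is_ko] / factors_all * 1000
norm_drop = counts[is_ko] / factors_ko * 1000

print("factors, rows retained:", factors_all)
print("factors, rows excluded:", factors_ko)
print("ratio sample2/sample1, retained:", factors_all[1] / factors_all[0])
print("ratio sample2/sample1, excluded:", factors_ko[1] / factors_ko[0])
```

In this sketch the two large rows do not enter the sum directly, because they sit above the quantile threshold, but removing them shifts the threshold itself, so the scaling factors still change. Whether that shift is meaningful for real data would need to be checked on the actual table.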