snayfach / MicrobeCensus

MicrobeCensus estimates the average genome size of microbial communities from metagenomic data
http://genomebiology.com/2015/16/1/51
GNU General Public License v3.0

differences with other normalization methods #31

Closed by sapuizait 1 year ago

sapuizait commented 1 year ago

Hi again

Thank you for your wonderful software. I love being able to normalize using the estimated number of genomes in each sample. Out of curiosity, I compared the MicrobeCensus results (from a group of 50 samples) to a previous normalization attempt that used the sum of 16S copies in the same samples. However, the two normalization methods did not agree at all. Would you say that this is mostly because 16S is a multi-copy gene while MicrobeCensus uses single-copy genes? I just wanted to hear your take on this.

Thanks

snayfach commented 1 year ago

Hello - there are a number of reasons why the two might disagree, including that 16S genes are not single-copy. How are the 16S copies estimated? If you are using reference mapping, then you could be missing some 16S sequences. If you are using assembly, 16S is known to be very hard to assemble. If you are doing amplicon sequencing, then you can't compare against the bulk metagenome.

sapuizait commented 1 year ago

Good point. Yes, reference mapping. In the past (before switching to MicrobeCensus) I would use the SILVA database as the reference and count the shotgun sequencing reads that mapped to it in each sample. So yes, I guess there is some loss (from taxa missing in the reference), and the multi-copy nature of the gene biases the results a lot! Thanks a lot for your quick reply!
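
For readers landing on this thread, the two effects discussed above (variable 16S copy number and taxa missing from the reference) can be sketched with a toy calculation. This is only an illustration under made-up assumptions: the taxa, genome counts, copy numbers, and the `summarize` helper are all hypothetical, and none of it is MicrobeCensus code or output.

```python
# Toy illustration with invented numbers: two samples contain the same number
# of genomes, yet the summed 16S signal differs because copy number varies by
# taxon and one taxon is assumed to be absent from the 16S reference database.

# Hypothetical 16S copies per genome for each taxon.
COPIES_PER_GENOME = {"taxon_a": 1, "taxon_b": 7, "taxon_c": 15}

# Hypothetical taxon with no representative in the reference database.
MISSING_FROM_REFERENCE = {"taxon_c"}

# Hypothetical genome counts per sample (both samples total 300 genomes).
SAMPLES = {
    "sample_1": {"taxon_a": 200, "taxon_b": 50, "taxon_c": 50},
    "sample_2": {"taxon_a": 50, "taxon_b": 200, "taxon_c": 50},
}


def summarize(genome_counts):
    """Return (true genome count, 16S copies detectable by reference mapping)."""
    genomes = sum(genome_counts.values())
    mapped_16s = sum(
        n * COPIES_PER_GENOME[taxon]
        for taxon, n in genome_counts.items()
        if taxon not in MISSING_FROM_REFERENCE
    )
    return genomes, mapped_16s


# A hypothetical single-copy gene present once in every genome.
results = {}
for name, counts in SAMPLES.items():
    genomes, mapped_16s = summarize(counts)
    gene_count = genomes  # one copy per genome
    results[name] = {
        "genomes": genomes,
        "mapped_16s": mapped_16s,
        "per_genome": gene_count / genomes,
        "per_16s": gene_count / mapped_16s,
    }

for name, row in results.items():
    print(name, row)
```

In this sketch the per-genome value is identical for both samples, while the per-16S value differs between them even though the gene's true abundance per genome is the same. Real data adds further complications (mapping sensitivity, read length, gene length) that the toy model ignores.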