swo / caravan

a 16S pipeline

rarefaction across many samples #7

Closed elsherbini closed 8 years ago

elsherbini commented 8 years ago

Often when comparing samples it's important to normalize the number of reads per sample somehow. The simplest and saddest way to do this is to define some threshold, throw out all samples with fewer reads than that threshold, and sub-sample the other samples to have the same number of reads.

Is this a feature worth adding to caravan? Looks like QIIME has a Python implementation: http://qiime.org/scripts/single_rarefaction.html (their GitHub repo is here: https://github.com/biocore/qiime/tree/master/scripts)
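For concreteness, the threshold-and-subsample procedure described above could look roughly like the sketch below. This is only an illustration, not caravan's or QIIME's code: the function name `rarefy`, the nested-dict layout of the OTU table, and the example sample names are all assumptions made up for this example.

```python
import random
from collections import Counter


def rarefy(otu_counts, depth, seed=None):
    """Subsample every sample to `depth` reads; drop samples with fewer reads.

    otu_counts: dict mapping sample name -> dict mapping OTU id -> read count
    (a hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    rarefied = {}
    for sample, counts in otu_counts.items():
        total = sum(counts.values())
        if total < depth:
            continue  # below the threshold: throw the sample out
        # Expand the counts into one entry per read, then draw `depth`
        # reads without replacement and tally them back up by OTU.
        reads = [otu for otu, n in counts.items() for _ in range(n)]
        rarefied[sample] = dict(Counter(rng.sample(reads, depth)))
    return rarefied


# Made-up example table: sample_b has only 8 reads and gets dropped.
table = {
    "sample_a": {"otu1": 50, "otu2": 30, "otu3": 20},
    "sample_b": {"otu1": 5, "otu2": 3},
    "sample_c": {"otu1": 400, "otu3": 100},
}
result = rarefy(table, depth=100, seed=0)
```

Expanding every read into a list keeps the sketch short but uses memory proportional to the read count, so a real implementation would probably sample from the counts directly.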

swo commented 8 years ago

I have a philosophical thing here: I only thought rarefying was good when you were comparing, say, species richness between samples.

http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003531

What are other situations when it's demonstrably important?

(Sent by email in reply to Joseph Elsherbini's comment above, dated Thu, May 19, 2016 at 10:49 PM.)

elsherbini commented 8 years ago

This paper is well written and very helpful. Thanks! I guess I don't need rarefying after all.