Interface for better normalization in phyloseq

joey711 commented 11 years ago

Leaving it vague for now. Big updates will happen soon.

joey

joey711 commented 11 years ago

Done. This was added in version 1.7.3:

https://github.com/joey711/phyloseq/pull/253

This new function,

phyloseq_to_deseq2()

facilitates our recommended approach to using the Negative Binomial to model microbiome read count data, as a general alternative to sample proportions or rarefying, as we recently described in a draft article on the arxiv:

Waste Not, Want Not: Why Rarefying Microbiome Data is Inadmissible

This has been submitted for peer-review, along with four supplemental tutorials/simulations. We will host update-able versions of those among the phyloseq tutorials once the article has been accepted.

kfontanez commented 10 years ago

Joey-

I just read your ArXiv draft - very nice work. A few clarification questions about rarefying and it's implications.

My understanding of your work is that rarefying to an even sampling depth without replacement can be justified in some cases when you are comparing your samples using simple relative abundance in order to do clustering, ordination or multivariate analyses (e.g., non-parametric analysis of dissimilarities). As long as your sample sizes are large enough (effect sizes) after rarefying that you still capture the relevant clustering patterns - you should be good to go.

However, if you want to really test for differential abundance then you need to use your raw count data in a negative binomial model such as those implemented in DESeq or BaySeq. The raw count data is not rarefied nor normalized in any way prior to input into those programs because the negative bionomial models the count data appropriately to handle the uneven sampling depth.

Would you agree that those are two of the major conclusions of this work?

By the way, is there a reason you used the negative binomial as implemented in DeSeq as opposed to the negative binomial implemented in BaySeq?

Thanks, Kristina

kfontanez commented 10 years ago

Joey-

Another follow-up set of questions:

In your paper you suggest that repeated rarefying trials are better than a single rarefy. Is there a way to do this in the phyloseq package? I see that the rarefy_even_depth function can be used to rarefy - but how do you combine the repeated trials?

As an alternative to repeated rarefying you suggest transforming each count value to a common-scale using the minimum library size across your samples. For example, you have 1000 counts for OTU1 in sample A. There are three samples, A, B and C with library sizes of 80,000, 25,000 and 150,000, respectively. To transform the sample counts for OTU1 you would do the following calculation: 1000(25,000/80,000) = 313 (rounded). Is that correct?

Thanks, Kristina

joey711 commented 10 years ago

Kristina,

Thanks for the feedback. I'm addressing reviewer comments on this manuscript, and will try to fully respond to your two posts above as soon as I can. Basically, the implementation details will have to wait, and we are providing a lot of that with the final manuscript when it ultimately gets published.

I want to correct some points in your summary from the first post, because I don't want lingering misunderstanding about the conclusions of our manuscript (perhaps we were stating things to cautiously).

Rarefying should never be used. Ever. A statistician would never recommend this approach. It is not justifiable. I described rarefying as sometimes tolerable for whole-sample clustering, but I meant this in the ultra-cautious fair-to-the-point-of-insanity sense of the word tolerable -- that rarefying is not giving you the completely wrong answer, but it is very often (approaching always) giving you a worse, less-reliable answer, because you are throwing away data. I can describe rarefying this bluntly because we are here on my Issues Tracker, but we used a much more diplomatic description in the manuscript. I'm sorry that it appeared that rarefying should ever be used. Ever again. Please feel free to point out sections of the manuscript where this was less than clear, in case we have not caught this during the revisions.
Between an analysis framework for sample-wise comparisons, or testing differential abundance, the effect of rarefying is much worse for differential abundance. However, rarefying should not be used for either analysis framework.
The following is wrong as stated -- You wrote: "As long as your sample sizes are large enough (effect sizes) after rarefying that you still capture the relevant clustering patterns - you should be good to go". I think this comes down to a matter of terminology and understanding a few key statistical terms, but you'll miss an important piece of our conclusions if you don't get this straightened out, and it helps explain the misunderstanding that I tried to correct above. Library Size is not the same as Effect Size. Library Size is the number of reads per sample, with sample in this context effectively synonymous with library, except that other types of observations can be made on a microbial sample besides just nucleic acid sequencing; Effect Size in this context is a statistical term referring to the size of the difference between (1) the phenomena you are attempting to observe and (2) the relevant null hypothesis. In the case of unsupervised clustering of two groups of samples, Effect Size refers to how different the two groups are that you are formally comparing, but it is completely independent of the number of samples in the groups and independent of the number of reads in any of the samples. Numbers of observations affects how well you "see" effects of a given size, but has no bearing on the Effect Size itself. Consequently, you can have a large Effect Size but a small Library Size at the same time, and vice versa. Effect Size is something you usually do not know unless it is a rigged experiment or a simulation. Herein lies the problem. Since you don't know the Effect Size corresponding to a collection of new empirical data, you can't know if rarefying is "good to go". Of course, if the Effect Size is so obviously large that you get the same global answer even with one-tenth of the original data, then rarefying probably won't change that. But you may also miss more subtle things, and you will throw away samples that might have otherwise offered additional insight. Furthermore, if the Effect Size is actually that large, then there is no controversy. The discussion here is about performing more difficult statistical analyses where the answer might be subtle and isn't obvious ahead of time. Ocean microbiome samples are different than fecal samples. Fecal samples are different than soil samples. No argument. Any remotely unbiased normalization will still adequately represent these gross differences. But which bacterial strains are differentially abundant among those that are present at 0.1% or less average abundance? Is there a difference between these two somewhat similar clinical cohorts' fecal microbiota? You better not rarefy if you want to have a chance at answering these last two questions. Does that make sense?
The point about transforming to a common scale is merely that the transformation I provided is the expected value for each OTU count in each sample after many repeated rarefying trials. Using this transformation is better because it removes the noise that the random step in rarefying is artificially adding. We provide a simple demonstration of this added sampling noise among the supplemental files with the manuscript in preparation. That said, we're not actually advocating that anyone use this common scale transformation because it would still be equivalent to throwing away data. It is merely better than throwing away data AND adding sampling noise at the same time, which is what rarefying does.
I can't say anything good or bad about the implementation in the bayseq package, because I haven't tested it. Maybe can try it in a tutorial and compare with some of the others approaches. I'd be happy to hear feedback about how it works on microbiome data, how easy it is to use, etc.

Hope my answers help to clarify. Thank you very much for the feedback. It really helps a lot. Being able to consider your comments before we're finished making revisions for final publication is the main reason we wanted to post on the arxiv ahead of submission. It was my first time trying that, and I think it convinced me that it is was worthwhile. I will go back and try to re-read our latest draft with your comments in mind, and try to find where I can make some of these things more clear.

Thanks again for all your other feedback on the package, and best of luck with your research,

joey

kfontanez commented 10 years ago

Joey-

Thank you very much for your detailed comments and corrections of my interpretations. I know that it can be difficult to be diplomatic yet firm about conclusions in manuscripts especially when microbiome researchers are wearing rose-colored glasses as they read the manuscript!

It is indeed difficult to move away from rarefying as a standard procedure in these types of studies. At the moment, I find myself at that exact juncture where I need to decide how to model the count data when presenting comparisons of abundance across samples. Using BaySeq is quite easy to do and is able to detect differential abundance of OTUs across samples even for OTUs that are less than 0.1% of total reads in the dataset! (Funny that you used that exact number). It of course picks up the OTUs that are 5% or more of the dataset as well, but in my study I am interested in the differential abundance of those rare taxa as well.

One way the manuscript might be improved for an audience of microbiome researchers looking to follow your recommendations is to have a section in the discussion or Implications section where you explicitly state the steps they ought to take (akin to the way you described rarefying in the Introduction) starting from raw (untransformed, non-normalized OTU by count tables). Figure 2 is useful in understanding your simulations but that explicit enough to follow when trying to figure out how to proceed with your own data.

It may be that you have included that information in the supplementary sections, which I was not able to access on ArXiv. So, take the above suggestion with a grain of salt.

For my own part, I would be very interested in reading your tutorial about the DEseq implementation of modeling the count data. I tried finding it on Github and on your phyloseq tutorial pages but was only able to find a few sections of code describing it, though not in the easy to read R markdown format including the figures.

Thanks for your contributions towards improving statistical techniques in microbiome research!

Kristina

kfontanez commented 10 years ago

Joey-

Just a quick note to say that I was able to access the supplemental material on ArXiv and I'm going through it now. Would still love to see the DEseq tutorial though!

Kristina

joey711 commented 10 years ago

FYI, the supplemental material on the arxiv includes only the mathematical supplement, but none of our Rmd tutorials. We will release Rmd tutorials when the manuscript is accepted for publication in its final form. One exception is the DESeq2 tutorial that is already included as a vignette in the latest version of phyloseq on GitHub and BioC-devel.

Thanks for the additional feedback on the manuscript. We are definitely taking your comments into consideration.

joey

kfontanez commented 10 years ago

Joey-

I have a newbie R question- I can't access the vignette.

vignette("phyloseq-mixture-models", package="phyloseq") Warning message: vignette ‘phyloseq-mixture-models’ has no PDF/HTML

What's the rookie mistake here?

Kristina

joey711 commented 10 years ago

Actually this is a good question. I just noticed that the github-installed version does not have the vignettes built in the package. I imagine this is a developer-oriented design choice in the devtools::install_github function to speed up (re-)install time. I happen to have the BioC-devel version up to date, so you can download and re-install from there, and the vignettes will be pre-built. Not that there is on difference in the source code between the two, but the BioC servers provide pre-built "binary" versions for Mac OS and Windows. You could "build" the github version and get the same results, but it is more complicated (hint: you would use R CMD build phyloseq from the command line after downloading/unpacking the phyloseq repository)

Also, make sure you are using version 1.7.10+.

See the devel page for phyloseq on BioC: http://www.bioconductor.org/packages/2.14/bioc/html/phyloseq.html

However, I noticed that their daily-build system cleared version 1.7.10 as being built okay on Windows and Mac OS:

http://www.bioconductor.org/checkResults/devel/bioc-LATEST/phyloseq/moscato2-buildsrc.html

but they haven't yet provided the pre-built binaries for 1.7.10. So, only 1.7.9 is available in the links at the bottom. However, the vignettes are up to date in that version, and they also provide links to the vignettes on that page.

So short answer: view the vignette at the following link http://www.bioconductor.org/packages/2.14/bioc/vignettes/phyloseq/inst/doc/phyloseq-mixture-models.html

joey711 commented 10 years ago

The binaries are now version 1.7.10 and should include the pre-built vignettes. You can, however, still use the links above to see the vignettes if you'd like. Either way, they are available (and the source code Rmd for the vignettes was always available).

I'll close this issue for now unless/until something comes up.

Thanks again for the feedback!

joey

AndreaQ7 commented 7 years ago

Hi Joey, I'm very interest in this debate about the rarefying problem. Recently I have found another work on this subject titled "Effects of library size variance, sparsity, and compositionality on the analysis of microbiome data" and at least thay conclude saying "Rarefying yields fewer OTUs as significantly differentially abundant, but those OTUs are robust, in the sense that thay are almost always identified as significant by at least one other differencial abundance detection model"

Thus i'm bit confuse about which is the right test to use on microbiome case, could you give me any comments about this paper or well this phrase? thank you for the help

spholmes commented 7 years ago

Andrea I think that rarefying was used to overcome two problems, qiime and mothur are oversensitive to noise because there is no denoising component so rarefying can decrease this noise by looking at the data at a lower resolution (from afar so to speak).

Now that we are using a denoised pipeline (dada2) this does not make any sense, so one would not use rarefying any more if you are denoising.

The other reason to rarefy is to normalize the data and there are more powerful ways of doing this using transformations of the data whether through DEseq2 or ranking or other log type transformations. So transformations and more powerful procedures should be used for differential abundance testing;

however if you just want to look at unifrac distances you should still rarefy.

Hope this helps, Susan

On Thu, Feb 9, 2017 at 1:41 AM, AndreaQ7 notifications@github.com wrote:

Hi Joey, I'm very interest in this debate about the rarefying problem. Recently I have found another work on this subject titled "Effects of library size variance, sparsity, and compositionality on the analysis of microbiome data" and at least thay conclude saying "Rarefying yields fewer OTUs as significantly differentially abundant, but those OTUs are robust, in the sense that thay are almost always identified as significant by at least one other differencial abundance detection model"

Thus i'm bit confuse about which is the right test to use on microbiome case, could you give me any comments about this paper or well this phrase? thank you for the help

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/joey711/phyloseq/issues/229#issuecomment-278592854, or mute the thread https://github.com/notifications/unsubscribe-auth/ABJcvRmy7qgdNU719Mkq1PejkWMqL1oSks5rat86gaJpZM4A24E9 .

-- Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

AndreaQ7 commented 7 years ago

Thank you a lot!! Now this is much more clear to me. Still one question about this assumption: If the most powerful way to normalize data is DESeq2, so why use rarefying for the beta-diversity? I mean, why do not use the normalized dataset by DESeq2 to investigate even at the beta diversity? thank you a lot for your suggestions Susan, this helped me a lot in understanding

spholmes commented 7 years ago

It's very hard to use unifrac and diversity indicies at unequal depths one has the alternative though of using DPCoA and other affiliated methods that take the tree `into account'.

On Fri, Feb 10, 2017 at 1:23 AM, AndreaQ7 notifications@github.com wrote:

Thank you a lot!! Now this is much more clear to me. Still one question about this assumption: If the most powerful way to normalize data is DESeq2, so why use rarefying for the beta-diversity? I mean, why do not use the normalized dataset by DESeq2 to investigate even at the beta diversity? thank you a lot for your suggestions Susan, this helped me a lot in understanding

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/joey711/phyloseq/issues/229#issuecomment-278896065, or mute the thread https://github.com/notifications/unsubscribe-auth/ABJcvXWBTYawdFPInw1K11PnONE21LRiks5rbCyNgaJpZM4A24E9 .

-- Susan Holmes Professor, Statistics and BioX John Henry Samter Fellow in Undergraduate Education Sequoia Hall, 390 Serra Mall Stanford, CA 94305 http://www-stat.stanford.edu/~susan/

joey711 / phyloseq

Interface for better normalization in phyloseq #229