Closed kfontanez closed 10 years ago
Kristina,
These are good questions. I will try to address them soon.
joey
Kristina,
The article has been accepted in PLoS Computational Biology, and we are currently working with the production staff. The snapshot of source files for the simulations and outputs will be published as a compressed supplemental file accompanying the article. I will likely also post a separate GitHub repo with the files and tutorials, which will most likely be associated with the arXiv version of the article. We may update that version with a few extra results that did not make it into the upcoming article.
Question 1 -- I will need to go back and check, but I expect that I used the default independent filtering and not Cook's outlier detection. Now that you mention it, I would like to try both, so I will add this to the to-do list for the GitHub repo version.
Question 2 -- Again, I will need to go back and check, but I expect that I tried only vst. Another item for the to-do list.
Figures for the publication need to be as clear and concise as possible, but I can add many additional comparisons/figures to the GitHub tutorials. The lingering issue motivating your questions, though, is whether one option is better than the other for your data. I plan to look into it for various datasets so that I can say something useful about it in the tutorials.
Joey-
Congratulations on the acceptance of your article.
Since I posted this question I ended up choosing not to use independent filtering and Cook's outlier detection for my metagenomic data.
Since the treatments I am using are resulting in vastly different microbial communities, using Cook's outlier detection resulted in the loss of some of my most differentially abundant taxa. The outlier detection makes sense for transcriptomics where you don't expect too many changes from one treatment to the next, but for metagenomics when you might expect large swings in taxonomic diversity, it just doesn't make sense to me.
The independent filtering seemed like a good idea in principle, but I found it easier to do by hand in my Excel spreadsheets after the fact. That way, I could see exactly what taxa I was losing at each cutoff.
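To illustrate the kind of by-hand filtering described above, here is a minimal sketch in plain Python. It assumes the filter statistic is each taxon's mean normalized count, which is the statistic DESeq2's independent filtering uses by default. The taxon names, count values, cutoffs, and the helper functions `taxa_lost` and `filtering_report` are all hypothetical and only stand in for an exported results table.

```python
# Hypothetical mean normalized counts per taxon (illustrative values only,
# standing in for a column exported from a results table).
mean_counts = {
    "Taxon_A": 0.4,
    "Taxon_B": 3.0,
    "Taxon_C": 12.5,
    "Taxon_D": 150.0,
}


def taxa_lost(mean_counts, cutoff):
    """Return the taxa whose mean count falls below the cutoff, sorted by name."""
    return sorted(taxon for taxon, mean in mean_counts.items() if mean < cutoff)


def filtering_report(mean_counts, cutoffs):
    """Map each cutoff to the list of taxa that would be dropped at that cutoff."""
    return {cutoff: taxa_lost(mean_counts, cutoff) for cutoff in cutoffs}


# Candidate cutoffs are assumptions chosen for the example.
report = filtering_report(mean_counts, cutoffs=[1, 5, 20])
for cutoff, lost in report.items():
    print(f"cutoff {cutoff}: lose {len(lost)} taxa -> {lost}")
```

Listing the dropped taxa per cutoff, rather than only counting them, is what makes it possible to notice when a biologically important taxon would be removed.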
As for the variance-stabilizing transformation, I actually settled on the regularized log-transformation. In part, this decision was motivated by discussions I had on the Bioconductor list with the authors of DESeq regarding the various merits of the transformations. It also made sense to use the rlog transformation because it is very similar to the transformation used during the nbinomWald test that is used to test for differential abundance.
My final approach consisted of choosing a study design in DESeq2, conducting the nbinomWald test using that study design to test for differential abundance among my samples, exporting rlog-transformed count data that took the chosen study design into account, and using that data to make relative abundance heatmaps/bar charts in phyloseq. I also exported the log2 fold change results from the nbinomWald test and was able to make some gorgeous heatmaps showing log fold changes among my samples in phyloseq.
So, when testing the various transformations I would suggest that you include example datasets with large swings in taxonomic diversity as well as those with more subtle changes.
Kristina
Great! So this issue is now closed.
I may be jumping the gun on this one a little bit and these questions may be answerable once you release the vignettes and Rmarkdowns associated with your "Waste not, want not" paper. If so, I look forward to taking a look at them!
After reading the latest version of the paper (version 2) and looking at the vignette in the latest phyloseq development version, which uses the phyloseq_to_deseq2 and DESeq functions, I have two questions.
Thanks for your insights!
Kristina