spholmes / F1000_workflow


DESeq2 variance stabilizing transformation #20

Open csmiguel opened 6 years ago

csmiguel commented 6 years ago

In DESeq2, after applying getVarianceStabilizedData, every count of 0 is transformed to a negative value. Zeros make up about 2/3 of the values in the matrix. I am worried that this will interfere with the downstream hierarchical FDR analysis.

  1. Is it "normal" to get negative normalized values for counts = 0?
  2. Can I proceed with hierarchical multiple testing with structSSI using these transformed counts?
  3. Could I just go with log-transformed counts?
  4. Should I apply another transformation?

Thanks Susan for publishing this helpful guide.

krisrs1128 commented 6 years ago

In some analyses, having zeros turn into nonzero values can be problematic. For example, it would cause problems for any procedure that expects some form of zero-inflation. For the hierarchical testing procedure, however, the only real concern is whether your original tree-wide p-values are valid. Since the treePValues function performs t or F tests, you will be okay if you have enough samples and the transformed data aren’t too skewed, because the central limit theorem then makes the averages approximately normal. Alternatively, you could use a nonparametric test. So, briefly,

  1. Yes, this is expected, even in the original RNA-seq analysis for which the variance stabilization was designed.
  2. As long as the individual tests for each ASV are valid, the tree testing procedure will behave as expected.
  3 + 4. The best transformation is the one that gives your individual tests the most power. You can get some intuition for this by looking at histograms of the transformed counts: the less skewed, the better.
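
To make the "less skewed is better" check concrete, here is a minimal sketch that puts a number on what the histograms show. It is not part of the workflow or of DESeq2/structSSI: the `skewness` helper and the count vector are made up for illustration, and `log1p` stands in for whichever transformation you are comparing.

```python
import math


def skewness(values):
    """Skewness as the third standardized moment (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0  # constant vector: no asymmetry to measure
    return m3 / m2 ** 1.5


# Hypothetical counts for a single ASV across 12 samples (illustrative only).
counts = [0, 0, 0, 1, 2, 0, 5, 12, 0, 40, 3, 150]

raw_skew = skewness(counts)
log_skew = skewness([math.log1p(c) for c in counts])

print(f"raw counts: skewness = {raw_skew:.2f}")
print(f"log1p:      skewness = {log_skew:.2f}")
```

A value closer to 0 means a more symmetric distribution, which is the situation where t or F tests on the transformed values are easier to trust. In practice you would run the same comparison on the variance-stabilized values exported from R, ASV by ASV, rather than on a toy vector.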