ruggleslab / jukebox

5 stars 6 forks source link

Potential bug in FDR correction #11

Closed gavinmdouglas closed 4 years ago

gavinmdouglas commented 4 years ago

Hi there,

I noticed that the FDR-correction in the aldex.ttest2 function is performed for each feature individually (as far as I can tell) rather than across all features. Specifically, I'm talking about this code (starting at line 911 in global.R):

for (mc.i in 1:mc.instances) {
    numTicks <- progress(mc.i, mc.instances, numTicks)
    t.input <- sapply(mc.all, function(y) {
      y[, mc.i]
    })
    for (j in 1:nrow(t.input)){
      wi.p.matrix[j, mc.i] <- wilcox.test(t.input[j,1:length(setA)], t.input[j,c(length(setA)+1):c(length(setA)+length(setB))], 
                                          paired = paired.test)$p.value

      wi.BH.matrix[j, mc.i] <- p.adjust(wi.p.matrix[j, mc.i], 
                                        method = "BH")

      #we.p.matrix[, mc.i] <- t.fast(t.input, setAsBinary, paired.test)

      we.p.matrix[j, mc.i] <- t.test(t.input[j,1:length(setA)], t.input[j,c(length(setA)+1):c(length(setA)+length(setB))], 
                                     paired = paired.test)$p.value

      we.BH.matrix[j, mc.i] <- p.adjust(we.p.matrix[j, mc.i], 
                                        method = "BH")
    }
  }

I would have expected the multiple-test correction to be done like in the below code:

for (mc.i in 1:mc.instances) {
  numTicks <- progress(mc.i, mc.instances, numTicks)
  t.input <- sapply(mc.all, function(y) {
    y[, mc.i]
  })
  for (j in 1:nrow(t.input)){
    wi.p.matrix[j, mc.i] <- wilcox.test(t.input[j,1:length(setA)], t.input[j,c(length(setA)+1):c(length(setA)+length(setB))], 
                                        paired = paired.test)$p.value

    #we.p.matrix[, mc.i] <- t.fast(t.input, setAsBinary, paired.test)

    we.p.matrix[j, mc.i] <- t.test(t.input[j,1:length(setA)], t.input[j,c(length(setA)+1):c(length(setA)+length(setB))], 
                                   paired = paired.test)$p.value
  }

  wi.BH.matrix[, mc.i] <- p.adjust(wi.p.matrix[, mc.i], method = "BH")
  we.BH.matrix[, mc.i] <- p.adjust(we.p.matrix[, mc.i], method = "BH")
}

I might be missing something, but I thought I would mention this in case it is a bug.

Thanks,

Gavin

jcooperdevlin commented 4 years ago

Hi Gavin,

Thanks for your comment!

I've updated the source code to reflect this change. I did some testing and didn't see much of an empirical difference in terms of significantly associated taxa or pathways with our default example.

Let me know if you notice any significant effects due to this update.

Thanks again!

gavinmdouglas commented 4 years ago

No problem!

gavinmdouglas commented 4 years ago

Just to follow up - it looks like running FDR-correction with a single P-value just leaves the P-value unchanged, so the aldex.ttest2 output columns corresponding to FDR-corrected P-values would have actually just been the raw P-values I believe.