haowulab / DSS

14 stars 13 forks source link

How can I use FDR as threshold for callDMR ? #2

Open bishwaG opened 6 years ago

bishwaG commented 6 years ago

Hi, I would like to use FDR reported by DMLtest() in callDMR() function as a threshold. I tried a naive way as follows Is that the right way to do?

DMLtest <- DMLtest[DMLtest$fdr <0.05, ]
DMRs <- callDMR(DMLtest,
    p.threshold=1,
    delta = 0.1)
haowulab commented 6 years ago

No you can't do that. It's fine if a DMR contains some CpG sites with significance not exceeding the threshold. By doing that you will remove lots of CpGs, which will cause trouble in calling CpG.

The multiple testing adjustment in DMR calling is a difficult question. Since the statistical test is performed per CpG, and then the significant CpGs are merged into regions, it is very difficult to estimate FDR for a region, based on the site level FDR. This is a question nobody knows how to answer for all types of genome-wide assays (including ChIP-seq). There are some tools report FDR based on some ad hoc procedure. However, I’d rather not to report a FDR if it’s not accurate.

bishwaG commented 6 years ago

@haowulab Thank you for the quick reply. I understand the difficulty of estimating FDR for regions. Since DMLtest() reports FDR, isn't it possible to filter DMLs based on FDR (rather than p-value) and call DMRs using those filtered DMLs?

I do not necessarily need FDR estimates for regions, but I would like to use significant DMLS (based on FDR) to call DMRs.

haowulab commented 6 years ago

Yes we can do that. However, after multiple testing correction, the FDRs for most sites are usually high (given the large number of tests), thus they don't have enough resolution. Using p-value as threshold allows users to set a looser threshold to obtain more regions.

bishwaG commented 6 years ago

You are right, with large number of hypothesis tests we get FDR closer to 1.

I still want to give a try fdr.threshold. How can I do in R using DSS? Do I have to modify the function callDMR() on line 24 and make it scores <- DMLresult$fdr?

callDMR <- function (DMLresult, delta = 0, p.threshold = 1e-05, minlen = 50, 
    minCG = 3, dis.merge = 100, pct.sig = 0.5) 
{
   .
   .
   .
    else {
        scores <- DMLresult$pval  ## LINE 24 ##########
    }
    dmrs <- findBumps(DMLresult$chr, DMLresult$pos, scores, cutoff = p.threshold, 
        sep = 5000, dis.merge = dis.merge, pct.sig = pct.sig, 
        minCG = minCG)
   .
   .
   .
}
haowulab commented 6 years ago

Yes you can do that.

bishwaG commented 6 years ago

Following is the modified callDMR function, but it did not give different result. It is because I have given delta>0 when calling callDMR(). So, it looks like I have to modify if block if (delta > 0){...}. What do I need to change in the if block?

## Modified p.threshold to fdr.threshold ---------------
callDMR <- function (DMLresult, delta = 0, fdr.threshold = 1e-05, minlen = 50, 
    minCG = 3, dis.merge = 100, pct.sig = 0.5) 
{
    ix.keep = !is.na(DMLresult$stat)
    if (mean(ix.keep) < 1) 
        DMLresult = DMLresult[ix.keep, ]
    flag.multifactor = FALSE
    if (class(DMLresult)[2] == "DMLtest.multiFactor") 
        flag.multifactor = TRUE
    if (dis.merge > minlen) 
        dis.merge = minlen
    if (delta > 0) {
        if (flag.multifactor) {
            stop("The test results is based on multifactor design, 'delta' is not supported")
        }
        p1 <- pnorm(DMLresult$diff - delta, sd = DMLresult$diff.se)
        p2 <- pnorm(DMLresult$diff + delta, sd = DMLresult$diff.se, 
            lower.tail = FALSE)
        postprob.overThreshold <- p1 + p2
        DMLresult <- data.frame(DMLresult, postprob.overThreshold = postprob.overThreshold)
        scores <- 1 - postprob.overThreshold
    }
    else {
        ## modified DMLresult$pval to DMLresult$fdr  ---------------
        scores <- DMLresult$fdr
    }
    dmrs <- findBumps(DMLresult$chr, DMLresult$pos, scores, cutoff = fdr.threshold, 
        sep = 5000, dis.merge = dis.merge, pct.sig = pct.sig, 
        minCG = minCG)
    if (is.null(dmrs)) {
        warning("No DMR found! Please use less stringent criteria. \n")
        return(NULL)
    }
    nCG <- dmrs[, "idx.end.global"] - dmrs[, "idx.start.global"] + 
        1
    ix.good <- dmrs$length > minlen & nCG > minCG
    if (sum(ix.good) == 0) {
        warning("No DMR found! Please use less stringent criteria. \n")
        return(NULL)
    }
    dmrs <- dmrs[ix.good, ]
    nCG <- dmrs[, "idx.end.global"] - dmrs[, "idx.start.global"] + 
        1
    if (flag.multifactor) {
        areaStat = rep(0, nrow(dmrs))
        for (i in 1:nrow(dmrs)) {
            ii = dmrs[i, "idx.start.global"]:dmrs[i, "idx.end.global"]
            areaStat[i] = sum(DMLresult[ii, "stat"])
        }
        result <- data.frame(dmrs[, 1:4], nCG = nCG, areaStat = areaStat)
    }
    else {
        meanMethy1 = meanMethy2 = areaStat = rep(0, nrow(dmrs))
        for (i in 1:nrow(dmrs)) {
            ii = dmrs[i, "idx.start.global"]:dmrs[i, "idx.end.global"]
            meanMethy1[i] = mean(DMLresult[ii, "mu1"])
            meanMethy2[i] = mean(DMLresult[ii, "mu2"])
            areaStat[i] = sum(DMLresult[ii, "stat"])
        }
        result <- data.frame(dmrs[, 1:4], nCG = nCG, meanMethy1, 
            meanMethy2, diff.Methy = meanMethy1 - meanMethy2, 
            areaStat = areaStat)
    }
    ix = sort(abs(result$areaStat), decreasing = TRUE, index.return = TRUE)$ix
    result[ix, ]
}
haowulab commented 6 years ago

Why don't you set delta=0 and try it first? If that doesn't return many DMR, using larger delta will have less.

bishwaG commented 5 years ago

Yes, I tried with delta=0. By the way, is smoothing recommended for RRBS data?