Open bishwaG opened 6 years ago
No you can't do that. It's fine if a DMR contains some CpG sites with significance not exceeding the threshold. By doing that you will remove lots of CpGs, which will cause trouble in calling CpG.
The multiple testing adjustment in DMR calling is a difficult question. Since the statistical test is performed per CpG, and then the significant CpGs are merged into regions, it is very difficult to estimate FDR for a region, based on the site level FDR. This is a question nobody knows how to answer for all types of genome-wide assays (including ChIP-seq). There are some tools report FDR based on some ad hoc procedure. However, I’d rather not to report a FDR if it’s not accurate.
@haowulab Thank you for the quick reply. I understand the difficulty of estimating FDR for regions. Since DMLtest()
reports FDR, isn't it possible to filter DMLs based on FDR (rather than p-value) and call DMRs using those filtered DMLs?
I do not necessarily need FDR estimates for regions, but I would like to use significant DMLS (based on FDR) to call DMRs.
Yes we can do that. However, after multiple testing correction, the FDRs for most sites are usually high (given the large number of tests), thus they don't have enough resolution. Using p-value as threshold allows users to set a looser threshold to obtain more regions.
You are right, with large number of hypothesis tests we get FDR closer to 1.
I still want to give a try fdr.threshold. How can I do in R using DSS? Do I have to modify the function callDMR()
on line 24 and make it scores <- DMLresult$fdr
?
callDMR <- function (DMLresult, delta = 0, p.threshold = 1e-05, minlen = 50,
minCG = 3, dis.merge = 100, pct.sig = 0.5)
{
.
.
.
else {
scores <- DMLresult$pval ## LINE 24 ##########
}
dmrs <- findBumps(DMLresult$chr, DMLresult$pos, scores, cutoff = p.threshold,
sep = 5000, dis.merge = dis.merge, pct.sig = pct.sig,
minCG = minCG)
.
.
.
}
Yes you can do that.
Following is the modified callDMR
function, but it did not give different result. It is because I have given delta>0
when calling callDMR()
. So, it looks like I have to modify if block if (delta > 0){...}
. What do I need to change in the if block?
## Modified p.threshold to fdr.threshold ---------------
callDMR <- function (DMLresult, delta = 0, fdr.threshold = 1e-05, minlen = 50,
minCG = 3, dis.merge = 100, pct.sig = 0.5)
{
ix.keep = !is.na(DMLresult$stat)
if (mean(ix.keep) < 1)
DMLresult = DMLresult[ix.keep, ]
flag.multifactor = FALSE
if (class(DMLresult)[2] == "DMLtest.multiFactor")
flag.multifactor = TRUE
if (dis.merge > minlen)
dis.merge = minlen
if (delta > 0) {
if (flag.multifactor) {
stop("The test results is based on multifactor design, 'delta' is not supported")
}
p1 <- pnorm(DMLresult$diff - delta, sd = DMLresult$diff.se)
p2 <- pnorm(DMLresult$diff + delta, sd = DMLresult$diff.se,
lower.tail = FALSE)
postprob.overThreshold <- p1 + p2
DMLresult <- data.frame(DMLresult, postprob.overThreshold = postprob.overThreshold)
scores <- 1 - postprob.overThreshold
}
else {
## modified DMLresult$pval to DMLresult$fdr ---------------
scores <- DMLresult$fdr
}
dmrs <- findBumps(DMLresult$chr, DMLresult$pos, scores, cutoff = fdr.threshold,
sep = 5000, dis.merge = dis.merge, pct.sig = pct.sig,
minCG = minCG)
if (is.null(dmrs)) {
warning("No DMR found! Please use less stringent criteria. \n")
return(NULL)
}
nCG <- dmrs[, "idx.end.global"] - dmrs[, "idx.start.global"] +
1
ix.good <- dmrs$length > minlen & nCG > minCG
if (sum(ix.good) == 0) {
warning("No DMR found! Please use less stringent criteria. \n")
return(NULL)
}
dmrs <- dmrs[ix.good, ]
nCG <- dmrs[, "idx.end.global"] - dmrs[, "idx.start.global"] +
1
if (flag.multifactor) {
areaStat = rep(0, nrow(dmrs))
for (i in 1:nrow(dmrs)) {
ii = dmrs[i, "idx.start.global"]:dmrs[i, "idx.end.global"]
areaStat[i] = sum(DMLresult[ii, "stat"])
}
result <- data.frame(dmrs[, 1:4], nCG = nCG, areaStat = areaStat)
}
else {
meanMethy1 = meanMethy2 = areaStat = rep(0, nrow(dmrs))
for (i in 1:nrow(dmrs)) {
ii = dmrs[i, "idx.start.global"]:dmrs[i, "idx.end.global"]
meanMethy1[i] = mean(DMLresult[ii, "mu1"])
meanMethy2[i] = mean(DMLresult[ii, "mu2"])
areaStat[i] = sum(DMLresult[ii, "stat"])
}
result <- data.frame(dmrs[, 1:4], nCG = nCG, meanMethy1,
meanMethy2, diff.Methy = meanMethy1 - meanMethy2,
areaStat = areaStat)
}
ix = sort(abs(result$areaStat), decreasing = TRUE, index.return = TRUE)$ix
result[ix, ]
}
Why don't you set delta=0 and try it first? If that doesn't return many DMR, using larger delta will have less.
Yes, I tried with delta=0. By the way, is smoothing recommended for RRBS data?
Hi, I would like to use
FDR
reported byDMLtest()
incallDMR()
function as a threshold. I tried a naive way as follows Is that the right way to do?