Closed zkstewart closed 2 years ago
After some trial and error, I have been able to normalise my own data and, I believe, the example data by calling EDASeq directly. In both cases the normalised counts differ from the input, so I have to conclude that metaseqR2 is doing something wrong.
I ran the following to normalise the example data:
library(EDASeq) # provides newSeqExpressionSet and the lane normalisation functions
dataMatrix <- metaseqR2:::exampleCountData(2000)
lengths <- round(1000*runif(nrow(dataMatrix)))
gc=runif(nrow(dataMatrix))
test.exp.set <- newSeqExpressionSet(dataMatrix,
phenoData = data.frame(
conditions = factor(c("A", "A", "B", "B", "B")),
row.names = colnames(dataMatrix)
),
featureData = data.frame(
gc = gc,
length = lengths,
row.names = rownames(dataMatrix)
)
)
test.within <- withinLaneNormalization(test.exp.set,"gc", which="full")
test.norm <- betweenLaneNormalization(test.within, which="full")
test.norm.counts.eda = normCounts(test.norm)
I'd expect normalizeEdaseq to provide a more-or-less equivalent result. Does anyone know why this isn't the case?
Hi @zkstewart, thanks for reporting. I will look into it.
Hello @zkstewart,
I have looked into your report and noticed that your example uses the full method for withinLaneNormalization. metaseqR2, on the other hand, uses loess, which is the default method of the EDASeq function. Changing loess to full in normalizeEdaseq returns the same results as your example (and vice versa).
# =================
# zkstewart example
# =================
set.seed(21)
dataMatrix <- metaseqR2:::exampleCountData(2000)
lengths <- round(1000*runif(nrow(dataMatrix)))
gc=runif(nrow(dataMatrix))
test.exp.set <- newSeqExpressionSet(dataMatrix,
phenoData = data.frame(
conditions = factor(c("A", "A", "B", "B", "B")),
row.names = colnames(dataMatrix)
),
featureData = data.frame(
gc = gc,
length = lengths,
row.names = rownames(dataMatrix)
)
)
test.within <- withinLaneNormalization(test.exp.set,"gc", which="full")
test.norm <- betweenLaneNormalization(test.within, which="full")
test.norm.counts.eda = normCounts(test.norm)
head(test.norm.counts.eda)
## head(test.norm.counts.eda)
## A1 A2 B1 B2 B3
## gene_1_T 202 284 94 91 188
## gene_2_F 98 150 150 133 144
## gene_3_T 128 194 93 64 89
## gene_4_F 418 202 440 716 251
## gene_5_F 26 26 2 13 17
## gene_6_F 320 425 246 284 418
#======================================
# From normalizeEDASeq
#======================================
geneData <- as.data.frame(dataMatrix)
geneCounts <- dataMatrix
classes <- factor(c("A", "A", "B", "B", "B"))
geneData$gc_content <- gc
seqGenes <- newSeqExpressionSet(
geneCounts,
phenoData=AnnotatedDataFrame(
data.frame(
conditions=classes,
row.names=colnames(geneCounts)
)
),
featureData=AnnotatedDataFrame(
data.frame(
gc=geneData$gc_content,
length=lengths,
row.names=if (is.data.frame(geneData)) rownames(geneData)
else names(geneData)
)
)
)
# ---------------------------------------
# With the metaseqR2 defaults
# ---------------------------------------
normArgs <- metaseqR2:::getDefaults("normalization", "edaseq")
normArgs
## $within.which
## [1] "loess"
#
## $between.which
## [1] "full"
seqGenes <- betweenLaneNormalization(withinLaneNormalization(seqGenes,
"gc",which=normArgs$within.which),which=normArgs$between.which)
head(normCounts(seqGenes))
## A1 A2 B1 B2 B3
## gene_1_T 212 277 87 102 180
## gene_2_F 125 204 199 178 189
## gene_3_T 124 187 85 70 91
## gene_4_F 549 231 548 865 302
## gene_5_F 33 33 5 15 18
## gene_6_F 339 460 274 305 408
# ----------------------------------
# Using full instead of loess
# ----------------------------------
normArgs$within.which <- "full"
seqGenes <- betweenLaneNormalization(withinLaneNormalization(seqGenes,
"gc",which=normArgs$within.which),which=normArgs$between.which)
head(normCounts(seqGenes))
## A1 A2 B1 B2 B3
## gene_1_T 202 284 94 91 188
## gene_2_F 98 150 150 133 144
## gene_3_T 128 194 93 64 89
## gene_4_F 418 202 440 716 251
## gene_5_F 26 26 2 13 17
## gene_6_F 320 425 246 284 418
Hope this addresses the issue. Best, Dionysis
From what I can tell, the script you've provided doesn't trigger the problem I encountered, since you're still calling the EDASeq methods directly. The problem arises when we use the metaseqR2 methods. The example script below demonstrates this.
# my testing
library(metaseqR2)
library(EDASeq)
set.seed(21)
# =================
# setup example data
# =================
dataMatrix <- metaseqR2:::exampleCountData(2000)
lengths <- round(1000*runif(nrow(dataMatrix)))
gc=runif(nrow(dataMatrix))
test.exp.set <- newSeqExpressionSet(dataMatrix,
phenoData = data.frame(
conditions = factor(c("A", "A", "B", "B", "B")),
row.names = colnames(dataMatrix)
),
featureData = data.frame(
gc = gc,
length = lengths,
row.names = rownames(dataMatrix)
)
)
head(dataMatrix)
## head(dataMatrix)
## A1 A2 B1 B2 B3
## gene_1_T 169 288 53 81 246
## gene_2_F 100 215 115 138 275
## gene_3_T 100 193 52 50 124
## gene_4_F 449 254 315 648 396
## gene_5_F 29 39 3 12 28
## gene_6_F 304 529 172 237 590
# =================
# full norm with EDASeq directly
# =================
test.within <- withinLaneNormalization(test.exp.set,"gc", which="full")
test.norm <- betweenLaneNormalization(test.within, which="full")
test.norm.counts.eda = normCounts(test.norm)
head(test.norm.counts.eda)
## head(test.norm.counts.eda)
## A1 A2 B1 B2 B3
## gene_1_T 202 284 94 91 188
## gene_2_F 98 150 150 133 144
## gene_3_T 128 194 93 64 89
## gene_4_F 418 202 440 716 251
## gene_5_F 26 26 2 13 17
## gene_6_F 320 425 246 284 418
# =================
# loess norm with EDASeq directly
# =================
test.within.loess <- withinLaneNormalization(test.exp.set,"gc", which="loess")
test.norm.loess <- betweenLaneNormalization(test.within.loess, which="full")
test.norm.counts.eda.loess = normCounts(test.norm.loess)
head(test.norm.counts.eda.loess)
## head(test.norm.counts.eda.loess)
## A1 A2 B1 B2 B3
## gene_1_T 212 277 87 102 180
## gene_2_F 125 204 199 178 189
## gene_3_T 124 187 85 70 91
## gene_4_F 549 231 548 865 302
## gene_5_F 33 33 5 15 18
## gene_6_F 339 460 274 305 408
# =================
# loess norm through metaseqR2
# =================
sampleList <- list(A=c("A1","A2"),B=c("B1","B2","B3"))
geneData <- data.frame(
chromosome=c(rep("chr1",nrow(dataMatrix)/2),
rep("chr2",nrow(dataMatrix)/2)),
start=rep(1, nrow(dataMatrix)),
end=lengths,
gene_id=rownames(dataMatrix),
gc_content=gc,
row.names=rownames(dataMatrix)
)
normArgs <- metaseqR2:::getDefaults("normalization", "edaseq")
test.norm.counts.metar2.loess = normalizeEdaseq(dataMatrix, sampleList, geneData=geneData, normArgs=normArgs)
head(test.norm.counts.metar2.loess)
## head(test.norm.counts.metar2.loess)
## A1 A2 B1 B2 B3
## gene_1_T 169 288 53 81 246
## gene_2_F 100 215 115 138 275
## gene_3_T 100 193 52 50 124
## gene_4_F 449 254 315 648 396
## gene_5_F 29 39 3 12 28
## gene_6_F 304 529 172 237 590
# =================
# full norm through metaseqR2
# =================
normArgs$within.which = "full"
test.norm.counts.metar2.full = normalizeEdaseq(dataMatrix, sampleList, geneData=geneData, normArgs=normArgs)
head(test.norm.counts.metar2.full)
## head(test.norm.counts.metar2.full)
## A1 A2 B1 B2 B3
## gene_1_T 169 288 53 81 246
## gene_2_F 100 215 115 138 275
## gene_3_T 100 193 52 50 124
## gene_4_F 449 254 315 648 396
## gene_5_F 29 39 3 12 28
## gene_6_F 304 529 172 237 590
Regardless of whether normArgs is configured to use loess or full, the resulting counts table is identical to the original, i.e., no normalisation has been applied. To the best of my knowledge, I emulated the expression set correctly for use by metaseqR2, as per what I see here: https://rdrr.io/bioc/metaseqR2/man/normalizeEdaseq.html
Are you able to get the metaseqR2 method normalizeEdaseq to provide normalised output that is the same as when we call EDASeq directly through withinLaneNormalization and betweenLaneNormalization?
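The "identical to the original" observation can also be checked programmatically rather than by eye. Below is a minimal base R sketch; normalisationChanged is a hypothetical helper (it is not part of metaseqR2 or EDASeq), and the two toy matrices are just the first two rows of the raw counts and of the EDASeq full result printed above.

```r
# Hypothetical helper (not from metaseqR2 or EDASeq): returns TRUE if at
# least one count differs between the raw and the "normalised" matrix.
normalisationChanged <- function(raw, normalised) {
    # Shapes must agree before an element-wise comparison makes sense
    stopifnot(identical(dim(raw), dim(normalised)))
    any(raw != normalised)
}

samples <- c("A1", "A2", "B1", "B2", "B3")
genes <- c("gene_1_T", "gene_2_F")

# First two rows of head(dataMatrix) from the script above
raw <- matrix(c(169, 288,  53,  81, 246,
                100, 215, 115, 138, 275),
              nrow = 2, byrow = TRUE, dimnames = list(genes, samples))

# First two rows of head(test.norm.counts.eda) from the script above
edaFull <- matrix(c(202, 284,  94,  91, 188,
                     98, 150, 150, 133, 144),
                  nrow = 2, byrow = TRUE, dimnames = list(genes, samples))

normalisationChanged(raw, raw)      # FALSE: output equal to input, as seen with normalizeEdaseq
normalisationChanged(raw, edaFull)  # TRUE: counts were actually altered
```

Applied to the full objects from the script, the same call would be normalisationChanged(dataMatrix, test.norm.counts.metar2.full).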
Thanks, Zac.
Digging through the source code, I believe I know what the issue is. In the file R/norm.R on line 63 (https://github.com/pmoulos/metaseqR2/blob/34a32008a0db34a6736d1ce3f1fd3afb404df3d2/R/norm.R#L63) the code reads:
return(counts(seqGenes)) # Class: matrix
It should instead use the normCounts method, as we do above when calling EDASeq directly, and hence read:
return(normCounts(seqGenes)) # Class: matrix
Hi @zkstewart, the last commit should fix the issue. Thanks for pointing it out.
I'm attempting to run a number of normalisation methods implemented in metaseqR2, but am having issues using EDASeq normalisation.
With my own data, or following the example given with the help message (?normalizeEdaseq), the resulting data matrix has no changes from the input counts table.
I initially installed metaseqR2 from Bioconductor, and then installed it from GitHub in case the problem had been fixed in a more recent commit. Neither version works.
I will try to run EDASeq independently of metaseqR2 and see what happens.