waldronlab / MultiAssayExperiment

Bioconductor package for management of multi-assay data
https://waldronlab.io/MultiAssayExperiment/
69 stars 32 forks source link

mergeReplicates not working for SummarisedExperiment and ExpressionSet #217

Closed jonaszierer closed 7 years ago

jonaszierer commented 7 years ago

While playing around with the TCGA data I noticed that mergeReplicates doesn't work if either a SummarisedExperiment or an ExpressionSet is in the MSE. Here a quick example (that is largely copy paste from the Documentation)

library(MultiAssayExperiment)
library(SummarizedExperiment)
nrows <- 5
ncols <- 4
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
rowRanges <- GRanges(rep(c("chr1", "chr2"), c(2, nrows - 2)),
                     IRanges(floor(runif(nrows, 1e5, 1e6)), width=100),
                     strand=sample(c("+", "-"), nrows, TRUE),
                     feature_id=sprintf("ID\\%03d", 1:nrows))
names(rowRanges) <- letters[1:5]
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 2),
                     row.names= c("mysnparray1", "mysnparray2",
                                  "mysnparray3", "mysnparray4"))
rse <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            rowRanges=rowRanges, colData=colData)
rse2 <- counts
rownames(rse2) <- paste0("gene", 1:nrow(counts))
colnames(rse2) <- rownames(colData)
rse2 <- ExpressionSet(assayData=rse2)
(rangemap <-
   data.frame(primary = c("Jack", "Jack", "Bob", "Barbara"),
              assay = c("mysnparray1", "mysnparray2", "mysnparray3",
                        "mysnparray4"), stringsAsFactors = FALSE))
dfmap <- listToMap(list(cnv=rangemap))
patient.data <- data.frame(sex=c("M", "F", "M", "F"),
    age=38:41,
    row.names=c("Jack", "Jill", "Bob", "Barbara"))
myMultiAssay  <- MultiAssayExperiment(list(cnv = rse), patient.data, dfmap)
myMultiAssay2 <- MultiAssayExperiment(list(cnv = rse2), patient.data, dfmap)
##
mergeReplicates(myMultiAssay)
mergeReplicates(myMultiAssay2)
LiNk-NY commented 7 years ago

Hi @jonaszierer Thanks for reporting this. I will look into it. Regards, Marcel

jonaszierer commented 7 years ago

For ExpressionSet the problem is the validate=F in the mergeReplicates. The resulting expression set is invalid and its dimnames do not correspond to the dimnames of the expression matrix. This causes subsequent problems in .harmonize. Removing samples from the expressionSet prior to replacing the data with the merged data resolved the problem.

adding object <- object[, colnames(object) %in% colnames(result) ] in the else if (is(object, "ExpressionSet")) branch (line 134 in MultiAssayExperiment-helpers.R) should do the job. After that validate should be set to TRUE

lwaldron commented 7 years ago

I doubt that merging can be accomplished in general for supported ExperimentList elements. However, maybe it can be accomplished for any class that has a cbind method, which would at least capture matrix and SummarizedExperiment (but not ExpressionSet):

?cbind,SummarizedExperiment-method

lwaldron commented 7 years ago

A (potentially unexpected for users) fall-back for classes without a cbind method then could be to call assay() to convert those ExperimentList elements to matrix, if the class has assay() method (which is not a requirement). I'm not sure I condone this, but thought I would throw it out there.

jonaszierer commented 7 years ago

I don't know if merging can be done in general for all supported classes. But the mergeReplicates function already has special cases for SummarisedExperiment and ExpressionSet. And actually for both the problem is the updating the original object. For SummarisedExperiment the problem can be resolved doing object <- object[, colnames(result)] as well.

For now it might help to just throw a warning if mergeReplicates is used on an unsupported class and returning the original object rather than just failing.

LiNk-NY commented 7 years ago

The issues with the fixes are that colData and phenoData within each object respectively have to be subset for the object to be valid. For SummarizedExperiment, I just took the first row of "colData" of the duplicated replicates. For the ExpressionSet, the phenoData is lost. I could add some mitigation of lost data in the future but it doesn't seem too useful since the MultiAssayExperiment has it's own colData slot.