waldronlab / MultiAssayExperiment

Bioconductor package for management of multi-assay data
https://waldronlab.io/MultiAssayExperiment/
70 stars 32 forks source link

Single bracket subset with drop=FALSE returns invalid object #251

Closed p-smirnov closed 6 years ago

p-smirnov commented 6 years ago

Not sure if this is intended behaviour, but I noticed that if you do a single bracket subset with drop=FALSE, which keeps the empty experiments, you run into an issue that the object returned is not considered a valid MultiAssayExperiment. The issue is NA entries in the Primary column of sampleMap. Here is an illustrative example:

require(MultiAssayExperiment)

E1 <- matrix(NA_real_, ncol=3, nrow=3)
E2 <- matrix(NA_real_, ncol=3, nrow=3)
E3 <- matrix(NA_real_, ncol=2, nrow=3)

colnames(E1) <- colnames(E2) <- c('A', 'B', 'C')
colnames(E3) <- c('A', 'B')

testMAE <- MultiAssayExperiment(list(E1 = E1, E2 = E2, E3 = E3))

validObject(testMAE[,3, drop=F]) #Errors

sampleMap(testMAE[,3, drop=F])
p-smirnov commented 6 years ago

In some sense this is a dual to my previous issue. Here there are assays with no entries in the colData (they cannot have any since they have no samples!) while I want to be able to have colData entries with no samples in any assay.

lwaldron commented 6 years ago

In an earlier iteration of MultiAssayExperiment, there was no validity requirement for all rownames of colData to be found in sampleMap. I saw this as a way to create an initial MultiAssayExperiment with the entire sampleMap, even if not all experiments were not in place but would be concatenated in subsequent steps. Conceptually I don't see a problem with removing it (unlike the reverse requirement that each column of ExperimentList must map to exactly one row of sampleMap), @LiNk-NY can you comment?

https://github.com/waldronlab/MultiAssayExperiment/blob/ce42a3e1508b32b8ddd783d8f0ad788d364e28e7/R/MultiAssayExperiment-class.R#L310

lwaldron commented 6 years ago

Apologies, I was thinking about a different situation than what you've uncovered. In this example the sampleMap should have been subsetted, like follows:

> require(MultiAssayExperiment)
> E1 <- matrix(NA_real_, ncol=3, nrow=3)
> E2 <- matrix(NA_real_, ncol=3, nrow=3)
> E3 <- matrix(NA_real_, ncol=2, nrow=3)
> colnames(E1) <- colnames(E2) <- c('A', 'B', 'C')
> colnames(E3) <- c('A', 'B')
> dummy <- matrix(NA_real_, )
> testMAE <- MultiAssayExperiment(list(E1 = E1, E2 = E2, E3 = E3))
> invalid <- testMAE[,3, drop=FALSE]
harmonizing input:
  removing 6 sampleMap rows with 'colname' not in colnames of experiments
  removing 2 colData rownames not in sampleMap 'primary'
> sampleMap(invalid)
DataFrame with 3 rows and 3 columns
     assay     primary     colname
  <factor> <character> <character>
1       E1           C           C
2       E2           C           C
3       E3          NA          NA
> valid <- MultiAssayExperiment(experiments=experiments(invalid), 
+                      colData = colData(invalid),
+                      sampleMap = sampleMap(invalid))
harmonizing input:
  removing 1 sampleMap rows with 'colname' not in colnames of experiments
> sampleMap(valid)
DataFrame with 2 rows and 3 columns
     assay     primary     colname
  <factor> <character> <character>
1       E1           C           C
2       E2           C           C
> identical(experiments(valid), experiments(invalid))
[1] TRUE
> identical(colData(valid), colData(invalid))
[1] TRUE
>