waldronlab / MultiAssayExperiment

Bioconductor package for management of multi-assay data
https://waldronlab.io/MultiAssayExperiment/

Bug when simultaneously subsetting samples and assays #237

Closed jonaszierer closed 6 years ago

jonaszierer commented 6 years ago

Hi, I think there is a bug when subsetting samples and assays simultaneously:

Construct a MAE like this:

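  library(MultiAssayExperiment)

  ## ASSAY 1: genes 1-5 measured in samples 2-6 (no data for sample1)
  (arraydat <- matrix(sample(1:10, 5 * 5, replace = TRUE), ncol = 5,
                      dimnames = list(paste0("gene", 1:5),
                                      paste0("sample", 2:6))))

  ## ASSAY 2: genes 2-6 measured in samples 1-5 (no data for sample6)
  (lofdat <- matrix(sample(c("LOF", "WT"), 5 * 5, replace = TRUE), ncol = 5,
                    dimnames = list(paste0("gene", 2:6),
                                    paste0("sample", 1:5))))

  ## COLDATA: one phenotype per sample; kept as a factor to match the output below
  (cd <- data.frame(pheno = paste0("pheno", c(2, 1, 1, 1, 1, 2)),
                    row.names = paste0("sample", 1:6),
                    stringsAsFactors = TRUE))
  mae <- MultiAssayExperiment(list(exp = arraydat, lof = lofdat), colData = cd)
  mae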

Now subset the samples and the assays in a single call:

mae2 <- mae[, mae$pheno == "pheno1", c("exp")]
mae2$pheno

All values of mae2$pheno should now be pheno1, but they are not. Instead I get:

[1] pheno1 pheno1 pheno1 pheno2
Levels: pheno1 pheno2

I suppose the assays are subset first, which removes some samples, and the logical vector for the samples is then applied to the already reduced object?

It works fine when subsetting in two separate steps: first the assays and then the samples, or the other way round.
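The suspected misalignment can be reproduced with plain named vectors, without touching the package at all. The sketch below is only an illustration of the hypothesis above, not the package's actual subsetting code; the variable names (`pheno`, `exp_samples`, `keep`, `buggy`, `fixed`) are made up for this example.

```r
# Phenotype per sample, same values as the colData in the report
pheno <- setNames(paste0("pheno", c(2, 1, 1, 1, 1, 2)), paste0("sample", 1:6))

# Samples that have data in the "exp" assay (sample1 is missing)
exp_samples <- paste0("sample", 2:6)

# Length-6 logical built against the full colData
keep <- unname(pheno == "pheno1")

# Hypothesised order: drop samples without "exp" data first, then apply
# the stale length-6 logical by position to the 5 remaining samples
assay_first <- pheno[exp_samples]
buggy <- assay_first[keep]

# Sample-first order: apply the logical while it still lines up with the
# colData, then keep only the samples that have "exp" data
sample_first <- pheno[keep]
fixed <- sample_first[names(sample_first) %in% exp_samples]

print(buggy)
print(fixed)
```

Here `buggy` holds sample3 to sample6 with the values pheno1, pheno1, pheno1, pheno2, which matches the output reported above, while `fixed` holds sample2 to sample5, all pheno1. That is consistent with the hypothesis but does not by itself prove what the package does internally.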

LiNk-NY commented 6 years ago

Hi Jonas (@jonaszierer), thanks for opening this issue.

The simple and fast fix would be to subset by columns or colData first and then subset the assays. It seems this change won't have many implications for the other subsetting methods.

A more complicated fix would be to use non-standard evaluation, but the first option does the trick.

Best regards, Marcel

colData(mae[, mae$pheno == "pheno1", "exp"])
harmonizing input:
  removing 2 sampleMap rows with 'colname' not in colnames of experiments
  removing 2 colData rownames not in sampleMap 'primary'
DataFrame with 4 rows and 1 column
           pheno
        <factor>
sample2   pheno1
sample3   pheno1
sample4   pheno1
sample5   pheno1

jonaszierer commented 6 years ago

Hi @LiNk-NY, many thanks for the quick fix! Jonas