waldronlab / MultiAssayExperiment

Bioconductor package for management of multi-assay data
https://waldronlab.io/MultiAssayExperiment/
69 stars 32 forks source link

Please document how to add an experiment to an empty MultiAssayExperiment object #274

Closed charles-plessy closed 4 years ago

charles-plessy commented 4 years ago

Hello, here is a question rephrased from that I posted on bioc-devel last week.

In the CAGEr package, our main class, CAGEexp is a wrapper on the MultiAssayExperiment class. In a typical workflow, we first create a MultiAssayExperiment object with no experiment in, and then add the experiments one by one as the analysis proceeds.

As the checks in MultiAssayExperiment get tighter and tighter, I am struggling to find the appropriate way to add experiments. Could you help me by documenting how to an experiment to an empty MultiAssayExperiment object ?

In more details, I have a MultiAssayExperiment object with no experiments in, and a SummarizedExperiment object that I want to add as experiment. I also made sure that rownames(colData(object)) == colnames(value). How can I add this SummarizedExperiment to the MultiAssayExperiemnt? I have tried object[["myExp"]] <- value and object <- c(object, myExp=value) but both of them fail.

[Less importantly, but related: If I want to update an experiment (for instance because I added a new assay in a SummarizedExperiment, on which I applied a normalisation method), what is the most appropriate way ?]

lwaldron commented 4 years ago

Hi @charles-plessy, does this example correctly highlight what you want to do? I think we might have inadvertantly completely taken away the ability to do this.

suppressPackageStartupMessages(library(MultiAssayExperiment))
example("MultiAssayExperiment", echo = FALSE)
mae
#> A MultiAssayExperiment object of 4 listed
#>  experiments with user-defined names and respective classes.
#>  Containing an ExperimentList class object of length 4:
#>  [1] Affy: SummarizedExperiment with 5 rows and 4 columns
#>  [2] Methyl450k: matrix with 5 rows and 5 columns
#>  [3] RNASeqGene: matrix with 5 rows and 4 columns
#>  [4] GISTIC: RangedSummarizedExperiment with 5 rows and 3 columns
#> Features:
#>  experiments() - obtain the ExperimentList instance
#>  colData() - the primary/phenotype DFrame
#>  sampleMap() - the sample availability DFrame
#>  `$`, `[`, `[[` - extract colData columns, subset, or experiment
#>  *Format() - convert into a long or wide DFrame
#>  assays() - convert ExperimentList to a SimpleList of matrices

Now we would like to re-create the mae object starting from an empty MultiAssayExperiment. We can't do it this way because the colData are empty:

mae2 <- MultiAssayExperiment()
c(mae2, experiments(mae)[1], sampleMap = sampleMap(mae))
#> Error in validObject(object): invalid class "MultiAssayExperiment" object: 
#>     All samples in the 'sampleMap' must be in the 'colData'

But there is no way to create or rbind colData in the above example. We also can't start by adding colData or sampleMap because these get harmonized:

colData(mae2) <- colData(mae)
#> harmonizing input:
#>   removing 4 colData rownames not in sampleMap 'primary'
sampleMap(mae2) <- sampleMap(mae)
#> harmonizing input:
#>   removing 16 sampleMap rows not in names(experiments)

I do think we should be able to assemble an arbitrary MultiAssayExperiment by appending to an empty one (MultiAssayExperiment()). Two ways to achieve this this could be:

  1. Don't harmonize the colData in the colData<- replace method (potentially, don't harmonize sampleMap either in the replace method). Or more permissively, only harmonize these during subset methods and allow them to contain colData or maps for non-existent samples otherwise.
  2. Provide a colData argument to c,MultiAssayExperiment-method.

These aren't mutually exclusive. From a user perspective, I think 2 only would be somewhat of a pain if you have to provide a correctly subset colData for every c operation. @LiNk-NY?

As a unit test, we should be able to re-build mae2 above to be the same as mae.

Session Info:

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                                             
#>  version  R Under development (unstable) (2020-04-03 r78147)
#>  os       Debian GNU/Linux 10 (buster)                      
#>  system   x86_64, linux-gnu                                 
#>  ui       X11                                               
#>  language (EN)                                              
#>  collate  en_US.UTF-8                                       
#>  ctype    en_US.UTF-8                                       
#>  tz       Etc/UTC                                           
#>  date     2020-04-19                                        
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package              * version  date       lib source        
#>  assertthat             0.2.1    2019-03-21 [2] CRAN (R 4.1.0)
#>  backports              1.1.5    2019-10-02 [2] CRAN (R 4.1.0)
#>  Biobase              * 2.47.3   2020-03-16 [1] Bioconductor  
#>  BiocGenerics         * 0.33.3   2020-03-23 [1] Bioconductor  
#>  bitops                 1.0-6    2013-08-17 [1] CRAN (R 4.1.0)
#>  callr                  3.4.3    2020-03-28 [2] CRAN (R 4.1.0)
#>  cli                    2.0.2    2020-02-28 [2] CRAN (R 4.1.0)
#>  crayon                 1.3.4    2017-09-16 [2] CRAN (R 4.1.0)
#>  DelayedArray         * 0.13.12  2020-04-10 [1] Bioconductor  
#>  desc                   1.2.0    2018-05-01 [2] CRAN (R 4.1.0)
#>  devtools               2.2.2    2020-02-17 [2] CRAN (R 4.1.0)
#>  digest                 0.6.25   2020-02-23 [2] CRAN (R 4.1.0)
#>  ellipsis               0.3.0    2019-09-20 [2] CRAN (R 4.1.0)
#>  evaluate               0.14     2019-05-28 [2] CRAN (R 4.1.0)
#>  fansi                  0.4.1    2020-01-08 [2] CRAN (R 4.1.0)
#>  fs                     1.4.0    2020-03-31 [2] CRAN (R 4.1.0)
#>  GenomeInfoDb         * 1.23.17  2020-04-13 [1] Bioconductor  
#>  GenomeInfoDbData       1.2.3    2020-04-19 [1] Bioconductor  
#>  GenomicRanges        * 1.39.3   2020-03-24 [1] Bioconductor  
#>  glue                   1.4.0    2020-04-03 [2] CRAN (R 4.1.0)
#>  highr                  0.8      2019-03-20 [2] CRAN (R 4.1.0)
#>  htmltools              0.4.0    2019-10-04 [2] CRAN (R 4.1.0)
#>  IRanges              * 2.21.8   2020-03-25 [1] Bioconductor  
#>  knitr                  1.28     2020-02-06 [2] CRAN (R 4.1.0)
#>  lattice                0.20-41  2020-04-02 [3] CRAN (R 4.1.0)
#>  magrittr               1.5      2014-11-22 [2] CRAN (R 4.1.0)
#>  Matrix                 1.2-18   2019-11-27 [3] CRAN (R 4.1.0)
#>  matrixStats          * 0.56.0   2020-03-13 [1] CRAN (R 4.1.0)
#>  memoise                1.1.0    2017-04-21 [2] CRAN (R 4.1.0)
#>  MultiAssayExperiment * 1.13.21  2020-04-13 [1] Bioconductor  
#>  pkgbuild               1.0.6    2019-10-09 [2] CRAN (R 4.1.0)
#>  pkgload                1.0.2    2018-10-29 [2] CRAN (R 4.1.0)
#>  prettyunits            1.1.1    2020-01-24 [2] CRAN (R 4.1.0)
#>  processx               3.4.2    2020-02-09 [2] CRAN (R 4.1.0)
#>  ps                     1.3.2    2020-02-13 [2] CRAN (R 4.1.0)
#>  R6                     2.4.1    2019-11-12 [2] CRAN (R 4.1.0)
#>  Rcpp                   1.0.4    2020-03-17 [2] CRAN (R 4.1.0)
#>  RCurl                  1.98-1.2 2020-04-18 [1] CRAN (R 4.1.0)
#>  remotes                2.1.1    2020-02-15 [2] CRAN (R 4.1.0)
#>  rlang                  0.4.5    2020-03-01 [2] CRAN (R 4.1.0)
#>  rmarkdown              2.1      2020-01-20 [1] CRAN (R 4.1.0)
#>  rprojroot              1.3-2    2018-01-03 [2] CRAN (R 4.1.0)
#>  S4Vectors            * 0.25.15  2020-04-04 [1] Bioconductor  
#>  sessioninfo            1.1.1    2018-11-05 [2] CRAN (R 4.1.0)
#>  stringi                1.4.6    2020-02-17 [2] CRAN (R 4.1.0)
#>  stringr                1.4.0    2019-02-10 [2] CRAN (R 4.1.0)
#>  SummarizedExperiment * 1.17.5   2020-03-27 [1] Bioconductor  
#>  testthat               2.3.2    2020-03-02 [2] CRAN (R 4.1.0)
#>  usethis                1.5.1    2019-07-04 [2] CRAN (R 4.1.0)
#>  withr                  2.1.2    2018-03-15 [2] CRAN (R 4.1.0)
#>  xfun                   0.12     2020-01-13 [2] CRAN (R 4.1.0)
#>  XVector                0.27.2   2020-03-24 [1] Bioconductor  
#>  yaml                   2.2.1    2020-02-01 [2] CRAN (R 4.1.0)
#>  zlibbioc               1.33.1   2020-01-24 [1] Bioconductor  
#> 
#> [1] /usr/local/lib/R/host-site-library
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/local/lib/R/library

Created on 2020-04-19 by the reprex package (v0.3.0)

LiNk-NY commented 4 years ago

I'm not sure why there is a need to add to an empty MultiAssayExperiment when the constructor function is pretty robust. Perhaps the easier route is to rbind / c the all the component pieces. If you're normalizing, it's fairly straightforward to use the c operation for an assay that already exists in the object.

suppressPackageStartupMessages(library(MultiAssayExperiment))
example(MultiAssayExperiment)
#> [truncated...]
mae2 <- MultiAssayExperiment()
mae0 <- MultiAssayExperiment(
    c(experiments(mae2), experiments(mae)[1]), colData = colData(mae),
    sampleMap(mae)
)
#> harmonizing input:
#>   removing 12 sampleMap rows not in names(experiments)

Created on 2020-04-19 by the reprex package (v0.3.0)

lwaldron commented 4 years ago

Using the constructor function in this case seems like a workaround required because it’s not possible to concatenate to colData, making it impossible to add assay data for samples not already in colData. Using the constructor would require developers of derived classes to write their own concatenation functions create new objects, also specifying metadata and any other slots specific to the derived class, instead of being able to use a generic concatenation function. Perhaps @charles-plessy can say why they build on an empty object but it seems like a valid approach, and the MultiAssayExperiment API supports both the creation of empty objects and concatenation - but with the catch that there’s no way to add new colData rows.

Actually there is another way that @charles-plessy could probably use right away, concatenating additional MultiAssayExperiment objects (or derived same-class objects), something like this (this exact example will keep duplicating metadata but you get the point):

library(MultiAssayExperiment)
example("MultiAssayExperiment")
mae0 <- MultiAssayExperiment()
c(c(mae0, mae[, , 1]), mae[, , 2])
charles-plessy commented 4 years ago

Thanks for your answers; with them I think I could fix my package.

When I create new CAGEexp (wrapping MultiAssayExperiment) objects, they only contain metadata and colData that point to files from which experimental results can be read. Then, a function builds a SummarizedExperiment object (called tagCountMatrix above) after parsing and processing the files. This is why I would like to be able to add a new experiment to a MultiAssayExperiment that does not contain any.

Other functions of the CAGEr package transform the tagCountMatrix experiment, so I would like to be able to update it. Lastly, the CAGEexp objects may contain other experiments, that may be added at later stages of the analysis. This is why I need to be able to add new experiments to an object that contains at least one.

So ideally I would like to be able to do blindly something like the following, just like for a list:

(the exmaple is not entirely syntactically correct, but I think you see the point)

"tagCountMatrixExp<-" <- function (object, value) {
  object[["tagCountMatrix"]] <- value
  object
}

At the moment I have the following working solution:

"tagCountMatrixExp<-" <- function (object, value) {

  # If the object is empty, create a new one with a tagCountMatrix experiment
  if (length(experiments(object)) == 0) {
    object <- MultiAssayExperiment( experiments = ExperimentList(tagCountMatrix=value)
                                  , colData = colData(object)
                                  , metadata = metadata(object))
    class(object) <- structure("CAGEexp", package = "CAGEr")

  # If the object is not empty but does not have a tagCountMatrix experiment, add one
  } else if (is.null(object[["tagCountMatrix"]])) {
    object <- c(object, tagCountMatrix=value)

  # If the object is not empty and has a tagCountMatrix experiment, update it.
  } else {
    object[["tagCountMatrix"]] <- value
  }
  object
}
stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed if no further activity occurs. Thank you for your contributions.