waldronlab / MultiAssayExperiment

Bioconductor package for management of multi-assay data
https://waldronlab.io/MultiAssayExperiment/

Implement "deep" update of MultiAssayExperiment objects #305

Closed: hpages closed this issue 2 years ago

hpages commented 2 years ago

@LiNk-NY

Hi Marcel,

Here is an example of a MultiAssayExperiment object that contains many old DataFrame instances, some of them hiding deep inside the object:

library(MultiAssayExperiment)
library(ELMER)
mae <- ELMER:::getdata("elmer.data.example")

class(mae)
# [1] "MultiAssayExperiment"
# attr(,"package")
# [1] "MultiAssayExperiment"

class(mae@colData)
# [1] "DataFrame"
# attr(,"package")
# [1] "S4Vectors"

class(mae@sampleMap)
# [1] "DataFrame"
# attr(,"package")
# [1] "S4Vectors"

class(mae@ExperimentList@listData[[1]]@colData)
# [1] "DataFrame"
# attr(,"package")
# [1] "S4Vectors"

class(mae@ExperimentList@listData[[1]]@rowRanges@elementMetadata)
# [1] "DataFrame"
# attr(,"package")
# [1] "S4Vectors"

etc... (there are more!)

With MultiAssayExperiment 1.21.3, these old DataFrame instances don't get updated:

mae2 <- updateObject(mae, check=FALSE, verbose=TRUE)  # not much seems to happen 
# updateObject(object = 'MultiAssayExperiment')

The object has not changed (don't use identical() to compare the two objects; it's not reliable):

library(digest)
digest(mae) == digest(mae2)
# [1] TRUE

Furthermore, if we don't use check=FALSE, then updateObject() tries to validate the object after updating it, but fails in an ugly way:

mae2 <- updateObject(mae, verbose=TRUE)
# updateObject(object = 'MultiAssayExperiment')
# [updateObject] Validating the updated object ... Error in h(simpleError(msg, call)) : 
#   error in evaluating the argument 'table' in selecting a method for function '%in%': error in
#   evaluating the argument 'x' in selecting a method for function 'colnames': unable to find
#   an inherited method for function 'extractCOLS' for signature '"DataFrame"'

MultiAssayExperiment 1.21.4 addresses this:

library(MultiAssayExperiment)
library(ELMER)
mae <- ELMER:::getdata("elmer.data.example")
mae2 <- updateObject(mae, verbose=TRUE)
# updateObject(object = 'MultiAssayExperiment')
# updateObject(object="ANY") default for object of class 'matrix'
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# [updateObject] Setting class attribute of DataFrame instance to "DFrame" ... OK
# updateObject(object="ANY") default for object of class 'NULL'
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# [updateObject] Setting class attribute of DataFrame instance to "DFrame" ... OK
# updateObject(object="ANY") default for object of class 'NULL'
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# [updateObject] GRanges object uses internal representation from
# [updateObject] GenomicRanges < 1.31.16. Updating it ... OK
# [updateObject] elementType slot of IRanges object should be set to "ANY",
# [updateObject] not to "integer".
# [updateObject] Updating it ... OK
# updateObject(object="ANY") default for object of class 'NULL'
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# [updateObject] Setting class attribute of DataFrame instance to "DFrame" ... OK
# updateObject(object="ANY") default for object of class 'NULL'
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# updateObject(object="ANY") default for object of class 'matrix'
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# [updateObject] Setting class attribute of DataFrame instance to "DFrame" ... OK
# updateObject(object="ANY") default for object of class 'NULL'
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# [updateObject] Setting class attribute of DataFrame instance to "DFrame" ... OK
# updateObject(object="ANY") default for object of class 'NULL'
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# [updateObject] GRanges object uses internal representation from
# [updateObject] GenomicRanges < 1.31.16. Updating it ... OK
# [updateObject] elementType slot of IRanges object should be set to "ANY",
# [updateObject] not to "integer".
# [updateObject] Updating it ... OK
# updateObject(object="ANY") default for object of class 'NULL'
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# [updateObject] Setting class attribute of DataFrame instance to "DFrame" ... OK
# updateObject(object="ANY") default for object of class 'NULL'
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# updateObject(object="ANY") default for object of class 'NULL'
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# [updateObject] Setting class attribute of DataFrame instance to "DFrame" ... OK
# updateObject(object="ANY") default for object of class 'NULL'
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# [updateObject] Setting class attribute of DataFrame instance to "DFrame" ... OK
# updateObject(object="ANY") default for object of class 'NULL'
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK
# [updateObject] Validating the updated object ... OK

Looks like a lot of things happened! Some manual inspection suggests that the DataFrame instances got replaced with DFrame instances:

class(mae2@colData)
# [1] "DFrame"
# attr(,"package")
# [1] "S4Vectors"

class(mae2@sampleMap)
# [1] "DFrame"
# attr(,"package")
# [1] "S4Vectors"

class(mae2@ExperimentList@listData[[1]]@colData)
# [1] "DFrame"
# attr(,"package")
# [1] "S4Vectors"

class(mae2@ExperimentList@listData[[1]]@rowRanges@elementMetadata)
# [1] "DFrame"
# attr(,"package")
# [1] "S4Vectors"
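The manual inspection above can be automated. Here is a minimal sketch of a recursive walker that reports every slot path holding an instance of a given class. Note that findInstances(), OldTable and Holder are hypothetical names made up for this illustration (they are not part of MultiAssayExperiment or S4Vectors), and I have only tried the helper on toy classes, not on a real MultiAssayExperiment object:

```r
library(methods)

# Hypothetical helper: walk an object and return the paths of all the
# elements whose class attribute is 'target'.
findInstances <- function(x, target, path = "x") {
    hits <- character(0)
    if (class(x)[1L] == target)
        hits <- path
    if (isS4(x)) {
        # Recurse into each slot of an S4 object.
        for (name in slotNames(x))
            hits <- c(hits, findInstances(slot(x, name), target,
                                          paste0(path, "@", name)))
    } else if (is.list(x)) {
        # Recurse into each element of an ordinary list.
        for (i in seq_along(x))
            hits <- c(hits, findInstances(x[[i]], target,
                                          paste0(path, "[[", i, "]]")))
    }
    hits
}

# Toy classes standing in for an "old" instance nested in a container.
setClass("OldTable", representation(n = "integer"))
setClass("Holder", representation(table = "OldTable", parts = "list"))

h <- new("Holder",
         table = new("OldTable", n = 1L),
         parts = list(new("OldTable", n = 2L), "text"))

found <- findInstances(h, "OldTable")
found
# [1] "x@table"      "x@parts[[1]]"
```

Something like findInstances(mae, "DataFrame") might then list the old instances in one go, but that is an assumption: on a serialized object whose class definition has changed, slot() can fail on slots that the old instance lacks.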

And validObject() is happy:

validObject(mae2)  # shallow validation
# [1] TRUE
validObject(mae2, complete=TRUE)  # deep validation
# [1] TRUE
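To illustrate the difference between the two calls: by default validObject() does not run the validity methods of the objects stored in the slots, whereas complete=TRUE validates each slot recursively. A minimal sketch with two toy classes (Part and Whole are hypothetical names, used only for this example):

```r
library(methods)

# 'Part' has a validity method; 'Whole' simply holds a 'Part'.
setClass("Part", representation(n = "numeric"),
    validity = function(object) {
        if (length(object@n) != 1L) "'n' must have length 1" else TRUE
    })
setClass("Whole", representation(part = "Part"))

p <- new("Part", n = 1)
p@n <- c(1, 2)               # direct slot assignment skips the validity method
w <- new("Whole", part = p)  # accepted: 'part' has the right class

# Shallow validation: the broken 'Part' goes unnoticed.
shallow <- validObject(w)

# Deep validation: the validity method of 'Part' is run and fails.
deep <- tryCatch(validObject(w, complete = TRUE),
                 error = function(e) conditionMessage(e))
```

Here shallow is TRUE while deep holds the error message, which is why the complete=TRUE call is the more meaningful check after a deep update.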

This works because:

  1. The updateObject() method for MultiAssayExperiment objects now calls updateObject() on the individual slots of the object. Note that it only does this for slots that are themselves S4 objects, which is probably good enough for now.
  2. Calling updateObject() on these slots calls updateObject() methods that also perform a deep update. So the entire object is recursively inspected and updated all the way to the deepest levels.
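The two points above can be sketched with toy classes. This is only an illustration of the recursion pattern, not the actual MultiAssayExperiment code: deepUpdate(), Inner and Outer are hypothetical names, and deepUpdate() stands in for the real generic, BiocGenerics::updateObject():

```r
library(methods)

# Hypothetical toy classes standing in for a container and a nested part.
setClass("Inner", representation(version = "character"))
setClass("Outer", representation(inner = "Inner", label = "character"))

# Hypothetical generic standing in for updateObject().
setGeneric("deepUpdate", function(object, ...) standardGeneric("deepUpdate"))

# Default method: leave the object untouched.
setMethod("deepUpdate", "ANY", function(object, ...) object)

# Leaf method: bring an 'Inner' instance up to date.
setMethod("deepUpdate", "Inner", function(object, ...) {
    object@version <- "new"
    object
})

# Container method: call deepUpdate() on every slot that is itself an
# S4 object (point 1). Each of those calls dispatches to a method that
# may recurse further (point 2).
setMethod("deepUpdate", "Outer", function(object, ...) {
    for (name in slotNames(object)) {
        value <- slot(object, name)
        if (isS4(value))
            slot(object, name) <- deepUpdate(value, ...)
    }
    object
})

old <- new("Outer", inner = new("Inner", version = "old"), label = "demo")
updated <- deepUpdate(old)
updated@inner@version
# [1] "new"
```

The 'label' slot is a plain character vector, so the isS4() test skips it. That is the same restriction as the one mentioned in point 1.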

Cheers, H.