waldronlab / MultiAssayExperiment

Bioconductor package for management of multi-assay data
https://waldronlab.io/MultiAssayExperiment/
70 stars 32 forks source link

Linear subsetting of an ExperimentList object is broken in MultiAssayExperiment 1.15.7 #283

Closed hpages closed 4 years ago

hpages commented 4 years ago

Hi Marcel,

The specific commit that breaks it is commit 01363dce4f218337a9e383cf9e9ad56f92da2845.

With a serialized ExperimentList object from the AffiXcan package:

library(AffiXcan)
tbaPath <- system.file("extdata", "training.tba.toydata.rds", package="AffiXcan")
tbaMatrixMAE <- readRDS(tbaPath)
tbaMatrix <- MultiAssayExperiment::experiments(tbaMatrixMAE)
length(tbaMatrix)
# [1] 4

Extracting the first two elements should return an object of length 2 but it returns an object of length 4:

length(tbaMatrix[1:2])
# [1] 4

Any other attempt at using 1D-style (a.k.a. linear) subsetting returns an object of the same length as the original object:

length(tbaMatrix[-1])
# [1] 4
length(tbaMatrix[4:2])
# [1] 4
length(tbaMatrix[c(TRUE, FALSE)])
# [1] 4

Except when I don't select any element (empty selection):

tbaMatrix[FALSE]
# ExperimentList class object of length 0:

tbaMatrix[0]
# ExperimentList class object of length 0: 

tbaMatrix[integer(0)]
# ExperimentList class object of length 0:
# Warning message:
# In max(y) : no non-missing arguments to max; returning -Inf

Note that this last warning is spurious and confusing since subsetting by integer(0) is semantically equivalent to subsetting by 0.

The new behavior of [ on ExperimentList objects breaks utils::relist():

utils:::relist(tbaMatrix, list(1, 2:3, 4))
# [[1]]
# ExperimentList class object of length 4:
#  
# [[2]]
# ExperimentList class object of length 4:
#  
# [[3]]
# ExperimentList class object of length 4:

and consequently BiocParallalel:::.splitX() and BiocParallalel::bplapply() where utils::relist() is used internally to break the object into the pieces that get passed to the workers:

library(BiocParallel)
BiocParallel:::.splitX(tbaMatrix, 3, 0)
# [[1]]
# ExperimentList class object of length 4:
#  
# [[2]]
# ExperimentList class object of length 4:
#  
# [[3]]
# ExperimentList class object of length 4:

length(bplapply(tbaMatrix, identity, BPPARAM=MulticoreParam(workers=3)))
# [1] 12

This ultimately breaks the affiXcanTrain() function in the AffiXcan package and the vignette where it's used: https://bioconductor.org/checkResults/3.12/bioc-LATEST/AffiXcan/

... plus possibly other packages (don't know, didn't go thru today's full report yet, only stumbled on this while going thru the letter A)

sessionInfo()

R version 4.0.2 Patched (2020-06-24 r78746)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS

Matrix products: default
BLAS:   /home/hpages/R/R-4.0.r78746/lib/libRblas.so
LAPACK: /home/hpages/R/R-4.0.r78746/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] BiocParallel_1.23.2         AffiXcan_1.7.0             
 [3] SummarizedExperiment_1.19.9 Biobase_2.49.1             
 [5] GenomicRanges_1.41.6        GenomeInfoDb_1.25.11       
 [7] IRanges_2.23.10             S4Vectors_0.27.14          
 [9] BiocGenerics_0.35.4         MatrixGenerics_1.1.5       
[11] matrixStats_0.57.0         

loaded via a namespace (and not attached):
 [1] lattice_0.20-41             crayon_1.3.4               
 [3] bitops_1.0-6                grid_4.0.2                 
 [5] zlibbioc_1.35.0             XVector_0.29.3             
 [7] MultiAssayExperiment_1.15.7 Matrix_1.2-18              
 [9] tools_4.0.2                 RCurl_1.98-1.2             
[11] DelayedArray_0.15.16        compiler_4.0.2             
[13] GenomeInfoDbData_1.2.4     
LiNk-NY commented 4 years ago

Hi Hervé, @hpages Thanks for catching this. I did a revdepscheck on the package changes and this was expected. I've notified the maintainer https://github.com/alussana/AffiXcan/issues/1 that we are switching to a 'matrix' type of subsetting for the ExperimentList class in response to comments in this issue https://github.com/waldronlab/MultiAssayExperiment/issues/276

hpages commented 4 years ago

Marcel,

ExperimentList is a List derivative and should behave like a list-like object. This means that linear subsetting (x[i]) should subset along the length of the object and satisfy reasonable expectations like "x[4:3] must have length 2" and "x[[i]] must be the same as x[i][[1]] for any 1 <= i <= length(x)". Otherwise basic functionalities like bplapply() (and many others) break on ExperimentList objects.

The 'matrix'-type subsetting as currently implemented for ExperimentList violates these expectations. Note that an x[i, j, k] subsetting where i controls the subsetting along the length and j, k performs the matrix-like subsetting of the individual list elements would have been compatible with the Vectors/List contract. But that's not what you did.

I see that the suggested fix for the AffiXcan package is that they coerce their ExperimentList object to SimpleList to make it behave like a List derivative again. But why would they need to do that? I mean, ExperimentList objects are already List derivatives (is(x, "List") is TRUE), but because they don't behave like List derivatives people must coerce them to SimpleList to make them behave like List derivatives again... !?

More generally speaking, there is a contract associated with each class in the Vector hierarchy, and extending a class in that hierarchy implicitly means signing the contract and sticking to it. Overriding some methods with more specialized methods is OK as long as the more specialized methods respect the original contract.

If ExperimentList is no longer going to have a list-like semantic, then it should no longer extend List. Unfortunately this also means that it should probably be renamed to drop the "List" sufix (e.g. "Experiments") to avoid misleading the user.

Hope this makes sense.

H.

lwaldron commented 4 years ago

I had thought of ExperimentList as MultiAssayExperiment-like, but @hpages's rationale makes sense. We should discuss what to do - if we revert ExperimentList's matrix-like subsetting I would want to keep those actions available as verbs.

LiNk-NY commented 4 years ago

Thanks Hervé, @hpages This makes sense to me. I've reverted the changes and kept the subsetBy* functions. I can also direct users to use the MultiAssayExperiment interface for these type of operations. cc: @lwaldron -Marcel