Closed LTLA closed 4 years ago
FYI, this is what I did. Plug and play into getWithColData
:
# Avoid dealing with int/char differences.
mae0 <- mae[,,experiment]
sm <- sampleMap(mae0)
mae.cn <- sm$primary[match(colnames(se), sm$colname)]
expanded <- colData(mae0)[mae.cn,,drop=FALSE]
# Prune out duplicate entries.
existing <- colData(se)
common <- intersect(colnames(existing), colnames(expanded))
for (j in common) {
if (!identical(existing[[j]], expanded[[j]])) {
warning(sprintf("non-identical '%s' field in both SE and MAE-level colData", j))
}
}
leftovers <- expanded[,setdiff(colnames(expanded), common),drop=FALSE]
colData(se) <- cbind(existing, leftovers)
Hi Aaron, @LTLA
It does use sampleMap
information during subsetting.
I'm not sure what the rationale is for duplicating the colData
information in the expanded
DataFrame.
sm <- sampleMap(mae0)
- mae.cn <- sm$primary[match(colnames(se), sm$colname)]
- expanded <- colData(mae0)[mae.cn,,drop=FALSE]
+ expanded <- colData(mae0)[sm$primary, , drop = FALSE]
> colData(mae0)[sm$primary, , drop = FALSE]
DataFrame with 10 rows and 2 columns
Sex Age
<character> <integer>
A M 42
A M 42
B F 16
B F 16
B F 16
C M 12
C M 12
D F 23
E M 28
E M 28
instead of
> colData(mae[, , 1L])
harmonizing input:
removing 5 sampleMap rows not in names(experiments)
removing 1 colData rownames not in sampleMap 'primary'
DataFrame with 5 rows and 2 columns
Sex Age
<character> <integer>
A M 42
B F 16
C M 12
D F 23
E M 28
Perhaps because the input contains technical replicates?
> replicated(mae)
$rnaseq
LogicalList of length 5
[["A"]] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[["B"]] FALSE FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE
[["C"]] FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
[["D"]] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[["E"]] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
$chipseq
LogicalList of length 5
[["B"]] FALSE FALSE FALSE FALSE FALSE
[["C"]] FALSE FALSE FALSE FALSE FALSE
[["D"]] FALSE FALSE FALSE FALSE FALSE
[["E"]] FALSE FALSE FALSE FALSE FALSE
[["F"]] FALSE FALSE FALSE FALSE FALSE
The function would work if those were merged or separated into different assays
before extracting the colData
. I suppose we could add a function like separateReplicates
. I can also add a check to the function when the input contains replicates.
Best, Marcel cc: @lwaldron
I'm not arguing about the subsetting, that works fine. I'm talking about the behavior of getWithColData()
and the propagation of MAE-level colData
down to the individual SE's. As the MWE in my original post indicates, that function fails in the case of a non-1:1: mapping.
Now I just want to be able to extract SEs while propagating
colData
from the MAEs. But:Looking inside
getWithColData
, it seems that the function doesn't even use thesampleMap
information, which kind of defeats the purpose of using the MAE to store complex things.I'd also like the option to append the MAE-level
colData
to the SE's existingcolData
rather than overwriting it. In fact, I daresay that ought to be the default setting, why discard information?Session information
``` R Under development (unstable) (2020-03-11 r77927) Platform: x86_64-apple-darwin17.7.0 (64-bit) Running under: macOS High Sierra 10.13.6 Matrix products: default BLAS: /Users/luna/Software/R/trunk/lib/libRblas.dylib LAPACK: /Users/luna/Software/R/trunk/lib/libRlapack.dylib locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base packages: [1] parallel stats4 stats graphics grDevices utils datasets [8] methods base other attached packages: [1] MultiAssayExperiment_1.13.21 SummarizedExperiment_1.17.5 [3] DelayedArray_0.13.12 matrixStats_0.56.0 [5] Biobase_2.47.3 GenomicRanges_1.39.3 [7] GenomeInfoDb_1.23.17 IRanges_2.21.8 [9] S4Vectors_0.25.15 BiocGenerics_0.33.3 loaded via a namespace (and not attached): [1] lattice_0.20-41 bitops_1.0-6 grid_4.0.0 [4] zlibbioc_1.33.1 XVector_0.27.2 Matrix_1.2-18 [7] tools_4.0.0 RCurl_1.98-1.1 compiler_4.0.0 [10] GenomeInfoDbData_1.2.3 ```