hansenlab / bsseq

Devel repository for bsseq

Error in combining multiple bsseq objects #94

Open katiearacena opened 4 years ago

katiearacena commented 4 years ago

Hello, I am having trouble combining multiple BSseq objects (I have one for each chromosome). Here `chr20`, `chr21` and `chr22` are all BSseq objects. Below is my code along with the error I get. I assume that 249 probably comes from 83*3.

    bsList <- list(chr20, chr21, chr22)
    bsCombined <- combineList(bsList)
    Error in validObject(.Object) :
      invalid class "SummarizedExperiment" object:
        nb of cols in 'assay' (249) must equal nb of rows in 'colData' (83)

Any insight into how to work through this error and/or combine bsseq objects would be greatly appreciated! Thanks!

PeteHaitch commented 4 years ago

Please include the output of BiocManager::valid(). Are you able to share these objects?

katiearacena commented 4 years ago

Thanks for the quick reply. Here is the output of BiocManager::valid():

> BiocManager::valid()
> 
> * sessionInfo()
> 
> R version 3.6.3 (2020-02-29)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Scientific Linux 7.4 (Nitrogen)
> 
> Matrix products: default
> BLAS:   /project2/lbarreiro/software/R-3.6.3/lib64/R/lib/libRblas.so
> LAPACK: /project2/lbarreiro/software/R-3.6.3/lib64/R/lib/libRlapack.so
> 
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
> 
> attached base packages:
> [1] stats4    parallel  stats     graphics  grDevices utils     datasets
> [8] methods   base
> 
> other attached packages:
>  [1] bsseq_1.22.0                SummarizedExperiment_1.16.1
>  [3] DelayedArray_0.12.3         BiocParallel_1.20.1
>  [5] matrixStats_0.56.0          Biobase_2.46.0
>  [7] GenomicRanges_1.38.0        GenomeInfoDb_1.22.1
>  [9] IRanges_2.20.2              S4Vectors_0.24.4
> [11] BiocGenerics_0.32.0
> 
> loaded via a namespace (and not attached):
>  [1] Rcpp_1.0.4.6             BiocManager_1.30.10      compiler_3.6.3
>  [4] XVector_0.26.0           R.methodsS3_1.8.0        R.utils_2.9.2
>  [7] bitops_1.0-6             tools_3.6.3              DelayedMatrixStats_1.8.0
> [10] zlibbioc_1.32.0          lifecycle_0.2.0          rhdf5_2.30.1
> [13] lattice_0.20-38          BSgenome_1.54.0          rlang_0.4.6
> [16] Matrix_1.2-18            GenomeInfoDbData_1.2.2   rtracklayer_1.46.0
> [19] Biostrings_2.54.0        gtools_3.8.1             locfit_1.5-9.4
> [22] grid_3.6.3               data.table_1.12.8        R6_2.4.1
> [25] HDF5Array_1.14.4         XML_3.99-0.3             limma_3.42.2
> [28] Rhdf5lib_1.8.0           GenomicAlignments_1.22.1 scales_1.1.1
> [31] Rsamtools_2.2.3          permute_0.9-5            colorspace_1.4-1
> [34] RCurl_1.98-1.2           munsell_0.5.0            R.oo_1.23.0
> 
> Bioconductor version '3.10'
> 
>   * 58 packages out-of-date
>   * 0 packages too new
> 
> create a valid installation with
> 
>   BiocManager::install(c(
>     "ape", "backports", "bit64", "dplyr", "ellipsis", "fitdistrplus", "fs",
>     "future", "future.apply", "ggplot2", "glue", "gplots", "gtools",
>     "htmltools", "httr", "isoband", "jsonlite", "later", "metap", "multcomp",
>     "mvtnorm", "openssl", "patchwork", "pillar", "pkgbuild", "pkgload",
>     "plotly", "plotrix", "processx", "promises", "ps", "purrr", "quantreg",
>     "Rcpp", "Rdpack", "remotes", "reshape2", "reticulate", "rlang", "ROCR",
>     "scales", "Seurat", "sn", "sys", "tibble", "tidyr", "tidyselect", "vctrs",
>     "withr", "zoo"
>   ), update = TRUE, ask = FALSE)
> 
> more details: BiocManager::valid()$too_new, BiocManager::valid()$out_of_date
> 
> Warning message:
> 58 packages out-of-date; 0 packages too new
> 

I'm not sure how to share the objects with you but I was able to re-create the problem using the following lines of code from this resource: http://rstudio-pubs-static.s3.amazonaws.com/281819_b9c1b73b45244e43b8ffc014bebffbdc.html

    library(bsseq)
    numCpGs <- 1e5
    numSamples <- 10
    gr <- GRanges("chr1", IRanges(1:numCpGs, width=1))
    cov <- matrix(rbinom(numCpGs*numSamples, 2, 0.1), ncol=numSamples)
    m <- matrix(rbinom(numCpGs*numSamples, size = cov, prob=0.5), ncol=numSamples)
    bs <- BSseq(gr=gr, M=m, Cov=cov)
    bs1 <- bs
    bs2 <- bs
    bsList <- list(bs1, bs2)
    bsCombined <- combineList(bsList)
    Error in validObject(.Object) :
      invalid class “SummarizedExperiment” object:
        nb of cols in 'assay' (20) must equal nb of rows in 'colData' (10)

Out of curiosity, are you able to use rbind() to combine BSseq objects? I used rbind() and it looks like everything was successfully combined. I'm wondering if there are some nuances I'm missing.

katiearacena commented 4 years ago

After investigating further, I am wondering if the issue stems from my objects having the same sample IDs (they are the same samples, just different chromosomes). When I change the sample names to include the chromosome number, making them unique, combineList() works fine. Now I think I could be using the wrong tool to combine my BSseq objects, since they all have the same sample IDs but different loci.

PeteHaitch commented 4 years ago

If they are the same samples (in the same order) across the BSseq objects then just use rbind(). combineList() tries to be smart about matching up samples (e.g., when one object contains samples 1-10 and the other object only contains samples 1-5) and is meant to simply defer to rbind() if the samples are the same, but it looks like something is getting tripped up. I won't have time to investigate further for a couple of weeks, so if rbind() is doing what you want then you can safely stick with that.
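For readers landing on this thread, here is a minimal sketch of the rbind() route described above. It assumes per-chromosome BSseq objects that hold the same samples in the same order. The simulated data, the `sampleIDs` vector and the `makeChrom()` helper are made up for illustration and are not part of bsseq; only `BSseq()`, `sampleNames()` and `rbind()` are actual package functions.

```r
library(bsseq)

set.seed(1)
numCpGs <- 100
numSamples <- 4
sampleIDs <- paste0("sample", seq_len(numSamples))  # illustrative sample names

# Hypothetical helper: simulate a small BSseq object for one chromosome.
makeChrom <- function(chr) {
  gr <- GRanges(chr, IRanges(seq_len(numCpGs), width = 1))
  cov <- matrix(rbinom(numCpGs * numSamples, 2, 0.1), ncol = numSamples)
  m <- matrix(rbinom(numCpGs * numSamples, size = cov, prob = 0.5),
              ncol = numSamples)
  BSseq(gr = gr, M = m, Cov = cov, sampleNames = sampleIDs)
}

bsList <- lapply(c("chr20", "chr21", "chr22"), makeChrom)

# rbind() stacks loci and assumes identical samples in identical order,
# so verify that before combining.
sameSamples <- vapply(
  bsList,
  function(x) identical(sampleNames(x), sampleNames(bsList[[1]])),
  logical(1)
)
stopifnot(all(sameSamples))

# Stack the per-chromosome objects row-wise (loci are rows, samples are columns).
bsCombined <- do.call(rbind, bsList)
dim(bsCombined)
```

If the `stopifnot()` check fails, the objects do not share the same samples in the same order, and a plain rbind() is probably not the right tool; that is the case combineList() is meant to handle.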

katiearacena commented 4 years ago

OK, great, I will use rbind() for now. Thanks for your help!