satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

JoinLayers needs all layers to have the same number of genes #9086

Closed: sylvia-science closed this issue 2 months ago

sylvia-science commented 3 months ago

I'm trying to merge data from the Weizman Lung Atlas, and I'm having trouble with the JoinLayers function. I think it expects the counts layers of all samples to contain the same genes (rows), which isn't possible with public data.

This is the simplest version of my code, which merges a list of Seurat v5 objects and then tries to join the layers:

library(Seurat)

# data_list: a list of Seurat v5 objects, one per dataset
data <- merge(
  x = data_list[[1]],
  y = data_list[2:length(data_list)]
)
data <- JoinLayers(data)

Error in `.rowNamesDF<-`(x, value = value) : invalid 'row.names' length
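The error itself comes from base R: `.rowNamesDF<-` raises "invalid 'row.names' length" when the number of row names assigned to a data frame does not match its number of rows. The thread never identifies which data frame that was, so the sketch below is only a hedged first check one might run on the inputs before merging. `check_count_matrices()` is a hypothetical helper, not part of Seurat; it assumes the counts have already been pulled out of each object as genes x cells matrices, for example with `lapply(data_list, LayerData, layer = "counts")` in SeuratObject v5.

```r
# Hypothetical helper, not part of Seurat: basic sanity checks on a list of
# genes x cells count matrices before they are merged.
check_count_matrices <- function(mats) {
  problems <- character(0)
  for (i in seq_along(mats)) {
    m <- mats[[i]]
    genes <- rownames(m)
    if (is.null(genes)) {
      problems <- c(problems, sprintf("matrix %d has no feature (row) names", i))
    } else if (anyDuplicated(genes) > 0) {
      problems <- c(problems, sprintf("matrix %d has duplicated feature names", i))
    }
    if (is.null(colnames(m))) {
      problems <- c(problems, sprintf("matrix %d has no cell (column) names", i))
    }
  }
  # Cell names that repeat across samples are also worth knowing about
  all_cells <- unlist(lapply(mats, colnames), use.names = FALSE)
  if (anyDuplicated(all_cells) > 0) {
    problems <- c(problems, "some cell names are shared between matrices")
  }
  problems  # character(0) means none of these checks fired
}

# Toy example with made-up gene and cell names
a <- matrix(1:4, nrow = 2, dimnames = list(c("CD3E", "MS4A1"), c("s1_c1", "s1_c2")))
b <- matrix(1:4, nrow = 2, dimnames = list(c("CD3E", "CD3E"), c("s1_c1", "s2_c2")))
check_count_matrices(list(a, b))
```

An empty result does not prove the inputs are fine; it only rules out these particular problems.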

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Brussels
tzcode source: internal

attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] org.Hs.eg.db_3.19.1 AnnotationDbi_1.66.0 IRanges_2.38.0 S4Vectors_0.42.0 Biobase_2.64.0
[6] BiocGenerics_0.50.0 limma_3.60.3 Matrix_1.7-0 stringr_1.5.1 Seurat_5.1.0
[11] SeuratObject_5.0.2 sp_2.1-4

loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 rstudioapi_0.16.0 jsonlite_1.8.8 magrittr_2.0.3
[5] spatstat.utils_3.0-5 zlibbioc_1.50.0 vctrs_0.6.5 ROCR_1.0-11
[9] memoise_2.0.1 spatstat.explore_3.2-7 htmltools_0.5.8.1 sctransform_0.4.1
[13] parallelly_1.37.1 KernSmooth_2.23-24 htmlwidgets_1.6.4 ica_1.0-3
[17] plyr_1.8.9 plotly_4.10.4 zoo_1.8-12 cachem_1.1.0
[21] igraph_2.0.3 mime_0.12 lifecycle_1.0.4 pkgconfig_2.0.3
[25] R6_2.5.1 fastmap_1.2.0 GenomeInfoDbData_1.2.12 fitdistrplus_1.1-11
[29] future_1.33.2 shiny_1.8.1.1 digest_0.6.36 colorspace_2.1-0
[33] patchwork_1.2.0 tensor_1.5 RSpectra_0.16-1 irlba_2.3.5.1
[37] RSQLite_2.3.7 progressr_0.14.0 fansi_1.0.6 spatstat.sparse_3.1-0
[41] httr_1.4.7 polyclip_1.10-6 abind_1.4-5 compiler_4.4.1
[45] bit64_4.0.5 DBI_1.2.3 fastDummies_1.7.3 MASS_7.3-61
[49] tools_4.4.1 lmtest_0.9-40 httpuv_1.6.15 future.apply_1.11.2
[53] goftest_1.2-3 glue_1.7.0 nlme_3.1-165 promises_1.3.0
[57] grid_4.4.1 Rtsne_0.17 cluster_2.1.6 reshape2_1.4.4
[61] generics_0.1.3 gtable_0.3.5 spatstat.data_3.1-2 tidyr_1.3.1
[65] data.table_1.15.4 XVector_0.44.0 utf8_1.2.4 spatstat.geom_3.2-9
[69] RcppAnnoy_0.0.22 ggrepel_0.9.5 RANN_2.6.1 pillar_1.9.0
[73] spam_2.10-0 RcppHNSW_0.6.0 later_1.3.2 splines_4.4.1
[77] dplyr_1.1.4 lattice_0.22-6 survival_3.7-0 bit_4.0.5
[81] deldir_2.0-4 tidyselect_1.2.1 Biostrings_2.72.1 miniUI_0.1.1.1
[85] pbapply_1.7-2 gridExtra_2.3 scattermore_1.2 statmod_1.5.0
[89] matrixStats_1.3.0 UCSC.utils_1.0.0 stringi_1.8.4 lazyeval_0.2.2
[93] codetools_0.2-20 tibble_3.2.1 cli_3.6.3 uwot_0.2.2
[97] xtable_1.8-4 reticulate_1.38.0 munsell_0.5.1 GenomeInfoDb_1.40.1
[101] Rcpp_1.0.12 globals_0.16.3 spatstat.random_3.2-3 png_0.1-8
[105] parallel_4.4.1 ggplot2_3.5.1 blob_1.2.4 dotCall64_1.1-1
[109] listenv_0.9.1 viridisLite_0.4.2 scales_1.3.0 ggridges_0.5.6
[113] crayon_1.5.3 leiden_0.4.3.1 purrr_1.0.2 rlang_1.1.4
[117] cowplot_1.1.3 KEGGREST_1.44.1

Thank you!

mhkowalski commented 2 months ago

Hi,

Thanks for reporting this issue. It may not be related to JoinLayers itself, since that function is built to handle layers whose counts contain different features. For example:

> library(SeuratData)
> ifnb <- LoadData("ifnb")
Validating object structure
Updating object slots
Ensuring keys are in the proper structure
Warning: Assay RNA changing from Assay to Assay
Ensuring keys are in the proper structure
Ensuring feature names don't have underscores or pipes
Updating slots in RNA
Validating object structure for Assay 'RNA'
Object representation is consistent with the most current Seurat version
Warning: Assay RNA changing from Assay to Assay5
> length(setdiff(rownames(ifnb), rownames(pbmc_small)))
[1] 13823 #ifnb has many genes that aren't in pbmc_small
> m <- merge(ifnb, pbmc_small)
> m <- JoinLayers(m)
> dim(m[['RNA']]$counts) #join layers keeps all of the genes in both objects
[1] 14053 14079

Would you be able to reproduce this with data available through SeuratData, for instance? There might be something unusual about data or data_list that is causing this error.
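To make the point about differing features concrete, here is a conceptual base-R sketch of what joining layers with different feature sets amounts to: take the union of feature names, fill genes missing from a layer with zeros, and bind the cells side by side. `join_on_feature_union()` is a hypothetical illustration, not SeuratObject's actual JoinLayers implementation, which works on sparse layers and may order features differently.

```r
# Conceptual sketch only (not SeuratObject's implementation): join genes x cells
# matrices whose feature sets differ, padding absent genes with zeros.
join_on_feature_union <- function(mats) {
  # Union of all feature names, in order of first appearance
  all_genes <- unique(unlist(lapply(mats, rownames), use.names = FALSE))
  padded <- lapply(mats, function(m) {
    out <- matrix(0, nrow = length(all_genes), ncol = ncol(m),
                  dimnames = list(all_genes, colnames(m)))
    out[rownames(m), ] <- m  # copy the genes this matrix actually measured
    out
  })
  do.call(cbind, padded)  # cells from every matrix, side by side
}

# Toy data: 'a' has a gene that 'b' lacks, and vice versa
a <- matrix(1:4, nrow = 2, dimnames = list(c("CD3E", "MS4A1"), c("a_1", "a_2")))
b <- matrix(5:7, nrow = 3, dimnames = list(c("CD3E", "LYZ", "NKG7"), "b_1"))
joined <- join_on_feature_union(list(a, b))
dim(joined)  # 4 genes (the union) x 3 cells
```

Real count layers are sparse, so a dense matrix like this would be impractical at the scale of ifnb; the sketch only shows the shape of the result.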

sylvia-science commented 2 months ago

Hello,

Before your response, I reinstalled Seurat and SeuratData, and that seems to have fixed the problem. The version numbers stayed the same, though, so I'm not sure what the issue was.

Here's a working version of your example. I used pbmc3k instead of pbmc_small because I had trouble loading pbmc_small.

library(SeuratData)
library(Seurat)
# options(timeout = 500) # InstallData fails otherwise
# InstallData("ifnb")
# InstallData("pbmc_small")

ifnb <- LoadData("ifnb")
pbmc3k <- LoadData("pbmc3k")

length(setdiff(rownames(ifnb), rownames(pbmc3k)))
# [1] 1239  (ifnb has many genes that aren't in pbmc3k)
m <- merge(ifnb, pbmc3k)
m <- JoinLayers(m)
dim(m[["RNA"]]$counts)  # JoinLayers keeps all of the genes in both objects
# [1] 14953 16699
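The reported dimensions can be checked by hand. The gene count of the joined layer is the size of the union of both gene sets, which equals the genes in pbmc3k plus the ifnb genes that pbmc3k lacks; the cell count is simply the sum of both objects' cells. This assumes the standard SeuratData builds, in which pbmc3k has 13,714 genes and 2,700 cells and ifnb has 13,999 cells; those sizes are not stated in the thread.

```latex
\begin{aligned}
n_{\text{genes}} &= |G_{\text{pbmc3k}}| + |G_{\text{ifnb}} \setminus G_{\text{pbmc3k}}| = 13714 + 1239 = 14953,\\
n_{\text{cells}} &= |C_{\text{ifnb}}| + |C_{\text{pbmc3k}}| = 13999 + 2700 = 16699.
\end{aligned}
```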

sessionInfo()
R version 4.4.1 (2024-06-14)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Brussels
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] Seurat_5.1.0 SeuratObject_5.0.2 sp_2.1-4 pbmc3k.SeuratData_3.1.4 ifnb.SeuratData_3.1.0
[6] SeuratData_0.2.2.9001

loaded via a namespace (and not attached):
[1] deldir_2.0-4 pbapply_1.7-2 gridExtra_2.3 rlang_1.1.4 magrittr_2.0.3
[6] RcppAnnoy_0.0.22 matrixStats_1.3.0 ggridges_0.5.6 compiler_4.4.1 spatstat.geom_3.2-9
[11] png_0.1-8 vctrs_0.6.5 reshape2_1.4.4 stringr_1.5.1 crayon_1.5.3
[16] pkgconfig_2.0.3 fastmap_1.2.0 utf8_1.2.4 promises_1.3.0 purrr_1.0.2
[21] jsonlite_1.8.8 goftest_1.2-3 later_1.3.2 spatstat.utils_3.0-5 irlba_2.3.5.1
[26] parallel_4.4.1 cluster_2.1.6 R6_2.5.1 ica_1.0-3 stringi_1.8.4
[31] RColorBrewer_1.1-3 spatstat.data_3.1-2 reticulate_1.38.0 parallelly_1.37.1 lmtest_0.9-40
[36] scattermore_1.2 Rcpp_1.0.12 tensor_1.5 future.apply_1.11.2 zoo_1.8-12
[41] sctransform_0.4.1 httpuv_1.6.15 Matrix_1.7-0 splines_4.4.1 igraph_2.0.3
[46] tidyselect_1.2.1 rstudioapi_0.16.0 abind_1.4-5 spatstat.random_3.2-3 codetools_0.2-20
[51] miniUI_0.1.1.1 spatstat.explore_3.2-7 listenv_0.9.1 lattice_0.22-6 tibble_3.2.1
[56] plyr_1.8.9 shiny_1.8.1.1 ROCR_1.0-11 Rtsne_0.17 future_1.33.2
[61] fastDummies_1.7.3 survival_3.7-0 polyclip_1.10-6 fitdistrplus_1.1-11 pillar_1.9.0
[66] KernSmooth_2.23-24 plotly_4.10.4 generics_0.1.3 RcppHNSW_0.6.0 ggplot2_3.5.1
[71] munsell_0.5.1 scales_1.3.0 globals_0.16.3 xtable_1.8-4 glue_1.7.0
[76] lazyeval_0.2.2 tools_4.4.1 data.table_1.15.4 RSpectra_0.16-1 RANN_2.6.1
[81] leiden_0.4.3.1 dotCall64_1.1-1 cowplot_1.1.3 grid_4.4.1 tidyr_1.3.1
[86] colorspace_2.1-0 nlme_3.1-165 patchwork_1.2.0 cli_3.6.3 rappdirs_0.3.3
[91] spatstat.sparse_3.1-0 spam_2.10-0 fansi_1.0.6 viridisLite_0.4.2 dplyr_1.1.4
[96] uwot_0.2.2 gtable_0.3.5 digest_0.6.36 progressr_0.14.0 ggrepel_0.9.5
[101] htmlwidgets_1.6.4 htmltools_0.5.8.1 lifecycle_1.0.4 httr_1.4.7 mime_0.12
[106] MASS_7.3-61