satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.29k stars 915 forks source link

Error when split() and IntegrateLayers() in Seurat v5 #7656

Closed Kexin118 closed 4 months ago

Kexin118 commented 1 year ago

Hi

I follow the Seurat V5 Vignette Using BPCells with Seurat Objects to load 10 Cell Ranger filtered h5 files. Due to the vignette describing loading h5ad files rather than h5, I encountered some issues during loading and analysis.

Here is my code

file.dir <- "CellRangerResults/Guilliams/"
files.set <- c("gf1_filtered_feature_bc_matrix.h5", "gf2_filtered_feature_bc_matrix.h5", "gf3_filtered_feature_bc_matrix.h5",  "gf4_filtered_feature_bc_matrix.h5", "gf5_filtered_feature_bc_matrix.h5", "gf6_filtered_feature_bc_matrix.h5", "gh1_filtered_feature_bc_matrix.h5", "gh2_filtered_feature_bc_matrix.h5", "gh3_filtered_feature_bc_matrix.h5", "gh4_filtered_feature_bc_matrix.h5")

# Loop through h5 files and output BPCells matrices on-disk
data.list <- c()
metadata.list <- c()

for (i in 1:length(files.set)) {
  path <- paste0(file.dir, files.set[i])
  data <- open_matrix_10x_hdf5(path)
  write_matrix_dir(
    mat = data,
    dir = paste0(gsub(".h5", "", path), "_BP"))
  # Load in BP matrices
  mat <- open_matrix_dir(dir = paste0(gsub(".h5", "", path), "_BP"))
  mat <- Azimuth:::ConvertEnsembleToSymbol(mat = mat, species = "human")
  mat1 <- CreateSeuratObject(counts = mat) 

  # Get metadata
  metadata.list[[i]] <- mat1[[]] 
  data.list[[i]] <- mat
}
# Name layers
names(data.list) <- c("gf1", "gf2", "gf3","gf4","gf5","gf6","gh1","gh2","gh3","gh4")

In the for loop, I use

mat1 <- CreateSeuratObject(counts = mat) 

Instead of

metadata.list[[i]] <- LoadH5ADobs(path = path)

To get the metadata list because I don't know how to read in only the metadata of an H5 file and return a data.frame object (Is there any better way rather than create a temporary Seurat object???). I think all of the following problem might start from here.

Then I add the sample information to the metadata

# read the sampledata file
SampleInfo <- read.csv(file = "/SampleMetadata.csv")

# Add Metadata
for (i in 1:length(metadata.list)) {
  metadata.list[[i]]$Sample <- names(data.list)[i]
  metadata.list[[i]]$Patient <- SampleInfo$Patient[i]
  metadata.list[[i]]$Gender <- SampleInfo$Gender[i]
  metadata.list[[i]]$Age <- SampleInfo$Age[i]
  metadata.list[[i]]$Aetiology <- SampleInfo$Aetiology[i]
  metadata.list[[i]]$Original_SampleID <- SampleInfo$Original_SampleID[i]
  metadata.list[[i]]$Cirrhosis <- SampleInfo$Cirrhosis[i]
  metadata.list[[i]]$Experiment_Group <- SampleInfo$Experiment_Group[i]
  metadata.list[[i]]$Fibrosis_Stage <- SampleInfo$Fibrosis_Stage[i]
}
metadata <- Reduce(rbind, metadata.list)

Create and save Seurat object

options(Seurat.object.assay.version = "v5")
ghcgfcV5m <- CreateSeuratObject(counts = data.list, meta.data = metadata)
ghcgfcV5m

saveRDS(
  object = ghcgfcV5m,
  file = "ghcgfc_mergedV5.Rds" #,
  #destdir = "/Robject/ghcgfcV5merged"
)
ghcgfcV5m
An object of class Seurat 
25250 features across 80580 samples within 1 assay 
Active assay: RNA (25250 features, 0 variable features)
 10 layers present: counts.gf1, counts.gf2, counts.gf3, counts.gf4, counts.gf5, counts.gf6, counts.gh1, counts.gh2, counts.gh3, counts.gh4

I read #7192 found that save the direction may bring problem so I only saved the '.Rds' file.

I can JoinLayers but can't split the object:

ghcgfcV5m <- JoinLayers(ghcgfcV5m)
ghcgfcV5m
An object of class Seurat 
25250 features across 80580 samples within 1 assay 
Active assay: RNA (25250 features, 0 variable features)
 1 layer present: counts

ghcgfcV5m[["RNA"]] <- split(ghcgfcV5m[["RNA"]], f = ghcgfcV5m$Sample)
Error in split.Assay5(ghcgfcV5m[["RNA"]], f = ghcgfcV5m$Sample) : 
  Those layers are splitted already: counts
Please join those layers before splitting

If I don't do either JoinLayers or split, the following all work until integration:

ghcgfcV5m <- NormalizeData(ghcgfcV5m,verbose = F)
ghcgfcV5m <- FindVariableFeatures(ghcgfcV5m, verbose = F)
ghcgfcV5m <- SketchData(
  object = ghcgfcV5m,
  ncells = 5000,
  method = "LeverageScore",
  sketched.assay = "sketch")
DefaultAssay(ghcgfcV5m) <- "sketch"
ghcgfcV5m <- FindVariableFeatures(ghcgfcV5m,verbose=F)
ghcgfcV5m <- ScaleData(ghcgfcV5m,verbose=F)
ghcgfcV5m <- RunPCA(ghcgfcV5m,verbose=F)

Now the object looks like this

ghcgfcV5m
An object of class Seurat 
50500 features across 80580 samples within 2 assays 
Active assay: sketch (25250 features, 2000 variable features)
 21 layers present: counts.gf1, counts.gf2, counts.gf3, counts.gf4, counts.gf5, counts.gf6, counts.gh1, counts.gh2, counts.gh3, counts.gh4, data.gf1, data.gf2, data.gf3, data.gf4, data.gf5, data.gf6, data.gh1, data.gh2, data.gh3, data.gh4, scale.data
 1 other assay present: RNA
 1 dimensional reduction calculated: pca

Integration

ghcgfcV5m <- IntegrateLayers(
  object = ghcgfcV5m, method = RPCAIntegration,
  orig.reduction = "pca", new.reduction = "integrated.rpca",
  verbose = F)

Error in `IntegrateLayers()`:
! 'group.by' must correspond to a column of cell-level meta data
Backtrace:
 1. Seurat::IntegrateLayers(...)

Traceback

<div class="GND-IWGDDID ace_constant ace_language" style="display: inline; overflow-wrap: break-word; word-break: break-word; color: rgb(255, 156, 0); font-family: Monaco, monospace; font-size: 20px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-line; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;"><span>Error in IntegrateLayers(object = ghcgfcV5m, method = RPCAIntegration, :</span></div><span style="color: rgb(64, 64, 64); font-family: Monaco, monospace; font-size: 20px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-line; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(236, 236, 236); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: inline !important; float: none;"></span><div class="GND-IWGDEID stack_trace" aria-hidden="true" style="margin-top: 0.5em; margin-left: 0.3em; color: rgb(64, 64, 64); font-family: Monaco, monospace; font-size: 20px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; white-space: pre-line; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; display: block;">

4. | stop(fallback)
-- | --

</div>

I read #7318 and installed the newest version of Seurat by

remotes::install_github("mojaveazure/seurat-object", "seurat5", quiet = TRUE)
remotes::install_github("satijalab/seurat", "seurat5", quiet = TRUE)

but the error is still there.

Thank you!!!

Gesmira commented 1 year ago

Hi @Kexin118, When you installed the new version of Seurat, did you clear and restart your R session as well?

ClaireZ1212 commented 1 year ago

Hi, I have a similar issue. I updated Seurat v5 this morning but got the same problem.

1, Error in IntegrateLayers()

library(Seurat)
library(SeuratData)
library(BPCells)
library(dplyr)
library(SeuratWrappers)
library(patchwork)
library(ggrepel)
options(future.globals.maxSize = 3e+09)
options(Seurat.object.assay.version = "v5")
xxx
obj <- IntegrateLayers(
    object = obj, method = scVIIntegration,
    new.reduction = "integrated.scvi",
    conda_env = "/my/miniconda/envs/scvi-env",
    verbose = FALSE
)

Error in `IntegrateLayers()`:
! 'group.by' must correspond to a column of cell-level meta data
Backtrace:
    ▆
 1. └─Seurat::IntegrateLayers(...)
 2.   └─rlang::abort(message = "'group.by' must correspond to a column of cell-
Execution halted

2, Error: vector::reserve

If I add "JoinLayers() and split()" before IntegrateLayers(), IntegrateLayers() works for small datasets but shows another error for large datasets (e.g. ~1 million cell dataset, average number of genes in each cell ~3000):

DefaultAssay(obj) <- "sketch"
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)
obj[["sketch"]] <- JoinLayers(obj[["sketch"]])   ## Fail
obj[["sketch"]] <- split(obj[["sketch"]], f = obj$Sample)
obj <- IntegrateLayers(...)

#output
PC_ 5
Positive:  xxxx
Negative:  xxxx
Error: vector::reserve
Execution halted
rm: cannot remove ‘xxx' : Device or resource busy

3, sessionInfo() info

R version 4.2.0 (2022-04-22) Platform: x86_64-conda-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core)

Matrix products: default BLAS/LAPACK: /libopenblasp-r0.3.23.so

locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] ggrepel_0.9.3 ggplot2_3.4.2 patchwork_1.1.2 [4] SeuratWrappers_0.3.19 dplyr_1.1.2 BPCells_0.1.0 [7] pbmcsca.SeuratData_3.0.0 pbmcref.SeuratData_1.0.0 SeuratData_0.2.2.9001 [10] Seurat_4.9.9.9059 SeuratObject_4.9.9.9091 sp_2.0-0

loaded via a namespace (and not attached): [1] Rtsne_0.16 colorspace_2.1-0 deldir_1.0-9 [4] ellipsis_0.3.2 ggridges_0.5.4 XVector_0.38.0 [7] GenomicRanges_1.50.0 RcppHNSW_0.4.1 spatstat.data_3.0-1 [10] leiden_0.4.3 listenv_0.9.0 remotes_2.4.2 [13] RSpectra_0.16-1 fansi_1.0.4 R.methodsS3_1.8.2 [16] codetools_0.2-19 splines_4.2.0 polyclip_1.10-4 [19] spam_2.9-1 jsonlite_1.8.7 ica_1.0-3 [22] cluster_2.1.4 R.oo_1.25.0 png_0.1-8 [25] uwot_0.1.16 shiny_1.7.4.1 sctransform_0.3.5 [28] spatstat.sparse_3.0-2 BiocManager_1.30.21 compiler_4.2.0 [31] httr_1.4.6 Matrix_1.5-4.1 fastmap_1.1.1 [34] lazyeval_0.2.2 cli_3.6.1 later_1.3.1 [37] htmltools_0.5.5 tools_4.2.0 rsvd_1.0.5 [40] igraph_1.3.5 dotCall64_1.0-2 GenomeInfoDbData_1.2.9 [43] gtable_0.3.3 glue_1.6.2 RANN_2.6.1 [46] reshape2_1.4.4 rappdirs_0.3.3 Rcpp_1.0.11 [49] scattermore_1.2 vctrs_0.6.3 spatstat.explore_3.2-1 [52] nlme_3.1-162 progressr_0.13.0 lmtest_0.9-40 [55] spatstat.random_3.1-5 stringr_1.5.0 globals_0.16.2 [58] mime_0.12 miniUI_0.1.1.1 lifecycle_1.0.3 [61] irlba_2.3.5.1 goftest_1.2-3 future_1.33.0 [64] zlibbioc_1.44.0 MASS_7.3-60 zoo_1.8-12 [67] scales_1.2.1 promises_1.2.0.1 spatstat.utils_3.0-3 [70] parallel_4.2.0 RColorBrewer_1.1-3 reticulate_1.30 [73] pbapply_1.7-2 gridExtra_2.3 stringi_1.7.12 [76] S4Vectors_0.36.2 fastDummies_1.7.3 BiocGenerics_0.44.0 [79] GenomeInfoDb_1.34.9 bitops_1.0-7 rlang_1.1.1 [82] pkgconfig_2.0.3 matrixStats_1.0.0 lattice_0.21-8 [85] ROCR_1.0-11 purrr_1.0.1 tensor_1.5 [88] htmlwidgets_1.6.2 cowplot_1.1.1 tidyselect_1.2.0 [91] parallelly_1.36.0 RcppAnnoy_0.0.20 plyr_1.8.8 [94] magrittr_2.0.3 R6_2.5.1 IRanges_2.32.0 [97] generics_0.1.3 withr_2.5.0 pillar_1.9.0 [100] fitdistrplus_1.1-11 RCurl_1.98-1.12 survival_3.5-5 [103] abind_1.4-5 tibble_3.2.1 future.apply_1.11.0 [106] crayon_1.5.2 KernSmooth_2.23-21 utf8_1.2.3 [109] spatstat.geom_3.2-4 plotly_4.10.2 grid_4.2.0 [112] data.table_1.14.8 digest_0.6.31 xtable_1.8-4 [115] tidyr_1.3.0 httpuv_1.6.11 R.utils_2.12.2 [118] stats4_4.2.0 munsell_0.5.0 viridisLite_0.4.2

Is there anything wrong with my packages?

Thanks for your help! Best, Claire

sentisci commented 1 year ago

I have encountered the same exact issue.. Has anyone resolved the above issue?

ClaireZ1212 commented 1 year ago

One more issue in Step. Extend results to the full datasets :

## Extend results to the full datasets
obj <- ProjectIntegration(object = obj, sketched.assay = "sketch", assay = "RNA", reduction = "integrated.mnn")
## Error
Warning: Changing path in object to point to new BPCells directory location
966 atomic cells identified in the sketched.assay
Correcting embeddings
Error in UnSketchEmbeddings(atom.data = LayerData(object = object[[sketched.assay]],  :
  fetures in atom.data and orig.data are not identical
Calls: ProjectIntegration -> UnSketchEmbeddings
Execution halted

This error can be fixed by adding JoinLayers() and split() before ProjectIntegration()

obj[["RNA"]] <- JoinLayers(obj[["RNA"]]) 
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Sample)
obj <- ProjectIntegration(object = obj, sketched.assay = "sketch", assay = "RNA", reduction = "integrated.mnn")
obj <- ProjectData(object = obj, sketched.assay = "sketch", assay = "RNA", sketched.reduction = "integrated.mnn.full",full.reducti    on = "integrated.mnn.full", dims = 1:50, refdata = list(cluster_full = "seurat_clusters"))

#done

But it seems that Suerat v5 JoinLayers() is not available for large datasets now (See 2, Error: vector::reserve or #4752 ).

dcollins15 commented 4 months ago

Thanks for using Seurat!

It appears that this issue has gone stale. In an effort to keep our Issues board from getting more unruly than it already is, we’re going to begin closing out issues that haven’t had any activity since the release of v4.4.0.

If this issue is still relevant we strongly encourage you to reopen or repost it, especially if you didn’t initially receive a response from us.