satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat

Rename layers seurat v5 #7316

Closed aCompanionUnobtrusive closed 1 year ago

aCompanionUnobtrusive commented 1 year ago

Hello,

Does anyone know whether it is possible to rename layers in Seurat v5 objects? I merged numerous v5 objects in a for loop, and in doing so the layers seem to have been renamed: every new object I merged added another .SeuratProject to the layers of the previously merged object, leaving me with layers named like this:

"data.sampleID.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject"

I have tried renaming with the following strategies:

Layers(seur2[["RNA"]])[1] <- 'counts.sampleID'
Error in Layers(seur2[["RNA"]])[1] <- "counts.sampleID" :
  could not find function "Layers<-"

and I have also tried using the full path:

seur2@assays$RNA@layers$counts.sampleID.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject <- 'counts.sampleID'

and tried names()

names(seur2@assays$RNA@layers$counts.sampleID.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject) <- 'counts.sampleID'

and colnames()

colnames(seur2@assays$RNA@layers$counts.sampleID.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject) <- 'counts.sampleID

I would be super grateful if someone can help. Thanks
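The assignment attempts above target the wrong thing, which a plain R list makes visible. Here a list is only a stand-in for the assay's layers slot, not a Seurat object: assigning to lst$name replaces the element's value, and names(lst$name) <- (or colnames()) changes attributes of the element itself, not its key in the list. Renaming the key means assigning into names() of the list. This is a sketch under that simplification; in a v5 assay the layer names are also recorded elsewhere (the last comment in this thread updates the cells and features maps as well), so editing the list names alone may leave the object inconsistent.

```r
# A plain list standing in for an assay's layers (NOT a Seurat object)
layers <- list(
  counts.s1.SeuratProject = matrix(1:4, nrow = 2),
  counts.s2 = matrix(5:8, nrow = 2)
)

# Attempt 1: assigning to the element replaces its value; the key is unchanged
attempt1 <- layers
attempt1$counts.s1.SeuratProject <- "counts.s1"
print(names(attempt1))   # key is still "counts.s1.SeuratProject"
print(attempt1[[1]])     # the matrix has been replaced by a string

# Attempt 2: names() on the element sets the matrix's own names attribute
attempt2 <- layers
names(attempt2$counts.s1.SeuratProject) <- "counts.s1"
print(names(attempt2))   # key is still unchanged

# Renaming the key: assign into names() of the list itself
renamed <- layers
names(renamed)[names(renamed) == "counts.s1.SeuratProject"] <- "counts.s1"
print(names(renamed))
```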

yuhanH commented 1 year ago

Hi @aCompanionUnobtrusive, we don't currently have a function to rename layers, but we will add this functionality soon. In your case, you can join all layers and split them again based on batch information; the resulting layer names will then be meaningful. For example:

> library(Seurat)
> library(SeuratData)
> options(Seurat.object.assay.version = 'v5')
> obj <- LoadData(ds = 'pbmcsca')
> obj$random.group <- sample(1:10, size = ncol(obj), replace = TRUE)
> obj[['RNA']] <- split(x = obj[['RNA']], f = obj$random.group)
> Layers(obj[['RNA']])
 [1] "counts.2"  "counts.8"  "counts.1"  "counts.3"  "counts.6"  "counts.7"  "counts.4"  "counts.10" "counts.5"  "counts.9" 
[11] "data.2"    "data.8"    "data.1"    "data.3"    "data.6"    "data.7"    "data.4"    "data.10"   "data.5"    "data.9"   
> obj[['RNA']] <- JoinLayers(object = obj[['RNA']])
> obj[['RNA']] <- split(x = obj[['RNA']], f = obj$Method)
> Layers(obj[['RNA']])
 [1] "data.Smart-seq2"          "data.CEL-Seq2"            "data.10x_Chromium_v2_A"   "data.10x_Chromium_v2_B"  
 [5] "data.10x_Chromium_v3"     "data.Drop-seq"            "data.Seq-Well"            "data.inDrops"            
 [9] "data.10x_Chromium_v2"     "counts.Smart-seq2"        "counts.CEL-Seq2"          "counts.10x_Chromium_v2_A"
[13] "counts.10x_Chromium_v2_B" "counts.10x_Chromium_v3"   "counts.Drop-seq"          "counts.Seq-Well"         
[17] "counts.inDrops"           "counts.10x_Chromium_v2"  
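After the split, the layer names above follow a layer.group pattern (counts.2, data.Smart-seq2, counts.10x_Chromium_v2_A). A small base-R sketch that separates the two parts, assuming the layer type (counts or data here) contains no dot, so everything after the first dot is the group. The helper name split_layer_names is hypothetical. Applied to a name like the ones in the original question, it shows that the accumulated .SeuratProject suffixes sit in the group part.

```r
# Split layer names of the form <layer>.<group> at the first dot.
# Names without any dot get NA as their group.
split_layer_names <- function(x) {
  has_dot <- grepl(".", x, fixed = TRUE)
  data.frame(
    layer = sub("\\..*$", "", x),                                     # before the first dot
    group = ifelse(has_dot, sub("^[^.]*\\.", "", x), NA_character_),  # after it
    stringsAsFactors = FALSE
  )
}

parts <- split_layer_names(c("data.Smart-seq2", "counts.10x_Chromium_v2_A",
                             "counts.sampleID.SeuratProject.SeuratProject", "counts"))
print(parts)
```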
aCompanionUnobtrusive commented 1 year ago

Hi @yuhanH thanks for your response. This indeed renames my layers nicely.

However, it also drastically increases the size of my Seurat object:

> seur1[['RNA']]  <- JoinLayers(object = seur1[['RNA']]  )
> seur1
An object of class Seurat
32794 features across 344596 samples within 1 assay
Active assay: RNA (32794 features, 0 variable features)
 1 layer present: counts
> format(object.size(seur1), units = "GB")
[1] "0.6 Gb"
> seur1[['RNA']] <-  split(x = seur1[['RNA']], f = seur1$pID)
> format(object.size(seur1), units = "GB")
[1] "22.1 Gb"

Do you know why this might be happening?

mhkowalski commented 1 year ago

Sorry for our slow response on this. Can you please post the full code you're using to merge v5 objects in a for loop?

Thanks!

aCompanionUnobtrusive commented 1 year ago

Hi @mhkowalski, thanks for your response. I'm not having this issue anymore. While I'm not completely sure what fixed it, I think the cause was an inconsistency in which layers had been written to disk with BPCells. Thanks!

LucieLamothe commented 7 months ago

Hi,

It seems I'm having a similar issue. When I merge different samples of the same dataset into one Seurat object in a loop, ".SeuratProject" is appended to the layer names at each iteration. I have no idea where this comes from, how to prevent it, or how to fix it afterwards. Help? :sweat_smile:

library(Seurat)

# Set the data set ID used for all metadata and file names
data_set_id <- "lin_w_2020"

# Set up the data directory
data_dir <- paste0("/home/lamothlu/Datas/deconv_pdac/test_data/", data_set_id, "/")

# Capture all files; the three lists only line up if the files are named consistently

mtx_files <- list.files(path = data_dir, pattern = paste0(data_set_id, "_pt.*_matrix.mtx"))
cells_files <- list.files(path = data_dir, pattern = paste0(data_set_id, "_pt.*_barcodes.tsv"))
features_files <- list.files(path = data_dir, pattern = paste0(data_set_id, "_pt.*_features.tsv"))

# Initialize the Seurat object for the merge

mtx <- paste0(data_dir, mtx_files[1])
cells <- paste0(data_dir, cells_files[1])
features <- paste0(data_dir, features_files[1])
counts <- ReadMtx(mtx = mtx, cells = cells, features = features)
lin <- CreateSeuratObject(counts = counts, project = paste0(data_set_id, "_1"), min.cells = 3, min.features = 200)

# Merge all patients/samples into one Seurat object

for (i in seq_along(mtx_files)[-1]) {
  mtx <- paste0(data_dir, mtx_files[i])
  cells <- paste0(data_dir, cells_files[i])
  features <- paste0(data_dir, features_files[i])
  counts <- ReadMtx(mtx = mtx, cells = cells, features = features)
  # project sets the sample identity in the metadata
  sample_obj <- CreateSeuratObject(counts = counts, project = paste0(data_set_id, "_", i), min.cells = 3, min.features = 200)

  lin <- merge(x = lin, y = sample_obj)
}

lin

21379 features across 3992 samples within 1 assay
Active assay: RNA (21379 features, 0 variable features)
 5 layers present: counts.lin_w_2020_1.SeuratProject.SeuratProject.SeuratProject, counts.lin_w_2020_2.SeuratProject.SeuratProject.SeuratProject, counts.lin_w_2020_3.SeuratProject.SeuratProject, counts.lin_w_2020_4.SeuratProject, counts.lin_w_2020_5
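In the output above, the most recently merged sample (lin_w_2020_5) has no suffix and earlier samples carry progressively more, which suggests the suffixes come from merging one object at a time rather than from the files. A minimal sketch of an alternative: pair the files by sample prefix instead of relying on three list.files() calls sorting identically, create every object first, and call merge() once with the remaining objects passed as a list to y (merge() on Seurat objects accepts a list there). The helper names pair_sample_files and read_and_merge are hypothetical, the <prefix>_matrix.mtx / _barcodes.tsv / _features.tsv naming is assumed from the patterns above, and Seurat v5 is assumed to be installed. It has not been checked against every SeuratObject release.

```r
# Pair files by sample prefix instead of relying on matching sort order.
# Assumes names like <prefix>_matrix.mtx, <prefix>_barcodes.tsv, <prefix>_features.tsv.
pair_sample_files <- function(mtx_files, data_dir) {
  prefix <- sub("_matrix\\.mtx$", "", basename(mtx_files))
  files <- data.frame(
    prefix   = prefix,
    mtx      = file.path(data_dir, paste0(prefix, "_matrix.mtx")),
    cells    = file.path(data_dir, paste0(prefix, "_barcodes.tsv")),
    features = file.path(data_dir, paste0(prefix, "_features.tsv")),
    stringsAsFactors = FALSE
  )
  files <- files[order(files$prefix), , drop = FALSE]
  rownames(files) <- NULL
  files
}

# Create every object first, then merge once (Seurat v5 assumed; the packages
# are only loaded when the function is called).
read_and_merge <- function(files, projects, min.cells = 3, min.features = 200) {
  stopifnot(nrow(files) == length(projects))
  objs <- lapply(seq_len(nrow(files)), function(i) {
    counts <- Seurat::ReadMtx(mtx = files$mtx[i], cells = files$cells[i],
                              features = files$features[i])
    # In this thread the project name ends up as the layer suffix (counts.<project>)
    SeuratObject::CreateSeuratObject(counts = counts, project = projects[i],
                                     min.cells = min.cells,
                                     min.features = min.features)
  })
  if (length(objs) == 1L) return(objs[[1L]])
  merge(x = objs[[1L]], y = objs[-1L])  # a single merge() with a list for y
}

# Example with hypothetical file names, deliberately out of order
files <- pair_sample_files(c("lin_w_2020_pt2_matrix.mtx", "lin_w_2020_pt1_matrix.mtx"),
                           data_dir = "/path/to/data")
print(files)
# With real files:
# lin <- read_and_merge(files, projects = paste0("lin_w_2020_", seq_len(nrow(files))))
```

Pairing by prefix means the sort order only decides the order of samples, not which barcode and feature files go with which matrix. Before reading, all(file.exists(files$cells)) is a cheap check that the assumed naming holds.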

nick-youngblut commented 3 months ago

I'm running into this same issue. My list of directories containing the input files:

/large_experiments/multiomics/SspArc0008_10x_cDNA_longRead//wf-single-cell/expectCell8k/B16_Bl6_dKO_lung_2
/large_experiments/multiomics/SspArc0008_10x_cDNA_longRead//wf-single-cell/expectCell8k/B16_Bl6_dKO_lung_3
/large_experiments/multiomics/SspArc0008_10x_cDNA_longRead//wf-single-cell/expectCell8k/B16_Bl6_ENPP3KO_lung_1

My code to load and merge:

# function to create a Seurat objects for each sample
create_seurat = function(sample_dir) {
  D = file.path(sample_dir, "gene_processed_feature_bc_matrix")
  seurat_obj = Read10X(D) %>% 
    CreateSeuratObject(counts = ., project = basename(sample_dir)) %>%
    RenameCells(., add.cell.id = basename(sample_dir))
  return(seurat_obj)
}

# create Seurat objects for all samples
seurat_obj = sample_dirs %>%
    lapply(create_seurat) %>%
    Reduce(function(x, y) merge(x, y), .)
seurat_obj[["RNA"]]

The output:

Layers:
 counts.B16_Bl6_dKO_lung_2.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject,
counts.B16_Bl6_dKO_lung_3.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject,
counts.B16_Bl6_ENPP3KO_lung_1.SeuratProject.SeuratProject.SeuratProject.SeuratProject.SeuratProject,
counts.B16_Bl6_ENPP3KO_lung_2.SeuratProject.SeuratProject.SeuratProject.SeuratProject,
counts.B16_Bl6_H329A_lung_1.SeuratProject.SeuratProject.SeuratProject,
counts.B16_Bl6_H329A_lung_2.SeuratProject.SeuratProject,
counts.B16_Bl6_WT_lung_1.SeuratProject, counts.B16_Bl6_WT_lung_2 

Reduce() + merge() progressively appends .SeuratProject: the earlier a sample enters the merge chain, the more suffixes its layer name gets.
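The suffix counts in this thread (6, 6, 5, 4, 3, 2, 1, 0 across the eight samples above; 3, 3, 2, 1, 0 across five in the previous comment) fit a simple rule: on each merge, every layer gets the project name of the object it comes from appended, and the merged object's project is SeuratProject. The toy model below is plain R, not Seurat's code; it only implements that assumed rule so that the pairwise Reduce() pattern and a single merge of all samples can be compared.

```r
# Toy model of the naming rule inferred from this thread's outputs (NOT Seurat code).
# An "object" is just a project name plus a vector of layer names.
toy_object <- function(project) list(project = project, layers = "counts")

# Assumed rule: each layer gets its source object's project appended,
# and the merged result is named "SeuratProject".
toy_merge <- function(...) {
  objs <- list(...)
  list(
    project = "SeuratProject",
    layers  = unlist(lapply(objs, function(o) paste(o$layers, o$project, sep = ".")))
  )
}

# Pairwise, like Reduce(function(x, y) merge(x, y), ...)
pairwise <- Reduce(function(x, y) toy_merge(x, y), lapply(paste0("s", 1:4), toy_object))$layers
print(pairwise)

# All samples in a single merge
single <- do.call(toy_merge, lapply(paste0("s", 1:4), toy_object))$layers
print(single)

# Suffix counts for eight samples merged pairwise: 6 6 5 4 3 2 1 0
p8 <- Reduce(function(x, y) toy_merge(x, y), lapply(paste0("s", 1:8), toy_object))$layers
print(lengths(regmatches(p8, gregexpr("SeuratProject", p8, fixed = TRUE))))
```

If the rule holds, merging all samples in one call should avoid the accumulation. That is a reading of the outputs posted here, not behavior confirmed by the maintainers in this thread.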

So, I wouldn't necessarily consider this issue "Closed".

More generally, it would be quite helpful to be able to rename layers quickly when needed. Currently, renaming via something like:

names(seurat_obj[["RNA"]]@layers) = gsub("\\.SeuratProject.*$", "", names(seurat_obj[["RNA"]]@layers))

is VERY slow and requires a lot of memory.

sessionInfo

R version 4.3.3 (2024-02-29)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 22.04.3 LTS

Matrix products: default
BLAS/LAPACK: /home/nickyoungblut/miniforge3/envs/seurat-v5/lib/libopenblasp-r0.3.27.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] future_1.33.2      ArcRUtils_0.1.0    Seurat_5.1.0       SeuratObject_5.0.2
[5] sp_2.1-4           biomaRt_2.58.0     ggplot2_3.5.1      tidyr_1.3.1       
[9] dplyr_1.1.4       

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3      jsonlite_1.8.8          magrittr_2.0.3         
  [4] spatstat.utils_3.0-4    zlibbioc_1.48.0         vctrs_0.6.5            
  [7] ROCR_1.0-11             spatstat.explore_3.2-6  memoise_2.0.1          
 [10] RCurl_1.98-1.14         base64enc_0.1-3         htmltools_0.5.8.1      
 [13] progress_1.2.3          curl_5.1.0              sctransform_0.4.1      
 [16] parallelly_1.37.1       KernSmooth_2.23-24      htmlwidgets_1.6.4      
 [19] ica_1.0-3               plyr_1.8.9              plotly_4.10.4          
 [22] zoo_1.8-12              cachem_1.1.0            uuid_1.2-0             
 [25] igraph_2.0.3            mime_0.12               lifecycle_1.0.4        
 [28] pkgconfig_2.0.3         Matrix_1.6-5            R6_2.5.1               
 [31] fastmap_1.2.0           GenomeInfoDbData_1.2.11 fitdistrplus_1.1-11    
 [34] shiny_1.8.1.1           digest_0.6.35           colorspace_2.1-0       
 [37] patchwork_1.2.0         AnnotationDbi_1.64.1    S4Vectors_0.40.2       
 [40] tensor_1.5              RSpectra_0.16-1         irlba_2.3.5.1          
 [43] RSQLite_2.3.4           filelock_1.0.3          progressr_0.14.0       
 [46] spatstat.sparse_3.0-3   fansi_1.0.6             polyclip_1.10-6        
 [49] abind_1.4-5             httr_1.4.7              compiler_4.3.3         
 [52] bit64_4.0.5             withr_3.0.0             DBI_1.2.3              
 [55] fastDummies_1.7.3       MASS_7.3-60             rappdirs_0.3.3         
 [58] tools_4.3.3             lmtest_0.9-40           httpuv_1.6.15          
 [61] future.apply_1.11.2     goftest_1.2-3           glue_1.7.0             
 [64] nlme_3.1-164            promises_1.3.0          grid_4.3.3             
 [67] pbdZMQ_0.3-11           Rtsne_0.17              reshape2_1.4.4         
 [70] cluster_2.1.6           generics_0.1.3          spatstat.data_3.0-4    
 [73] gtable_0.3.5            data.table_1.15.2       hms_1.1.3              
 [76] xml2_1.3.6              utf8_1.2.4              XVector_0.42.0         
 [79] spatstat.geom_3.2-9     BiocGenerics_0.48.1     RcppAnnoy_0.0.22       
 [82] ggrepel_0.9.5           RANN_2.6.1              pillar_1.9.0           
 [85] stringr_1.5.1           spam_2.10-0             IRdisplay_1.1          
 [88] RcppHNSW_0.6.0          later_1.3.2             splines_4.3.3          
 [91] BiocFileCache_2.10.1    lattice_0.22-6          deldir_2.0-4           
 [94] survival_3.6-4          bit_4.0.5               tidyselect_1.2.1       
 [97] Biostrings_2.70.1       miniUI_0.1.1.1          pbapply_1.7-2          
[100] gridExtra_2.3           IRanges_2.36.0          scattermore_1.2        
[103] stats4_4.3.3            Biobase_2.62.0          matrixStats_1.3.0      
[106] stringi_1.8.4           lazyeval_0.2.2          evaluate_0.23          
[109] codetools_0.2-20        tibble_3.2.1            cli_3.6.2              
[112] uwot_0.1.16             IRkernel_1.3.2          xtable_1.8-4           
[115] reticulate_1.37.0       repr_1.1.7              munsell_0.5.1          
[118] Rcpp_1.0.12             GenomeInfoDb_1.38.1     spatstat.random_3.2-3  
[121] globals_0.16.3          dbplyr_2.5.0            png_0.1-8              
[124] XML_3.99-0.16.1         parallel_4.3.3          blob_1.2.4             
[127] prettyunits_1.2.0       dotCall64_1.1-1         bitops_1.0-7           
[130] listenv_0.9.1           viridisLite_0.4.2       scales_1.3.0           
[133] ggridges_0.5.6          leiden_0.4.3.1          purrr_1.0.2            
[136] crayon_1.5.2            rlang_1.1.3             cowplot_1.1.3          
[139] KEGGREST_1.42.0
aaltulea commented 1 week ago

The code below fixes the annoying .SeuratProject suffixes in an unprocessed object:

names(seurat_tables@assays$RNA@layers) =
  names(seurat_tables@assays$RNA@layers) %>%
    stringr::str_remove_all(pattern = "SeuratProject|\\.|\\s") %>%
    stringr::str_replace_all("counts", "counts.") # might need to add "data" as well

colnames(seurat_tables@assays$RNA@cells) = names(seurat_tables@assays$RNA@layers)
colnames(seurat_tables@assays$RNA@features) = names(seurat_tables@assays$RNA@layers)
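The pattern above removes every dot and whitespace character and then re-inserts a dot after counts, so a sample ID that itself contains a dot (for example a hypothetical pt.1) would come back altered, and data layers are not restored. A narrower base-R alternative removes only a trailing run of .SeuratProject and leaves the rest of each name intact. This sketch only computes the new names (e.g. from names(seurat_tables@assays$RNA@layers)); the same assignments to the layers, cells and features slots shown above would still be needed. The helper name strip_project_suffix is hypothetical.

```r
# Remove one or more trailing ".SeuratProject" suffixes, leaving the rest intact.
# Note: `suffix` is used inside a regular expression.
strip_project_suffix <- function(x, suffix = "SeuratProject") {
  sub(paste0("(\\.", suffix, ")+$"), "", x)
}

old <- c("counts.lin_w_2020_1.SeuratProject.SeuratProject.SeuratProject",
         "data.pt.1.SeuratProject",   # hypothetical sample ID containing a dot
         "counts.lin_w_2020_5")
new <- strip_project_suffix(old)
print(new)

# Renaming only makes sense if the cleaned names are still unique
stopifnot(!anyDuplicated(new))
```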