stemangiola / tidyseurat

Seurat meets tidyverse. The best of both worlds.
https://stemangiola.github.io/tidyseurat/

Error when attempting to merge Seurat Objects after filtering #59

Open william-hutchison opened 1 year ago

william-hutchison commented 1 year ago

Hello,

I am encountering an error when trying to merge two Seurat Objects after using tidyseurat's filter(). The error can only be produced when the attached data's "integrated" assay is present.

The error is produced by calling merge() with the following code:

library("Seurat")
library("tidyseurat")

pbmc_complex <-
  readRDS("pbmc_complex.rds")

pbmc_complex_filtered <-
  pbmc_complex |>
  tidyseurat::filter(sample %in% c("SI-GA-G9", "SI-GA-G6"))

pbmc_complex_filtered_split <- 
  SplitObject(pbmc_complex_filtered, "sample")

pbmc_complex_filtered_merged <- 
  merge(pbmc_complex_filtered_split[[1]], pbmc_complex_filtered_split[[2]])

The error message is:

Error in names(model.list) <- all.levels : 
  attempt to set an attribute on NULL

The pbmc_complex_filtered_split object looks like:

$`SI-GA-G9`
# A Seurat-tibble abstraction: 14 × 111
# Features=59856 | Cells=14 | Active assay=RNA | Assays=RNA, integrated,
#  prediction.score.celltype.l1, prediction.score.celltype.l2, predicted_ADT,
#  prediction.score.curated_cell_type, prediction.score.curated_cell_type_pretty
   .cell       Barcode race  sex   chemi…¹ note  batch BCB   type  DOB   date.…² Sampl…³ Stage…⁴
   <chr>       <chr>   <chr> <chr> <lgl>   <chr> <chr> <chr> <chr> <chr> <chr>   <chr>   <chr>  
 1 8_AGACACTT… AGACAC… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 2 8_TTCCGGTT… TTCCGG… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 3 8_GGCACGTT… GGCACG… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 4 8_ATCACAGA… ATCACA… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 5 8_CGCATGGA… CGCATG… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 6 8_TACTTACT… TACTTA… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 7 8_CAAGCTAG… CAAGCT… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 8 8_CATTCCGC… CATTCC… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
 9 8_CATTCTAG… CATTCT… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
10 8_ATCCCTGG… ATCCCT… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
11 8_CTGCGAGT… CTGCGA… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
12 8_AGATAGAC… AGATAG… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
13 8_TCTACCGG… TCTACC… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
14 8_TATATCCC… TATATC… white NA    NA      NA    3     BCB0… MBC   6/11… 11/12/… 4/6/20… De nov…
# … with 98 more variables: intrinsic.subtype <chr>,
#   STAGE.WHEN.BIOPSY.TAKEN..EBC.vs..MBC. <chr>,
#   Treatment.response.at.time.sample.taken..progressing..responding..stable.disease. <chr>,
#   menopausal.status..premenopausal..perimenopausal..postmenopausal. <chr>,
#   neoadjuvant.therapy..specified. <chr>, radiotherapy..y.n. <chr>,
#   endocrine.therapy..y.n. <chr>, her2.targeted.therapy..y.n. <chr>,
#   adjuvant.chemotherapy..y.n. <chr>, …
# ℹ Use `colnames()` to see all variable names

$`SI-GA-G6`
# A Seurat-tibble abstraction: 6 × 111
# Features=59856 | Cells=6 | Active assay=RNA | Assays=RNA, integrated,
#  prediction.score.celltype.l1, prediction.score.celltype.l2, predicted_ADT,
#  prediction.score.curated_cell_type, prediction.score.curated_cell_type_pretty
  .cell        Barcode race  sex   chemi…¹ note  batch BCB   type  DOB   date.…² Sampl…³ Stage…⁴
  <chr>        <chr>   <chr> <chr> <lgl>   <chr> <chr> <chr> <chr> <chr> <chr>   <chr>   <chr>  
1 4_ATATCCTGT… ATATCC… white NA    NA      NA    2     BCB0… OMBC  13/0… 01/07/… 1/5/20… WLE + …
2 4_GGGAGTACA… GGGAGT… white NA    NA      NA    2     BCB0… OMBC  13/0… 01/07/… 1/5/20… WLE + …
3 4_CCCTGATAG… CCCTGA… white NA    NA      NA    2     BCB0… OMBC  13/0… 01/07/… 1/5/20… WLE + …
4 4_CGCATAATC… CGCATA… white NA    NA      NA    2     BCB0… OMBC  13/0… 01/07/… 1/5/20… WLE + …
5 4_GGATCTAAG… GGATCT… white NA    NA      NA    2     BCB0… OMBC  13/0… 01/07/… 1/5/20… WLE + …
6 4_CGTTCTGTC… CGTTCT… white NA    NA      NA    2     BCB0… OMBC  13/0… 01/07/… 1/5/20… WLE + …
# … with 98 more variables: intrinsic.subtype <chr>,
#   STAGE.WHEN.BIOPSY.TAKEN..EBC.vs..MBC. <chr>,
#   Treatment.response.at.time.sample.taken..progressing..responding..stable.disease. <chr>,
#   menopausal.status..premenopausal..perimenopausal..postmenopausal. <chr>,
#   neoadjuvant.therapy..specified. <chr>, radiotherapy..y.n. <chr>,
#   endocrine.therapy..y.n. <chr>, her2.targeted.therapy..y.n. <chr>,
#   adjuvant.chemotherapy..y.n. <chr>, …
# ℹ Use `colnames()` to see all variable names

Merging with the same data works fine without filtering:

pbmc_complex_split <-
  SplitObject(pbmc_complex, "sample")

pbmc_complex_merged <-
  merge(pbmc_complex_split[[1]], pbmc_complex_split[[2]])

And merging after filtering works fine when the "integrated" assay is removed:

pbmc_complex[['integrated']] <- 
  NULL

pbmc_complex_filtered <-
  pbmc_complex |>
  tidyseurat::filter(sample %in% c("SI-GA-G9", "SI-GA-G6"))

pbmc_complex_filtered_split <- 
  SplitObject(pbmc_complex_filtered, "sample")

pbmc_complex_filtered_merged <- 
  merge(pbmc_complex_filtered_split[[1]], pbmc_complex_filtered_split[[2]])
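
The manual test above (remove one assay, then retry the merge) can be generalised to every non-default assay, which may help confirm that "integrated" is the only assay involved. The following is a minimal sketch, not part of Seurat or tidyseurat: the helper name merge_without_each_assay is hypothetical, and it assumes both objects share the assays being tested.

```r
library(Seurat)

# Hypothetical helper: for each assay shared by x and y (excluding their
# default assays), drop that assay from copies of both objects and record
# whether merge() then completes without an error.
merge_without_each_assay <- function(x, y) {
  candidates <- setdiff(
    intersect(Assays(x), Assays(y)),
    c(DefaultAssay(x), DefaultAssay(y))
  )
  vapply(
    candidates,
    function(assay_name) {
      result <- tryCatch(
        {
          x_trim <- x
          y_trim <- y
          x_trim[[assay_name]] <- NULL  # remove the assay from the copy only
          y_trim[[assay_name]] <- NULL
          merge(x_trim, y_trim)
        },
        error = function(e) e
      )
      # TRUE means the merge succeeded once this assay was removed
      !inherits(result, "error")
    },
    logical(1)
  )
}
```

With the objects from this report, the call would be merge_without_each_assay(pbmc_complex_filtered_split[[1]], pbmc_complex_filtered_split[[2]]). A TRUE entry marks an assay whose removal lets the merge succeed.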

sessionInfo():

R version 4.2.1 (2022-06-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS:   /stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libRblas.so
LAPACK: /stornext/System/data/apps/R/R-4.2.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] tidyHeatmap_1.9.2  cowplot_1.1.1      ggrepel_0.9.3      sccomp_1.2.1      
 [5] tidyseurat_0.5.9   ttservice_0.2.2    SeuratObject_4.1.3 Seurat_4.3.0      
 [9] forcats_1.0.0      stringr_1.5.0      dplyr_1.1.1        purrr_1.0.1       
[13] readr_2.1.4        tidyr_1.3.0        tibble_3.2.1       ggplot2_3.4.2     
[17] tidyverse_1.3.2   

loaded via a namespace (and not attached):
  [1] utf8_1.2.3                  spatstat.explore_3.1-0      reticulate_1.28            
  [4] tidyselect_1.2.0            htmlwidgets_1.6.2           grid_4.2.1                 
  [7] Rtsne_0.16                  munsell_0.5.0               codetools_0.2-18           
 [10] ica_1.0-3                   future_1.32.0               miniUI_0.1.1.1             
 [13] withr_2.5.0                 spatstat.random_3.1-4       colorspace_2.1-0           
 [16] progressr_0.13.0            Biobase_2.58.0              rstudioapi_0.14            
 [19] stats4_4.2.1                SingleCellExperiment_1.20.1 ROCR_1.0-11                
 [22] tensor_1.5                  listenv_0.9.0               MatrixGenerics_1.10.0      
 [25] labeling_0.4.2              rstan_2.21.8                GenomeInfoDbData_1.2.9     
 [28] polyclip_1.10-4             farver_2.1.1                parallelly_1.35.0          
 [31] vctrs_0.6.1                 generics_0.1.3              timechange_0.2.0           
 [34] doParallel_1.0.17           R6_2.5.1                    GenomeInfoDb_1.34.9        
 [37] clue_0.3-64                 bitops_1.0-7                spatstat.utils_3.0-2       
 [40] DelayedArray_0.24.0         promises_1.2.0.1            scales_1.2.1               
 [43] googlesheets4_1.1.0         gtable_0.3.1                globals_0.16.2             
 [46] processx_3.8.1              goftest_1.2-3               rlang_1.1.0                
 [49] systemfonts_1.0.4           GlobalOptions_0.1.2         splines_4.2.1              
 [52] lazyeval_0.2.2              gargle_1.4.0                spatstat.geom_3.1-0        
 [55] broom_1.0.4                 inline_0.3.19               reshape2_1.4.4             
 [58] abind_1.4-5                 modelr_0.1.11               backports_1.4.1            
 [61] httpuv_1.6.9                tools_4.2.1                 ellipsis_0.3.2             
 [64] RColorBrewer_1.1-3          BiocGenerics_0.44.0         ggridges_0.5.4             
 [67] Rcpp_1.0.10                 plyr_1.8.8                  zlibbioc_1.44.0            
 [70] RCurl_1.98-1.12             ps_1.7.5                    prettyunits_1.1.1          
 [73] deldir_1.0-6                viridis_0.6.2               GetoptLong_1.0.5           
 [76] pbapply_1.7-0               S4Vectors_0.36.2            zoo_1.8-12                 
 [79] SummarizedExperiment_1.28.0 haven_2.5.2                 cluster_2.1.3              
 [82] fs_1.6.1                    magrittr_2.0.3              data.table_1.14.8          
 [85] scattermore_0.8             circlize_0.4.15             lmtest_0.9-40              
 [88] reprex_2.0.2                RANN_2.6.1                  googledrive_2.1.0          
 [91] fitdistrplus_1.1-8          matrixStats_0.63.0          hms_1.1.3                  
 [94] patchwork_1.1.2             mime_0.12                   xtable_1.8-4               
 [97] readxl_1.4.2                shape_1.4.6                 IRanges_2.32.0             
[100] gridExtra_2.3               rstantools_2.3.1            compiler_4.2.1             
[103] KernSmooth_2.23-20          crayon_1.5.1                StanHeaders_2.21.0-7       
[106] htmltools_0.5.5             later_1.3.0                 tzdb_0.3.0                 
[109] RcppParallel_5.1.7          lubridate_1.9.2             DBI_1.1.3                  
[112] ComplexHeatmap_2.14.0       dbplyr_2.3.2                MASS_7.3-57                
[115] boot_1.3-28                 Matrix_1.5-3                cli_3.6.1                  
[118] parallel_4.2.1              igraph_1.4.2                GenomicRanges_1.50.2       
[121] pkgconfig_2.0.3             sp_1.6-0                    plotly_4.10.1              
[124] spatstat.sparse_3.0-1       foreach_1.5.2               xml2_1.3.3                 
[127] XVector_0.38.0              rvest_1.0.3                 callr_3.7.3                
[130] digest_0.6.31               sctransform_0.3.5           RcppAnnoy_0.0.20           
[133] spatstat.data_3.0-1         cellranger_1.1.0            leiden_0.4.3               
[136] dendextend_1.17.1           uwot_0.1.14                 shiny_1.7.4                
[139] rjson_0.2.21                lifecycle_1.0.3             nlme_3.1-157               
[142] jsonlite_1.8.4              viridisLite_0.4.1           limma_3.54.2               
[145] fansi_1.0.4                 pillar_1.8.1                lattice_0.20-45            
[148] loo_2.6.0                   fastmap_1.1.1               httr_1.4.5                 
[151] pkgbuild_1.4.0              survival_3.3-1              glue_1.6.2                 
[154] iterators_1.0.14            png_0.1-8                   stringi_1.7.12             
[157] irlba_2.3.5.1               future.apply_1.10.0

I have attached the data used to produce this error.

Let me know if I can provide any more information. Thank you!

pbmc_complex.rds.zip

william-hutchison commented 1 year ago

I just encountered an open issue in Seurat which could be related: https://github.com/satijalab/seurat/issues/6462

stemangiola commented 1 year ago

Hello William,

would you like to tackle this issue?

william-hutchison commented 1 year ago

Hi Stefano,

Sure, I will investigate.

iza-mcac commented 5 months ago

Hello all, I would like to report a similar bug that produces a different error, also in the merge function with tidyseurat. I have two Seurat-tibble abstraction objects, both processed. One was processed earlier, saved as an .rds file and re-loaded; the other was processed in the current session.

# A Seurat-tibble abstraction: 3,258 × 14
# Features=36601 | Cells=3258 | Active assay=RNA | Assays=RNA, HTO
   .cell  orig.ident nCount_RNA nFeature_RNA percent.mt percent.mt.log2 nCount_HTO
   <chr>  <fct>           <dbl>        <int>      <dbl>           <dbl>      <dbl>
 1 AAACC… SeuratPro…      19558         4585          0            -Inf       6443
 2 AAACC… SeuratPro…        572          434          0            -Inf        291
 3 AAACC… SeuratPro…        684          528          0            -Inf        177

Second

 # A Seurat-tibble abstraction: 870 × 36
# Features=14311 | Cells=870 | Active assay=SCT | Assays=HTO, SCT
   .cell  orig.ident nCount_RNA nFeature_RNA percent.mt percent.mt.log2 nCount_HTO
   <chr>  <fct>           <dbl>        <int>      <dbl>           <dbl>      <dbl>
 1 AAACC… SeuratPro…      18507         4378          0            -Inf        784
 2 AAACC… SeuratPro…       1311          697          0            -Inf        394

And when I try a merge:

human_EPI <- merge(BE_EPI_seu_new_singlets, y = BE_EPI_old, add.cell.ids = c("new", "old"))
Error in UseMethod(generic = "GetAssayData", object = object) : 
  no applicable method for 'GetAssayData' applied to an object of class "Assay5"

sessionInfo():

  R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Sonoma 14.1.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Sao_Paulo
tzcode source: internal

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] tidyseurat_0.8.0                tidySingleCellExperiment_1.14.0
 [3] tidySummarizedExperiment_1.14.0 tidyr_1.3.1                    
 [5] tidyomics_1.0.0                 Seurat_4.4.0                   
 [7] nullranges_1.10.0               plyranges_1.24.0               
 [9] tidybulk_1.16.0                 SeuratObject_4.1.4             
[11] sp_2.1-4                        SingleCellExperiment_1.26.0    
[13] ttservice_0.4.0                 SummarizedExperiment_1.34.0    
[15] Biobase_2.64.0                  GenomicRanges_1.56.0           
[17] GenomeInfoDb_1.40.0             IRanges_2.38.0                 
[19] S4Vectors_0.42.0                BiocGenerics_0.50.0            
[21] MatrixGenerics_1.16.0           matrixStats_1.3.0              
[23] ggplot2_3.5.1                   dplyr_1.1.4      

During the tidyomics class I taught, I had a more controlled environment, and this error did not appear when I merged objects that had all been processed in the same session. The sessionInfo() for that environment:

 R version 4.4.0 (2024-04-24)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=C.UTF-8      
 [2] LC_NUMERIC=C          
 [3] LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8    
 [5] LC_MONETARY=C.UTF-8   
 [6] LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8      
 [8] LC_NAME=C             
 [9] LC_ADDRESS=C          
[10] LC_TELEPHONE=C        
[11] LC_MEASUREMENT=C.UTF-8
[12] LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4    stats     graphics  grDevices
[5] utils     datasets  methods   base     

other attached packages:
 [1] shiny_1.8.1.1                  
 [2] Seurat_4.4.0                   
 [3] readr_2.1.5                    
 [4] iSEE_2.16.0                    
 [5] nullranges_1.10.0              
 [6] plyranges_1.24.0               
 [7] tidybulk_1.16.0                
 [8] tidySingleCellExperiment_1.14.0
 [9] tidySummarizedExperiment_1.14.0
[10] tidyomics_1.0.0                
[11] tidyr_1.3.1                    
[12] tidyseurat_0.8.0               
[13] SeuratObject_4.1.4             
[14] sp_2.1-4                       
[15] ttservice_0.4.0                
[16] colorspace_2.1-0               
[17] dplyr_1.1.4                    
[18] plotly_4.10.4                  
[19] ggplot2_3.5.1                  
[20] SingleCellExperiment_1.26.0    
[21] SummarizedExperiment_1.34.0    
[22] Biobase_2.64.0                 
[23] GenomicRanges_1.56.0           
[24] GenomeInfoDb_1.40.0            
[25] IRanges_2.38.0                 
[26] S4Vectors_0.42.0               
[27] BiocGenerics_0.50.0            
[28] MatrixGenerics_1.16.0          
[29] matrixStats_1.3.0 

Let me know if I can provide more info!
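
The error message names the class Assay5, which is the assay class introduced with SeuratObject 5, while both sessionInfo() listings above show SeuratObject_4.1.4. One possibility, stated here only as an assumption, is that the re-loaded .rds object was created under a newer SeuratObject than the one installed in the failing session. A minimal sketch for checking this is shown below; describe_seurat_object is a hypothetical helper, not part of Seurat or tidyseurat.

```r
library(Seurat)

# Hypothetical helper: summarise the version recorded in a Seurat object and
# the class of each assay it contains.
describe_seurat_object <- function(object) {
  list(
    # Version stored in the object's "version" slot
    object_version = as.character(slot(object, "version")),
    # Version of SeuratObject installed in the current session
    installed_version = as.character(packageVersion("SeuratObject")),
    # Read the "assays" slot directly so that no assay method is dispatched
    assay_classes = vapply(
      slot(object, "assays"),
      function(assay) class(assay)[1],
      character(1)
    )
  )
}
```

Running describe_seurat_object() on BE_EPI_seu_new_singlets and BE_EPI_old and comparing the results would show whether the two objects carry different assay classes or were written by different package versions.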