satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0
203 stars 33 forks source link

Error in umi@x : no applicable method for `@` applied to an object of class "matrix" with v2 regularization #170

Closed jjia1 closed 10 months ago

jjia1 commented 11 months ago

I downloaded an .h5 dataset of the Fetal Embryonic Brain from the HCA for reference mapping of my own dataset. I subsampled (memory requirements were too high for full data) 1000 cells out of the ~1.7e6 cells and converted the data into an .h5ad object in Python, which I then further converted into an .h5Seurat object using Convert() in R. When specifying the parameters for SCTransform, I get an error when I set vst.flavor = "v2" but when I use the default parameters the code runs without any issues.

This was the original code I ran.

  data <- SCTransform(data, method = "glmGamPoi", vars.to.regress = "MitoFraction",
                                  vst.flavor = "v2", 
                                  ncells = 1e4, variable.features.n = 5000)

I had the same error when running the same command as below.

 data <- SCTransform(data, method = "glmGamPoi", vst.flavor = "v2")

The output and error from both commands is the following:

 data <- SCTransform(data, method = "glmGamPoi", vst.flavor = "v2")
  vst.flavor='v2' set, setting model to use fixed slope and exclude poisson genes.
  Calculating cell attributes from input UMI matrix: log_umi
  Total Step 1 genes: 11521
  Total overdispersed genes: 7383
  Excluding 4138 genes from Step 1 because they are not overdispersed.
  Variance stabilizing transformation of count matrix of size 11521 by 1665
  Model formula is y ~ log_umi
  Get Negative Binomial regression parameters per gene
  Using 2000 genes, 1665 cells
    |======================================================================| 100%
  Setting estimate of  8 genes to inf as theta_mm/theta_mle < 1e-3
  # of step1 poisson genes (variance < mean): 0
  # of low mean genes (mean < 0.001): 0
  Total # of Step1 poisson genes (theta=Inf; variance < mean): 8
  Total # of poisson genes (theta=Inf; variance < mean): 4138
  Calling offset model for all 4138 poisson genes
  Found 11 outliers - those will be ignored in fitting/regularization step

  Ignoring theta inf genes
  Replacing fit params for 4138 poisson genes by theta=Inf
  Error in umi@x : 
    no applicable method for `@` applied to an object of class "matrix"

Running either the default parameters or the same command as the first just without v2 works.

  data <- SCTransform(data, method = "glmGamPoi", vars.to.regress = "MitoFraction",
                                  ncells = 1e4, variable.features.n = 5000)

  Calculating cell attributes from input UMI matrix: log_umi
  Variance stabilizing transformation of count matrix of size 11521 by 1665
  Model formula is y ~ log_umi
  Get Negative Binomial regression parameters per gene
  Using 2000 genes, 1665 cells
    |======================================================================| 100%
  Found 102 outliers - those will be ignored in fitting/regularization step

  Second step: Get residuals using fitted parameters for 11521 genes
    |======================================================================| 100%
  Computing corrected count matrix for 11521 genes
    |======================================================================| 100%
  Calculating gene attributes
  Wall clock passed: Time difference of 25.07768 secs
  Determine variable features
  Place corrected count matrix in counts slot
  Centering data matrix
    |======================================================================| 100%
  Set default assay to SCT

After my testing, it seems that the issue is the v2 regularization. I'm not sure why I'm encountering this error. I can't tell if it's the dataset that is the problem or if I'm missing something. Any help would be appreciated! Please let me know if you need any more information.

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.1 LTS

Matrix products: default
BLAS:   /opt/R/4.3.1/lib/R/lib/libRblas.so 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] lubridate_1.9.2        forcats_1.0.0          stringr_1.5.0         
 [4] dplyr_1.1.2            purrr_1.0.1            readr_2.1.4           
 [7] tidyr_1.3.0            tibble_3.2.1           ggplot2_3.4.2         
[10] tidyverse_2.0.0        future_1.33.0          sctransform_0.3.5.9006
[13] glmGamPoi_1.12.2       SeuratDisk_0.0.0.9020  SeuratObject_4.1.3    
[16] Seurat_4.3.0.1        

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3          jsonlite_1.8.7             
  [3] magrittr_2.0.3              spatstat.utils_3.0-3       
  [5] zlibbioc_1.46.0             vctrs_0.6.3                
  [7] ROCR_1.0-11                 spatstat.explore_3.2-1     
  [9] RCurl_1.98-1.12             S4Arrays_1.0.5             
 [11] htmltools_0.5.5             parallelly_1.36.0          
 [13] KernSmooth_2.23-21          htmlwidgets_1.6.2          
 [15] ica_1.0-3                   plyr_1.8.8                 
 [17] plotly_4.10.2               zoo_1.8-12                 
 [19] igraph_1.5.0.1              mime_0.12                  
 [21] lifecycle_1.0.3             pkgconfig_2.0.3            
 [23] Matrix_1.6-0                R6_2.5.1                   
 [25] fastmap_1.1.1               GenomeInfoDbData_1.2.10    
 [27] MatrixGenerics_1.12.3       fitdistrplus_1.1-11        
 [29] shiny_1.7.4.1               digest_0.6.33              
 [31] colorspace_2.1-0            patchwork_1.1.2            
 [33] S4Vectors_0.38.1            tensor_1.5                 
 [35] irlba_2.3.5.1               GenomicRanges_1.52.0       
 [37] progressr_0.13.0            timechange_0.2.0           
 [39] fansi_1.0.4                 spatstat.sparse_3.0-2      
 [41] httr_1.4.6                  polyclip_1.10-4            
 [43] abind_1.4-5                 compiler_4.3.1             
 [45] bit64_4.0.5                 withr_2.5.0                
 [47] MASS_7.3-60                 DelayedArray_0.26.7        
 [49] tools_4.3.1                 lmtest_0.9-40              
 [51] httpuv_1.6.11               future.apply_1.11.0        
 [53] goftest_1.2-3               glue_1.6.2                 
 [55] nlme_3.1-162                promises_1.2.0.1           
 [57] grid_4.3.1                  Rtsne_0.16                 
 [59] cluster_2.1.4               reshape2_1.4.4             
 [61] generics_0.1.3              hdf5r_1.3.8                
 [63] gtable_0.3.3                spatstat.data_3.0-1        
 [65] tzdb_0.4.0                  hms_1.1.3                  
 [67] data.table_1.14.8           sp_2.0-0                   
 [69] utf8_1.2.3                  XVector_0.40.0             
 [71] BiocGenerics_0.46.0         spatstat.geom_3.2-4        
 [73] RcppAnnoy_0.0.21            ggrepel_0.9.3              
 [75] RANN_2.6.1                  pillar_1.9.0               
 [77] later_1.3.1                 splines_4.3.1              
 [79] lattice_0.21-8              survival_3.5-5             
 [81] bit_4.0.5                   deldir_1.0-9               
 [83] tidyselect_1.2.0            miniUI_0.1.1.1             
 [85] pbapply_1.7-2               gridExtra_2.3              
 [87] IRanges_2.34.1              SummarizedExperiment_1.30.2
 [89] scattermore_1.2             stats4_4.3.1               
 [91] Biobase_2.60.0              matrixStats_1.0.0          
 [93] stringi_1.7.12              lazyeval_0.2.2             
 [95] codetools_0.2-19            cli_3.6.1                  
 [97] uwot_0.1.16                 xtable_1.8-4               
 [99] reticulate_1.30             munsell_0.5.0              
[101] Rcpp_1.0.11                 GenomeInfoDb_1.36.1        
[103] globals_0.16.2              spatstat.random_3.1-5      
[105] png_0.1-8                   parallel_4.3.1             
[107] ellipsis_0.3.2              bitops_1.0-7               
[109] listenv_0.9.0               viridisLite_0.4.2          
[111] scales_1.2.1                fortunes_1.5-4             
[113] ggridges_0.5.4              leiden_0.4.3               
[115] crayon_1.5.2                rlang_1.1.1                
[117] cowplot_1.1.1    
saketkc commented 11 months ago

Can you share a public link to your object (subset)?

jjia1 commented 11 months ago

Can you share a public link to your object (subset)?

Sure thing! Sorry about that. I should have included a public link to the object earlier. The folder should have both the .h5ad and .h5Seurat formats. Lmk if you have any problems accessing the folder.

jjia1 commented 11 months ago

@saketkc

I think it's related to either dataset size, sparsity, or both. I changed my subsample size to 10x larger and while v2 regularization doesn't work, sctransform is able to run without providing an error.

Ignoring theta inf genes
Replacing fit params for 6125 poisson genes by theta=Inf
vst.flavor = 'v2' failed. Running without it.
saketkc commented 11 months ago

Thanks for sharing the h5Seurat object. I am unable to reproduce the error at my end though. Is it the object that was causing error with v2?

obj <- LoadH5Seurat("fetalembryonicdata_sampled_1690922510.h5Seurat")
obj22 <- SCTransform(obj, vst.flavor="v2")
vst.flavor='v2' set, setting model to use fixed slope and exclude poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Total Step 1 genes: 26736
Total overdispersed genes: 24679
Excluding 2057 genes from Step 1 because they are not overdispersed.
Variance stabilizing transformation of count matrix of size 30229 by 18833
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 5000 cells
  |========================================================| 100%
Setting estimate of  189 genes to inf as theta_mm/theta_mle < 1e-3
# of step1 poisson genes (variance < mean): 0
# of low mean genes (mean < 0.001): 3637
Total # of Step1 poisson genes (theta=Inf; variance < mean): 227
Total # of poisson genes (theta=Inf; variance < mean): 5437
Calling offset model for all 5437 poisson genes
Found 209 outliers - those will be ignored in fitting/regularization step

Ignoring theta inf genes
Replacing fit params for 5437 poisson genes by theta=Inf
Setting min_variance based on median UMI:  0.04
Second step: Get residuals using fitted parameters for 30229 genes
  |========================================================| 100%
Computing corrected count matrix for 30229 genes
  |========================================================| 100%
Calculating gene attributes
Wall clock passed: Time difference of 2.240513 mins
Determine variable features
Place corrected count matrix in counts slot
Centering data matrix
  |========================================================| 100%
Set default assay to SCT
jjia1 commented 11 months ago

Thanks for sharing the h5Seurat object. I am unable to reproduce the error at my end though. Is it the object that was causing error with v2?

obj <- LoadH5Seurat("fetalembryonicdata_sampled_1690922510.h5Seurat")
obj22 <- SCTransform(obj, vst.flavor="v2")
vst.flavor='v2' set, setting model to use fixed slope and exclude poisson genes.
Calculating cell attributes from input UMI matrix: log_umi
Total Step 1 genes: 26736
Total overdispersed genes: 24679
Excluding 2057 genes from Step 1 because they are not overdispersed.
Variance stabilizing transformation of count matrix of size 30229 by 18833
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 5000 cells
  |========================================================| 100%
Setting estimate of  189 genes to inf as theta_mm/theta_mle < 1e-3
# of step1 poisson genes (variance < mean): 0
# of low mean genes (mean < 0.001): 3637
Total # of Step1 poisson genes (theta=Inf; variance < mean): 227
Total # of poisson genes (theta=Inf; variance < mean): 5437
Calling offset model for all 5437 poisson genes
Found 209 outliers - those will be ignored in fitting/regularization step

Ignoring theta inf genes
Replacing fit params for 5437 poisson genes by theta=Inf
Setting min_variance based on median UMI:  0.04
Second step: Get residuals using fitted parameters for 30229 genes
  |========================================================| 100%
Computing corrected count matrix for 30229 genes
  |========================================================| 100%
Calculating gene attributes
Wall clock passed: Time difference of 2.240513 mins
Determine variable features
Place corrected count matrix in counts slot
Centering data matrix
  |========================================================| 100%
Set default assay to SCT

Oh did you try using glmGamPoi? I wonder if that could be the source of the error rather than the dataset since you weren't able to reproduce the error. I have some other scripts running currently, but I will test this out soon. Thanks for testing!

saketkc commented 11 months ago

V2 relies on glmgampoi for estimation internally so that is going through as well. Let me know what step you are facing issues.

jjia1 commented 11 months ago

Could I see your sessionInfo() output? I'm still getting an umi@x: no applicable method... error. It's failing at Replacing fit params for X poisson genes by theta=Inf

Replacing fit params for 6125 poisson genes by theta=Inf
Error in umi@x : 
  no applicable method for `@` applied to an object of class "matrix"
saketkc commented 11 months ago

Here's my session info:

sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.6 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
 [1] SeuratDisk_0.0.0.9020    diffusionMap_1.2.0      
 [3] pheatmap_1.0.12          destiny_3.12.0          
 [5] princurve_2.1.6          variancePartition_1.28.9
 [7] BiocParallel_1.32.6      limma_3.54.2            
 [9] patchwork_1.1.2          Matrix_1.5-4.1          
[11] nnet_7.3-18              future.apply_1.10.0     
[13] future_1.31.0            lubridate_1.9.2         
[15] forcats_1.0.0            stringr_1.5.0           
[17] dplyr_1.1.2              purrr_1.0.1             
[19] readr_2.1.4              tidyr_1.3.0             
[21] tibble_3.2.1             ggplot2_3.4.2           
[23] tidyverse_2.0.0          SeuratObject_4.1.3      
[25] Seurat_4.3.0            

loaded via a namespace (and not attached):
  [1] rappdirs_0.3.3              rtracklayer_1.58.0         
  [3] scattermore_0.8             ggthemes_4.2.4             
  [5] knitr_1.42                  bit64_4.0.5                
  [7] irlba_2.3.5.1               DelayedArray_0.24.0        
  [9] rpart_4.1.19                data.table_1.14.8          
 [11] KEGGREST_1.38.0             RCurl_1.98-1.12            
 [13] doParallel_1.0.17           generics_0.1.3             
 [15] GenomicFeatures_1.50.4      BiocGenerics_0.44.0        
 [17] callr_3.7.3                 RhpcBLASctl_0.23-42        
 [19] cowplot_1.1.1               RSQLite_2.3.1              
 [21] usethis_2.1.6               RANN_2.6.1                 
 [23] proxy_0.4-27                bit_4.0.5                  
 [25] tzdb_0.4.0                  xml2_1.3.4                 
 [27] spatstat.data_3.0-0         httpuv_1.6.9               
 [29] SummarizedExperiment_1.28.0 xfun_0.39                  
 [31] hms_1.1.3                   evaluate_0.20              
 [33] promises_1.2.0.1            restfulr_0.0.15            
 [35] DEoptimR_1.0-14             fansi_1.0.4                
 [37] progress_1.2.2              dbplyr_2.3.2               
 [39] caTools_1.18.2              DBI_1.1.3                  
 [41] igraph_1.4.1                htmlwidgets_1.6.1          
 [43] spatstat.geom_3.0-6         stats4_4.2.2               
 [45] ellipsis_0.3.2              RSpectra_0.16-1            
 [47] backports_1.4.1             aod_1.3.2                  
 [49] sparseMatrixStats_1.10.0    biomaRt_2.54.1             
 [51] deldir_1.0-6                MatrixGenerics_1.10.0      
 [53] vctrs_0.6.3                 SingleCellExperiment_1.20.1
 [55] Biobase_2.58.0              remotes_2.4.2              
 [57] TTR_0.24.3                  ROCR_1.0-11                
 [59] abind_1.4-5                 cachem_1.0.7               
 [61] RcppEigen_0.3.3.9.3         withr_2.5.0                
 [63] BSgenome_1.66.3             robustbase_0.99-0          
 [65] progressr_0.13.0            checkmate_2.2.0            
 [67] vcd_1.4-11                  sctransform_0.3.5          
 [69] GenomicAlignments_1.34.1    xts_0.13.1                 
 [71] prettyunits_1.1.1           goftest_1.2-3              
 [73] cluster_2.1.4               lazyeval_0.2.2             
 [75] laeken_0.5.2                crayon_1.5.2               
 [77] hdf5r_1.3.8                 spatstat.explore_3.0-6     
 [79] pkgconfig_2.0.3             GenomeInfoDb_1.34.9        
 [81] nlme_3.1-162                pkgload_1.3.2              
 [83] devtools_2.4.5              rlang_1.1.1                
 [85] globals_0.16.2              lifecycle_1.0.3            
 [87] miniUI_0.1.1.1              filelock_1.0.2             
 [89] wiggleplotr_1.22.0          BiocFileCache_2.6.1        
 [91] polyclip_1.10-4             RcppHNSW_0.4.1             
 [93] matrixStats_0.63.0          lmtest_0.9-40              
 [95] carData_3.0-5               boot_1.3-28.1              
 [97] zoo_1.8-11                  base64enc_0.1-3            
 [99] ggridges_0.5.4              processx_3.8.1             
[101] rjson_0.2.21                png_0.1-8                  
[103] viridisLite_0.4.1           bitops_1.0-7               
[105] KernSmooth_2.23-22          Biostrings_2.66.0          
[107] blob_1.2.4                  parallelly_1.34.0          
[109] spatstat.random_3.1-3       remaCor_0.0.16             
[111] S4Vectors_0.36.2            scales_1.2.1               
[113] memoise_2.0.1               magrittr_2.0.3             
[115] plyr_1.8.8                  hexbin_1.28.3              
[117] ica_1.0-3                   gplots_3.1.3               
[119] zlibbioc_1.44.0             compiler_4.2.2             
[121] BiocIO_1.8.0                pertinent_0.0.0.9006       
[123] RColorBrewer_1.1-3          pcaMethods_1.90.0          
[125] lme4_1.1-33                 fitdistrplus_1.1-8         
[127] Rsamtools_2.14.0            cli_3.6.1                  
[129] XVector_0.38.0              urlchecker_1.0.1           
[131] listenv_0.9.0               pbapply_1.7-0              
[133] ps_1.7.5                    htmlTable_2.4.1            
[135] Formula_1.2-5               ggplot.multistats_1.0.0    
[137] MASS_7.3-60                 tidyselect_1.2.0           
[139] stringi_1.7.12              glmGamPoi_1.11.7           
[141] yaml_2.3.7                  ggrepel_0.9.3              
[143] grid_4.2.2                  tools_4.2.2                
[145] timechange_0.2.0            parallel_4.2.2             
[147] rstudioapi_0.14             foreign_0.8-84             
[149] foreach_1.5.2               gridExtra_2.3              
[151] EnvStats_2.7.0              smoother_1.1               
[153] scatterplot3d_0.3-44        Rtsne_0.16                 
[155] digest_0.6.31               shiny_1.7.4                
[157] Rcpp_1.0.10                 GenomicRanges_1.50.2       
[159] car_3.1-2                   broom_1.0.5                
[161] later_1.3.0                 RcppAnnoy_0.0.20           
[163] AnnotationDbi_1.60.2        httr_1.4.5                 
[165] Rdpack_2.4                  colorspace_2.1-0           
[167] XML_3.99-0.14               fs_1.6.2                   
[169] tensor_1.5                  ranger_0.15.1              
[171] reticulate_1.28             IRanges_2.32.0             
[173] splines_4.2.2               uwot_0.1.14                
[175] spatstat.utils_3.0-1        sp_1.6-0                   
[177] plotly_4.10.1               sessioninfo_1.2.2          
[179] xtable_1.8-4                jsonlite_1.8.4             
[181] nloptr_2.0.3                R6_2.5.1                   
[183] Hmisc_5.1-0                 profvis_0.3.7              
[185] pillar_1.9.0                htmltools_0.5.4            
[187] mime_0.12                   glue_1.6.2                 
[189] fastmap_1.1.1               minqa_1.2.5                
[191] VIM_6.2.2                   class_7.3-21               
[193] codetools_0.2-19            pkgbuild_1.4.0             
[195] mvtnorm_1.2-2               utf8_1.2.3                 
[197] lattice_0.21-8              spatstat.sparse_3.0-0      
[199] pbkrtest_0.5.2              curl_5.0.0                 
[201] leiden_0.4.3                gtools_3.9.4               
[203] survival_3.5-5              rmarkdown_2.22             
[205] munsell_0.5.0               e1071_1.7-13               
[207] GenomeInfoDbData_1.2.9      iterators_1.0.14           
[209] reshape2_1.4.4              gtable_0.3.3               
[211] rbibutils_2.2.13      
jjia1 commented 10 months ago

Sorry for the late response, but I compared with my sessionInfo earlier in the thread. It looks like we have a few differences but I'm not sure which one cause the problem. For now, I've worked around the issue. Thank you for the taking the time to respond!