satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0
203 stars 33 forks source link

problem with regressing out rbc genes, not all variables found in seurat object meta data? #92

Closed jessicook closed 3 years ago

jessicook commented 3 years ago

Thanks for this great package, I've enjoyed using it. I'm attempting to regress out the effect of RBC genes, and I'm running into the error below. I've added my session info at the end.

> rbc.genes <- list("Hba-a1", "Hbb-bs", "Hbb-bt", "Hba-a2")
> om.all <- AddModuleScore(
+     object = om.all,
+     features = rbc.genes,
+     ctrl = 5,
+     name = 'RBC_regress'
+ )
> om.all <- SCTransform(om.all, vars.to.regress = c("RBC_regress"), verbose = FALSE)
Error in SCTransform(om.all, vars.to.regress = c("RBC_regress"), verbose = FALSE) : 
  problem with second non-regularized linear regression; not all variables found in seurat object meta data; check vars.to.regress parameter
> traceback()
2: stop("problem with second non-regularized linear regression; not all variables found in seurat object meta data; check vars.to.regress parameter")
1: SCTransform(om.all, vars.to.regress = c("RBC_regress"), verbose = FALSE)
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Mojave 10.14.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] scuttle_1.0.4               celldex_1.0.0               scater_1.18.3              
 [4] sctransform_0.3.2           scRNAseq_2.4.0              scran_1.18.3               
 [7] SingleR_1.4.0               Seurat_3.2.3                forcats_0.5.0              
[10] stringr_1.4.0               dplyr_1.0.3                 purrr_0.3.4                
[13] readr_1.4.0                 tidyr_1.1.2                 tibble_3.0.5               
[16] ggplot2_3.3.3               tidyverse_1.3.0             SingleCellExperiment_1.12.0
[19] SummarizedExperiment_1.20.0 Biobase_2.50.0              GenomicRanges_1.42.0       
[22] GenomeInfoDb_1.26.2         IRanges_2.24.1              MatrixGenerics_1.2.0       
[25] matrixStats_0.57.0          S4Vectors_0.28.1            BiocGenerics_0.36.0        

loaded via a namespace (and not attached):
  [1] reticulate_1.18               tidyselect_1.1.0              RSQLite_2.2.2                
  [4] AnnotationDbi_1.52.0          htmlwidgets_1.5.3             grid_4.0.3                   
  [7] BiocParallel_1.24.1           Rtsne_0.15                    munsell_0.5.0                
 [10] codetools_0.2-18              ica_1.0-2                     statmod_1.4.35               
 [13] future_1.21.0                 miniUI_0.1.1.1                withr_2.3.0                  
 [16] colorspace_2.0-0              knitr_1.30                    rstudioapi_0.13              
 [19] ROCR_1.0-11                   tensor_1.5                    listenv_0.8.0                
 [22] labeling_0.4.2                GenomeInfoDbData_1.2.4        polyclip_1.10-0              
 [25] pheatmap_1.0.12               bit64_4.0.5                   farver_2.0.3                 
 [28] parallelly_1.23.0             vctrs_0.3.6                   generics_0.1.0               
 [31] xfun_0.20                     BiocFileCache_1.14.0          R6_2.5.0                     
 [34] ggbeeswarm_0.6.0              rsvd_1.0.3                    locfit_1.5-9.4               
 [37] AnnotationFilter_1.14.0       bitops_1.0-6                  spatstat.utils_1.20-2        
 [40] DelayedArray_0.16.0           assertthat_0.2.1              promises_1.1.1               
 [43] scales_1.1.1                  beeswarm_0.2.3                gtable_0.3.0                 
 [46] beachmat_2.6.4                globals_0.14.0                goftest_1.2-2                
 [49] ensembldb_2.14.0              rlang_0.4.10                  splines_4.0.3                
 [52] rtracklayer_1.50.0            lazyeval_0.2.2                broom_0.7.3                  
 [55] yaml_2.2.1                    BiocManager_1.30.10           reshape2_1.4.4               
 [58] abind_1.4-5                   modelr_0.1.8                  GenomicFeatures_1.42.1       
 [61] backports_1.2.1               httpuv_1.5.5                  tools_4.0.3                  
 [64] ellipsis_0.3.1                RColorBrewer_1.1-2            ggridges_0.5.3               
 [67] Rcpp_1.0.6                    plyr_1.8.6                    progress_1.2.2               
 [70] sparseMatrixStats_1.2.0       zlibbioc_1.36.0               RCurl_1.98-1.2               
 [73] prettyunits_1.1.1             openssl_1.4.3                 rpart_4.1-15                 
 [76] deldir_0.2-3                  viridis_0.5.1                 pbapply_1.4-3                
 [79] cowplot_1.1.1                 zoo_1.8-8                     haven_2.3.1                  
 [82] ggrepel_0.9.1                 cluster_2.1.0                 fs_1.5.0                     
 [85] magrittr_2.0.1                RSpectra_0.16-0               data.table_1.13.6            
 [88] scattermore_0.7               lmtest_0.9-38                 reprex_0.3.0                 
 [91] RANN_2.6.1                    ProtGenerics_1.22.0           fitdistrplus_1.1-3           
 [94] evaluate_0.14                 hms_1.0.0                     patchwork_1.1.1              
 [97] mime_0.9                      xtable_1.8-4                  XML_3.99-0.5                 
[100] readxl_1.3.1                  gridExtra_2.3                 biomaRt_2.46.0               
[103] compiler_4.0.3                KernSmooth_2.23-18            crayon_1.3.4                 
[106] htmltools_0.5.1               mgcv_1.8-33                   later_1.1.0.1                
[109] lubridate_1.7.9.2             DBI_1.1.1                     ExperimentHub_1.16.0         
[112] dbplyr_2.0.0                  MASS_7.3-53                   rappdirs_0.3.1               
[115] Matrix_1.3-2                  cli_2.2.0                     igraph_1.2.6                 
[118] pkgconfig_2.0.3               GenomicAlignments_1.26.0      plotly_4.9.3                 
[121] xml2_1.3.2                    vipor_0.4.5                   dqrng_0.2.1                  
[124] XVector_0.30.0                rvest_0.3.6                   digest_0.6.27                
[127] RcppAnnoy_0.0.18              Biostrings_2.58.0             spatstat.data_1.7-0          
[130] rmarkdown_2.6                 cellranger_1.1.0              leiden_0.3.6                 
[133] uwot_0.1.10                   edgeR_3.32.1                  DelayedMatrixStats_1.12.2    
[136] curl_4.3                      Rsamtools_2.6.0               shiny_1.5.0                  
[139] lifecycle_0.2.0               nlme_3.1-151                  jsonlite_1.7.2               
[142] BiocNeighbors_1.8.2           askpass_1.1                   viridisLite_0.3.0            
[145] limma_3.46.0                  fansi_0.4.2                   pillar_1.4.7                 
[148] lattice_0.20-41               fastmap_1.0.1                 httr_1.4.2                   
[151] survival_3.2-7                interactiveDisplayBase_1.28.0 glue_1.4.2                   
[154] spatstat_1.64-1               png_0.1-7                     bluster_1.0.0                
[157] BiocVersion_3.12.0            bit_4.0.4                     stringi_1.5.3                
[160] blob_1.2.1                    BiocSingular_1.6.0            AnnotationHub_2.22.0         
[163] memoise_1.1.0                 irlba_2.3.3                   future.apply_1.7.0 

In addition, if I want to regress out multiple facotrs (percent.mt, cell cycle etc), should I do this all at once, or is running SCtransform sequentially okay?

jessicook commented 3 years ago

might have figured this out, looked at the meta.data and the regressions were stored per gene as "RBC_regress1" and so on, when I changed vars.to.regress = c("RBC_regress1", RBC_regress2"..."4") , it seems to be working. Was this the correct solution?

ChristophH commented 3 years ago

First a disclaimer, AddModuleScore and SCTransform are functions from the Seurat package, not sctransform.

There is one problem with your solution. The list you are passing to AddModuleScore as features is of length 4, when I think you want a list with one entry only.

rbc.genes <- list(c("Hba-a1", "Hbb-bs", "Hbb-bt", "Hba-a2"))

You'd then only get one RBC_regress variable (combining the signal from the four genes) that you could use like so:

vars.to.regress = "RBC_regress1"
jessicook commented 3 years ago

this solved it, thanks!