Closed ChristopherStephens21 closed 10 months ago
I am running the analysis on a Mac (MacOS Ventura 13.3.1). Code for cell cycle regression and mitochondrial gene percentage was changed from the pbmc code to reflect differences in gene name formatting for my mouse datasets, but no other changes were made between running it on the pbmc data and my own.
Thanks for catching this @ChristopherStephens21, I will push a fix.
I have pushed a fix here https://github.com/satijalab/seurat/commit/64a6495ec17759d2a7513c706871b9167dece01c
You can install the develop
branch to test:
remotes::install_github("sataijalab/seurat", ref="develop")
Hope this helps!
I was having similar observations to the original post. However, after installing the developer version it appears that it no longer is able to grab the values from the meta.data as I get the following message.
pbmc <- SCTransform(pbmc, vars.to.regress = c("percent.mt"), vst.flavor = "v2", method = "glmGamPoi", verbose = T) Running SCTransform on assay: RNA Running SCTransform on layer: counts vst.flavor='v2' set. Using model with fixed slope and excluding poisson genes. Variance stabilizing transformation of count matrix of size 19037 by 9723 Model formula is y ~ log_umi Get Negative Binomial regression parameters per gene Using 2000 genes, 5000 cells Found 233 outliers - those will be ignored in fitting/regularization step
Second step: Get residuals using fitted parameters for 19037 genes Computing corrected count matrix for 19037 genes Calculating gene attributes Wall clock passed: Time difference of 49.41705 secs Determine variable features Regressing out percent.mt |============================================================================================================| 100% Centering data matrix |============================================================================================================| 100% Getting residuals for block 1(of 2) for counts dataset Getting residuals for block 2(of 2) for counts dataset Error: None of the requested variables to regress are present in the object.
Hi, I'm having the same issue as @Greysun109 Regression of mitochondrial gene content by SCTransform using vars.to.regress="percent.mito"
leads to the same UMAP as without regression.
Unfortunately, installing develop
branch as suggested by @saketkc did not resolve the issue.
I'm also getting the same error message as @Greysun109 (Error: None of the requested variables to regress are present in the object.) although the variable is included in the metadata slot of the seurat object.
I'm on macOS Ventura 13.6.1
Hello, I am having the same issue as @Bildungsluecke . I have installed your version using
remotes::install_github("sataijalab/seurat", ref="develop")
but I have this error at the end of SCTransform() calculations:
Getting residuals for block 1(of 2) for counts dataset
Getting residuals for block 2(of 2) for counts dataset
Error: None of the requested variables to regress are present in the object.
Before your version, I had the same problem as @ChristopherStephens21 , same UMAP from regressed out samples running SCTransform()
Here is my sessionInfo() if it can be useful:
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.7.3
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] C
time zone: Europe/Paris
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] DoubletFinder_2.0.3 dplyr_1.1.4 MetaDE_2.2.3
[4] glmGamPoi_1.14.0 presto_1.0.0 data.table_1.14.10
[7] Rcpp_1.0.12 scCustomize_2.0.1 ggplot2_3.4.4
[10] Seurat_5.0.1.9003 SeuratObject_5.0.1 sp_2.1-2
loaded via a namespace (and not attached):
[1] RcppAnnoy_0.0.22 splines_4.3.1
[3] later_1.3.2 bitops_1.0-7
[5] tibble_3.2.1 polyclip_1.10-6
[7] janitor_2.2.0 fastDummies_1.7.3
[9] lifecycle_1.0.4 edgeR_4.0.12
[11] hdf5r_1.3.9 globals_0.16.2
[13] lattice_0.22-5 MASS_7.3-60.0.1
[15] magrittr_2.0.3 openxlsx_4.2.5.2
[17] limma_3.58.1 plotly_4.10.4
[19] httpuv_1.6.14 sctransform_0.4.1
[21] spam_2.10-0 zip_2.3.1
[23] spatstat.sparse_3.0-3 reticulate_1.34.0
[25] cowplot_1.1.3 pbapply_1.7-2
[27] RColorBrewer_1.1-3 lubridate_1.9.3
[29] abind_1.4-5 zlibbioc_1.48.0
[31] Rtsne_0.17 GenomicRanges_1.54.1
[33] purrr_1.0.2 BiocGenerics_0.48.1
[35] RCurl_1.98-1.14 circlize_0.4.16
[37] GenomeInfoDbData_1.2.11 IRanges_2.36.0
[39] S4Vectors_0.40.2 ggrepel_0.9.5
[41] irlba_2.3.5.1 listenv_0.9.1
[43] spatstat.utils_3.0-4 goftest_1.2-3
[45] RSpectra_0.16-1 spatstat.random_3.2-2
[47] fitdistrplus_1.1-11 parallelly_1.36.0
[49] DelayedMatrixStats_1.24.0 leiden_0.4.3.1
[51] codetools_0.2-19 DelayedArray_0.28.0
[53] tidyselect_1.2.0 shape_1.4.6
[55] farver_2.1.1 matrixStats_1.2.0
[57] stats4_4.3.1 spatstat.explore_3.2-5
[59] jsonlite_1.8.8 ellipsis_0.3.2
[61] progressr_0.14.0 ggridges_0.5.6
[63] survival_3.5-7 tools_4.3.1
[65] ica_1.0-3 glue_1.7.0
[67] gridExtra_2.3 SparseArray_1.2.3
[69] DESeq2_1.42.0 MatrixGenerics_1.14.0
[71] GenomeInfoDb_1.38.5 withr_3.0.0
[73] combinat_0.0-8 fastmap_1.1.1
[75] fansi_1.0.6 digest_0.6.34
[77] timechange_0.3.0 R6_2.5.1
[79] mime_0.12 ggprism_1.0.4
[81] samr_3.0 colorspace_2.1-0
[83] scattermore_1.2 tensor_1.5
[85] spatstat.data_3.0-4 utf8_1.2.4
[87] tidyr_1.3.1 generics_0.1.3
[89] httr_1.4.7 htmlwidgets_1.6.4
[91] S4Arrays_1.2.0 GSA_1.03.2
[93] uwot_0.1.16 pkgconfig_2.0.3
[95] gtable_0.3.4 lmtest_0.9-40
[97] impute_1.76.0 XVector_0.42.0
[99] htmltools_0.5.7 dotCall64_1.1-1
[101] scales_1.3.0 Biobase_2.62.0
[103] png_0.1-8 snakecase_0.11.1
[105] rstudioapi_0.15.0 reshape2_1.4.4
[107] nlme_3.1-164 zoo_1.8-12
[109] GlobalOptions_0.1.2 stringr_1.5.1
[111] KernSmooth_2.23-22 parallel_4.3.1
[113] miniUI_0.1.1.1 vipor_0.4.7
[115] ggrastr_1.0.2 pillar_1.9.0
[117] grid_4.3.1 vctrs_0.6.5
[119] RANN_2.6.1 promises_1.2.1
[121] shinyFiles_0.9.3 xtable_1.8-4
[123] cluster_2.1.6 beeswarm_0.4.0
[125] paletteer_1.6.0 locfit_1.5-9.8
[127] cli_3.6.2 compiler_4.3.1
[129] rlang_1.1.3 crayon_1.5.2
[131] future.apply_1.11.1 labeling_0.4.3
[133] rematch2_2.1.2 plyr_1.8.9
[135] forcats_1.0.0 fs_1.6.3
[137] ggbeeswarm_0.7.2 stringi_1.8.3
[139] BiocParallel_1.36.0 viridisLite_0.4.2
[141] deldir_2.0-2 munsell_0.5.0
[143] lazyeval_0.2.2 spatstat.geom_3.2-8
[145] Matrix_1.6-5 RcppHNSW_0.5.0
[147] patchwork_1.2.0 sparseMatrixStats_1.14.0
[149] bit64_4.0.5 future_1.33.1
[151] statmod_1.5.0 shiny_1.8.0
[153] SummarizedExperiment_1.32.0 ROCR_1.0-11
[155] igraph_1.6.0 bit_4.0.5
It is because in the function SCTransform(), the default option ncell = 5000 split my dataset in two at some point because with verbose = TRUE, I can see:
Getting residuals for block 1(of 2) for counts dataset
Getting residuals for block 2(of 2) for counts dataset
Indeed, my dataset is composed of 5640 cells, thus using the default option ncell = 5000 might produce some weird calculations maybe? Therefore I have tried to use ncell = 5640, in addition to vars.to.regress = "mtRNA.percent", and it works fine (I have a different UMAP indicating that regressing out mitochondrial RNA worked). Plus, I don't have the two following lines:
Getting residuals for block 1(of 2) for counts dataset
Getting residuals for block 2(of 2) for counts dataset
indicating that the calculations are at some point splitted in two I guess.
Remark 1: I have updated back my Seurat package to use the current version of the function (not the one you sent us with remotes::install_github("sataijalab/seurat", ref="develop") ), and the problem can be solved using ncell = your_number_of_cells_in_your_dataset
Remark 2: It seems that without any regression, ncell can be set to less than 5000 (I tried 5640/2), and the output is OK. Therefore, there is a problem with ncell only when using vars.to.regress. I hope it will help troubleshooting the function.
In any case, @saketkc , I think you should update the SCTransform function about this ncell option; either by setting it by default to the number of cells in the sample, or by updating the calculations that I know nothing about. Plus, you should talk about ncell in your tutorial vignette, because I didn't see it anywhere. Otherwise, many people will have the same issue...
Thanks again for your amazing work, Best regards,
To add to this – I was seeing the same original issue of vars.to.regress not changing anything with Seurat_5.0.0. Can confirm that (for whatever reason) adding ncells = ncol(seurat_object)
seemed to do the trick, as now my UMAP coordinates & clusters are different when I use different variables for vars.to.regress
(thanks, @CroixJeremy2!).
Never installed from develop
so I can't speak into the Error: None of the requested variables to regress are present in the object
error. I've been using Seurat_5.0.0 this whole time.
Example 1: With Regression
pbmc_data <- Seurat::Read10X(data.dir = FilePath) pbmc <- CreateSeuratObject(counts = pbmc_data) s.genes <- cc.genes$s.genes g2m.genes <- cc.genes$g2m.genes pbmc <- NormalizeData(pbmc) pbmc <- FindVariableFeatures(pbmc, selection.method = "vst") s.genes <- cc.genes.updated.2019$s.genes #Cell cycle markers loaded from Seurat s.genes <- sapply(s.genes, str_to_title) g2m.genes <- cc.genes.updated.2019$g2m.genes #Separating into S and G2M markers g2m.genes <- sapply(g2m.genes, str_to_title) pbmc <- CellCycleScoring(pbmc, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE) pbmc <- PercentageFeatureSet(pbmc, pattern = "^mt-", col.name = "percent.mt") pbmc <- SCTransform(pbmc, vars.to.regress = c("percent.mt", "S.Score", "G2M.Score"), vst.flavor = "v2", method = "glmGamPoi", verbose = T) pbmc <- RunPCA(pbmc, verbose = T) pbmc <- RunUMAP(pbmc, dims = 1:30, verbose = T) pbmc <- FindNeighbors(pbmc, dims = 1:30, verbose = T) pbmc <- FindClusters(pbmc, verbose = T)
Example 2: Without Regression
pbmc2 <- CreateSeuratObject(counts = pbmc_data) pbmc2 <- NormalizeData(pbmc2) pbmc2 <- FindVariableFeatures(pbmc2, selection.method = "vst") pbmc2 <- CellCycleScoring(pbmc2, s.features = s.genes, g2m.features = g2m.genes, set.ident = TRUE) pbmc2 <- PercentageFeatureSet(pbmc2, pattern = "^mt-", col.name = "percent.mt") pbmc2 <- SCTransform(pbmc2, vst.flavor = "v2", method = "glmGamPoi", verbose = T) pbmc2 <- RunPCA(pbmc2, verbose = T) pbmc2 <- RunUMAP(pbmc2, dims = 1:30, verbose = T) pbmc2 <- FindNeighbors(pbmc2, dims = 1:30, verbose = T) pbmc2 <- FindClusters(pbmc2, verbose = T) Plot1 <- DimPlot(pbmc, label = TRUE) + ggtitle("With MT and CC Regression") Plot2 <- DimPlot(pbmc2, label = TRUE) + ggtitle("Without MT and CC Regression") Plot1 + Plot2 identical(pbmc@meta.data, pbmc2@meta.data)
Second step: Get residuals using fitted parameters for 22452 genes Computing corrected count matrix for 22452 genes Calculating gene attributes Wall clock passed: Time difference of 23.15094 secs Determine variable features Regressing out percent.mt, S.Score, G2M.Score |=================================================================================================================================================================================================================================================================================| 100% Centering data matrix |=================================================================================================================================================================================================================================================================================| 100% Getting residuals for block 1(of 2) for counts dataset Getting residuals for block 2(of 2) for counts dataset Centering data matrix |=================================================================================================================================================================================================================================================================================| 100% Finished calculating residuals for counts Set default assay to SCT
Number of nodes: 8924 Number of edges: 274391
Running Louvain algorithm... 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| **| Maximum modularity in 10 random starts: 0.9218 Number of communities: 25 Elapsed time: 0 seconds
Second step: Get residuals using fitted parameters for 22452 genes Computing corrected count matrix for 22452 genes Calculating gene attributes Wall clock passed: Time difference of 22.24379 secs Determine variable features Centering data matrix |=================================================================================================================================================================================================================================================================================| 100% Getting residuals for block 1(of 2) for counts dataset Getting residuals for block 2(of 2) for counts dataset Centering data matrix |=================================================================================================================================================================================================================================================================================| 100% Finished calculating residuals for counts Set default assay to SCT
Number of nodes: 8924 Number of edges: 274391
Running Louvain algorithm... 0% 10 20 30 40 50 60 70 80 90 100% [----|----|----|----|----|----|----|----|----|----| **| Maximum modularity in 10 random starts: 0.9218 Number of communities: 25 Elapsed time: 0 seconds