satijalab / sctransform

R package for modeling single cell UMI expression data using regularized negative binomial regression
GNU General Public License v3.0

Why does SCTransform convert gene expression levels to integers or discrete levels? #93

Closed: ElyasMo closed this issue 3 years ago

ElyasMo commented 3 years ago

I am analyzing human brain slices from Alzheimer's disease patients and healthy controls. After loading the data and normalizing it with the default options of SCTransform, I plotted the expression levels of some genes with VlnPlot. The expression values appear as integers, or fall on a few discrete levels, instead of spanning a continuous range.

I rechecked this with the mouse brain dataset from the Seurat spatial tutorial and got the same result. Below is the VlnPlot from that dataset, so you should be able to reproduce it.

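library(Seurat)

# Directory: path to the Space Ranger output folder (placeholder, set before running)
brain <- Load10X_Spatial(Directory,
                         filename = "filtered_feature_bc_matrix.h5",
                         assay = "Spatial",
                         slice = "slice1",
                         filter.matrix = TRUE,
                         to.upper = FALSE)
# Normalize with SCTransform; this adds a new assay named "SCT"
brain <- SCTransform(brain, assay = "Spatial", verbose = FALSE, do.scale = TRUE)
# Violin plot of APP expression, drawn from VlnPlot's default slot
VlnPlot(brain, features = 'APP')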

[Figure: VlnPlot of APP expression (Rplot)]

Here is the output of my sessionInfo():

R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] sctransform_0.3.2         dplyr_1.0.2               patchwork_1.1.1           ggplot2_3.3.2             stxBrain.SeuratData_0.1.1 panc8.SeuratData_3.0.2   
[7] SeuratData_0.2.1          Seurat_3.2.3             

loaded via a namespace (and not attached):
  [1] nlme_3.1-148          matrixStats_0.57.0    bit64_4.0.5           RcppAnnoy_0.0.16      RColorBrewer_1.1-2    httr_1.4.2            tools_4.0.2          
  [8] R6_2.5.0              irlba_2.3.3           rpart_4.1-15          KernSmooth_2.23-17    uwot_0.1.10           mgcv_1.8-31           lazyeval_0.2.2       
 [15] colorspace_2.0-0      withr_2.3.0           tidyselect_1.1.0      gridExtra_2.3         bit_4.0.4             compiler_4.0.2        cli_2.2.0            
 [22] hdf5r_1.3.3           plotly_4.9.2.2        labeling_0.4.2        scales_1.1.1          lmtest_0.9-38         spatstat.data_1.7-0   ggridges_0.5.2       
 [29] pbapply_1.4-3         rappdirs_0.3.1        spatstat_1.64-1       goftest_1.2-2         stringr_1.4.0         digest_0.6.27         spatstat.utils_1.17-0
 [36] pkgconfig_2.0.3       htmltools_0.5.1.1     parallelly_1.22.0     fastmap_1.0.1         htmlwidgets_1.5.3     rlang_0.4.9           rstudioapi_0.13      
 [43] shiny_1.5.0           farver_2.0.3          generics_0.1.0        zoo_1.8-8             jsonlite_1.7.2        ica_1.0-2             magrittr_2.0.1       
 [50] Matrix_1.2-18         fansi_0.4.1           Rcpp_1.0.5            munsell_0.5.0         abind_1.4-5           reticulate_1.18       lifecycle_0.2.0      
 [57] stringi_1.5.3         yaml_2.2.1            MASS_7.3-51.6         Rtsne_0.15            plyr_1.8.6            grid_4.0.2            parallel_4.0.2       
 [64] listenv_0.8.0         promises_1.1.1        ggrepel_0.9.0         crayon_1.3.4          miniUI_0.1.1.1        deldir_0.2-3          lattice_0.20-41      
 [71] cowplot_1.1.0         splines_4.0.2         tensor_1.5            pillar_1.4.7          igraph_1.2.6          future.apply_1.6.0    reshape2_1.4.4       
 [78] codetools_0.2-16      leiden_0.3.6          glue_1.4.2            data.table_1.13.4     png_0.1-7             vctrs_0.3.6           httpuv_1.5.4         
 [85] gtable_0.3.0          RANN_2.6.1            purrr_0.3.4           polyclip_1.10-0       tidyr_1.1.2           assertthat_0.2.1      scattermore_0.7      
 [92] future_1.21.0         rsvd_1.0.3            mime_0.9              xtable_1.8-4          RSpectra_0.16-0       later_1.1.0.1         survival_3.1-12      
 [99] viridisLite_0.3.0     tibble_3.0.4          cluster_2.1.0         globals_0.14.0        fitdistrplus_1.1-3    ellipsis_0.3.1        ROCR_1.0-11
ChristophH commented 3 years ago

By default, VlnPlot pulls expression values from the data slot (see the VlnPlot documentation). Quoting from the SCTransform documentation:

Returns a Seurat object with a new assay (named SCT by default) with counts being (corrected) counts, data being log1p(counts), scale.data being pearson residuals

So in your example above, you are plotting the log1p-transformed corrected (discrete) counts. To plot the Pearson residuals instead, set slot = "scale.data" in VlnPlot.
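A minimal base-R sketch of why the two slots look so different, using made-up toy counts rather than the brain dataset. The names depth, mu and theta are hypothetical values chosen for illustration; sctransform estimates the per-gene model parameters itself by regularized negative binomial regression, so this only mimics the shape of the computation, not the package's implementation.

```r
# Toy, hypothetical data: UMI counts for one gene across 1000 spots
set.seed(42)
n <- 1000
counts <- rnbinom(n, mu = 2, size = 5)

# Analogue of the SCT "data" slot: log1p of integer counts.
# Integers can only map to log(1), log(2), log(3), ..., so the values
# fall on a few discrete levels, which is what the violin plot shows.
log_counts <- log1p(counts)
print(sort(unique(round(log_counts, 3))))

# Analogue of the SCT "scale.data" slot: Pearson residuals under a
# negative binomial model with mean mu and dispersion theta:
#   r = (x - mu) / sqrt(mu + mu^2 / theta)
# mu depends on each spot's sequencing depth, so identical counts give
# different residuals in different spots. depth, mu and theta are
# made-up values here, not estimates from sctransform.
depth <- runif(n, min = 0.5, max = 1.5)
mu <- 2 * depth
theta <- 5
pearson <- (counts - mu) / sqrt(mu + mu^2 / theta)

cat("distinct log1p values:  ", length(unique(log_counts)), "\n")
cat("distinct Pearson values:", length(unique(pearson)), "\n")
```

In the Seurat 3.x session from the question, the corresponding call would be VlnPlot(brain, features = 'APP', slot = "scale.data"). One caveat, assuming the SCTransform defaults of that version: with return.only.var.genes = TRUE, scale.data holds residuals only for the variable features, so a gene that is not among them may be missing from that slot.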