satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.25k stars 904 forks source link

FindAllMarkers and FindMarkers pulls fold-change data from wrong slot #6976

Closed jbeanphd closed 1 year ago

jbeanphd commented 1 year ago

'FindAllMarkers' and 'FindMarkers' appear to be pulling fold change data from the wrong slot.

# insert reproducible example here
> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

> library(Seurat)
> pbmc_small <- pbmc_small
> View(pbmc_small)
> pbmc_small <- SCTransform(pbmc_small)
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 220 by 80
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 220 genes, 80 cells
  |======================================================================================================================================================| 100%
Second step: Get residuals using fitted parameters for 220 genes
  |======================================================================================================================================================| 100%
Computing corrected count matrix for 220 genes
  |======================================================================================================================================================| 100%
Calculating gene attributes
Wall clock passed: Time difference of 0.5991514 secs
Determine variable features
Place corrected count matrix in counts slot
Centering data matrix
  |======================================================================================================================================================| 100%
Set default assay to SCT
> pbmc_small <- FindVariableFeatures(pbmc_small)
> pbmc_small <- RunPCA(pbmc_small, features = pbmc_small@assays$SCT@var.features)
Warning in irlba(A = t(x = object), nv = npcs, ...) :
  You're computing too large a percentage of total singular values, use a standard svd instead.
PC_ 1 
Positive:  NKG7, CCL5, CST7, GZMA, CTSW, PRF1, GNLY, IL32, FGFBP2, GZMM 
       CD7, CD3D, LCK, GZMB, RARRES3, CD247, XBP1, LAMP1, CD3E, GZMH 
       SPON2, KLRD1, ALOX5AP, CCL4, EIF4A2, C12orf75, GYPC, KLRG1, LYAR, AKR1C3 
Negative:  CST3, LYZ, LST1, S100A9, IFITM3, TYMP, HLA-DRA, AIF1, HLA-DPA1, SERPINA1 
       S100A8, HLA-DPB1, CFD, FCN1, GRN, HLA-DMA, HLA-DRB5, CTSS, LGALS2, HLA-DRB1 
       IFI30, LGALS3, CFP, FCGRT, BID, HLA-DQB1, S100A11, CD68, HLA-DQA1, SAT1 
PC_ 2 
Positive:  PPBP, SDPR, CLU, GNG11, PF4, GPX1, SPARC, CD9, HIST1H2AC, GP9 
       TUBB1, NRGN, NCOA4, TPM4, CA2, NGFRAP1, ITGA2B, TREML1, TMEM40, PGRMC1 
       RUFY1, ODC1, CCL5, MYL9, FERMT3, PTCRA, ACRBP, PARVB, TSC22D1, TALDO1 
Negative:  NKG7, IFITM2, CST7, GZMA, FCGR3A, IL32, CTSW, EIF4A2, SSR2, FGFBP2 
       CD7, GNLY, CD3D, TYROBP, SRSF7, GZMM, PRF1, HSP90AA1, LCK, LGALS1 
       CXCR4, HNRNPA3, XBP1, GZMB, LTB, S100A11, FCER1G, HNRNPF, CD247, YWHAB 
PC_ 3 
Positive:  HLA-DRA, HLA-DQB1, LTB, HLA-DRB1, HLA-DPB1, MS4A1, CD79A, HLA-DPA1, HLA-DQA1, HLA-DRB5 
       HLA-DMB, CD79B, CXCR4, TCL1A, SP100, LY86, NT5C, SNHG7, HLA-DQA2, HLA-DMA 
       HVCN1, TMEM123, FCER2, LINC00926, PPAPDC1B, FCRLA, BANK1, EAF2, TNFAIP8, RP11-693J15.5 
Negative:  TYROBP, FCER1G, FCGR3A, LGALS1, LST1, CFD, CD68, AIF1, S100A9, IFITM3 
       S100A8, SERPINA1, FCN1, GZMB, CST3, SAT1, RHOC, PRF1, S100A11, GNLY 
       TYMP, PSAP, CARD16, FGFBP2, CTSS, GZMA, NKG7, RGS2, SMCO4, LILRA3 
PC_ 4 
Positive:  FGFBP2, GZMB, GNLY, PRF1, NKG7, CST7, HLA-DPB1, HLA-DRB1, HLA-DPA1, CTSW 
       HLA-DQB1, HLA-DRA, HLA-DQA1, CCL4, GZMA, AKR1C3, GZMH, HLA-DMA, IGFBP7, TTC38 
       HLA-DMB, KLRD1, HLA-DRB5, C12orf75, SPON2, IL2RB, PCMT1, CLIC3, ALOX5AP, GSTP1 
Negative:  CD3D, LTB, LDHB, IL7R, CD3E, TMEM123, EIF4A2, NOSIP, THYN1, IFITM2 
       TAF7, TAGAP, COTL1, MPHOSPH6, IL32, ACAP1, SRSF7, PIK3IP1, CD2, MAL 
       DNAJB1, EPC1, SAFB2, AIF1, GYPC, CXCR4, SAT1, ASNSD1, CTSS, CCR7 
PC_ 5 
Positive:  LYZ, LGALS2, S100A9, FCER1A, CLEC10A, GSTP1, MS4A6A, S100A8, HLA-DMA, IL1B 
       RGS1, LGALS3, CD14, ASGR1, CD3D, GRN, FCGRT, LDHB, TYMP, CST3 
       SRSF7, GPX1, CFP, IL32, IL7R, CD1C, MPEG1, FUOM, RNF130, LCK 
Negative:  FCGR3A, CD79B, RHOC, RP11-290F20.3, TNFRSF1B, MS4A1, HVCN1, LILRA3, SAT1, AIF1 
       IFITM2, MS4A7, CD79A, SERPINA1, CD68, LST1, CTSS, FCER1G, FAM96A, TCL1A 
       IFITM3, ADAR, EPC1, WARS, C5AR1, RGS2, HCK, LTB, PSAP, LINC00926 
> pbmc_small <- FindNeighbors(pbmc_small, dims = 1:30)
Computing nearest neighbor graph
Computing SNN
> pbmc_small <- FindClusters(pbmc_small)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 80
Number of edges: 1876

Running Louvain algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.5397
Number of communities: 4
Elapsed time: 0 seconds
> FindAllMarkers(pbmc_small, pseudocount.use = 0, assay = 'SCT', slot = 'data') |> head()
Calculating cluster 0
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
Calculating cluster 1
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
Calculating cluster 2
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
Calculating cluster 3
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
            p_val avg_log2FC pct.1 pct.2    p_val_adj cluster gene
LCK  1.652211e-11   5.946419 0.733  0.02 3.634864e-09       0  LCK
CD3D 5.627207e-11   7.451211 0.700  0.02 1.237985e-08       0 CD3D
CD7  6.484134e-11   6.380822 0.700  0.02 1.426510e-08       0  CD7
CST7 7.186854e-10   5.886713 0.667  0.04 1.581108e-07       0 CST7
IL32 7.878490e-10   6.544321 0.667  0.04 1.733268e-07       0 IL32
GZMM 9.576371e-10        Inf 0.600  0.00 2.106802e-07       0 GZMM

# 'FoldChange' gives a different result than is reported by 'FindAllMarkers' when both use 'assay = "SCT"' and 'slot = "data"'

> FoldChange(pbmc_small, pseudocount.use = 0, ident.1 = 0, ident.2 = c(1:3), features = 'LCK', assay = 'SCT', slot = 'data')
    avg_log2FC pct.1 pct.2
LCK   5.627499 0.733  0.02

#specifying 'slot = "counts"' returns the same fold change reported by 'FindAllMarkers' that was set to 'slot = "data"'

> FoldChange(pbmc_small, pseudocount.use = 0, ident.1 = 0, ident.2 = c(1:3), features = 'LCK', assay = 'SCT', slot = 'counts')
    avg_log2FC pct.1 pct.2
LCK   5.946419 0.733  0.02

#the same is true for 'FindMarkers'

> FindMarkers(pbmc_small, pseudocount.use = 0, ident.1 = 0, ident.2 = 1, features = 'LCK', assay = 'SCT', slot = 'data')
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=00s  
           p_val avg_log2FC pct.1 pct.2    p_val_adj
LCK 3.088212e-07   5.003002 0.733 0.038 6.794065e-05

# 'FoldChange' gives a different result than is reported by  'FindMarkers'

> FoldChange(pbmc_small, pseudocount.use = 0, ident.1 = 0, ident.2 = 1, features = 'LCK', assay = 'SCT', slot = 'data')
    avg_log2FC pct.1 pct.2
LCK   4.684082 0.733 0.038

#again specifying 'slot = "counts"' returns the same result that 'FindMarkers' reports when 'slot = "data"'

> FoldChange(pbmc_small, pseudocount.use = 0, ident.1 = 0, ident.2 = 1, features = 'LCK', assay = 'SCT', slot = 'counts')
    avg_log2FC pct.1 pct.2
LCK   5.003002 0.733 0.038

#pulling the values out manually reveals 'FoldChange' is reporting the correct values while 'FindAllMarkers' and 'FindMarkers'
#are pulling fold change data from the wrong slot.

> pbmc_small@assays$SCT@data['LCK', (row.names(filter(pbmc_small@meta.data, seurat_clusters == 0)))] |> mean()
[1] 0.6853326
> pbmc_small@assays$SCT@data['LCK', (row.names(filter(pbmc_small@meta.data, seurat_clusters == 1)))] |> mean()
[1] 0.02665951
> pbmc_small@assays$SCT@data['LCK', (row.names(filter(pbmc_small@meta.data, seurat_clusters != 0)))] |> mean()
[1] 0.01386294
> log2(0.6853326/0.02665951)
[1] 4.684082
> log2(0.6853326/0.01386294)
[1] 5.627499
> pbmc_small@assays$SCT@counts['LCK', (row.names(filter(pbmc_small@meta.data, seurat_clusters == 0)))] |> mean()
[1] 1.233333
> pbmc_small@assays$SCT@counts['LCK', (row.names(filter(pbmc_small@meta.data, seurat_clusters == 1)))] |> mean()
[1] 0.03846154
> pbmc_small@assays$SCT@counts['LCK', (row.names(filter(pbmc_small@meta.data, seurat_clusters != 0)))] |> mean()
[1] 0.02
> log2(1.233333/0.03846154)
[1] 5.003002
> log2(1.233333/0.02)
[1] 5.946419

> sessionInfo()
R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Seurat_4.3.0       dplyr_1.1.0        SeuratObject_4.1.3 sp_1.6-0          

loaded via a namespace (and not attached):
  [1] nlme_3.1-162           matrixStats_0.63.0     spatstat.sparse_3.0-0  RcppAnnoy_0.0.20       RColorBrewer_1.1-3     httr_1.4.4            
  [7] backports_1.4.1        sctransform_0.3.5      tools_4.2.2            utf8_1.2.3             R6_2.5.1               irlba_2.3.5.1         
 [13] KernSmooth_2.23-20     uwot_0.1.14            lazyeval_0.2.2         colorspace_2.1-0       withr_2.5.0            tidyselect_1.2.0      
 [19] gridExtra_2.3          compiler_4.2.2         progressr_0.13.0       cli_3.6.0              spatstat.explore_3.0-6 plotly_4.10.1         
 [25] scales_1.2.1           lmtest_0.9-40          spatstat.data_3.0-0    ggridges_0.5.4         pbapply_1.7-0          readODS_1.8.0         
 [31] goftest_1.2-3          stringr_1.5.0          digest_0.6.30          spatstat.utils_3.0-1   pkgconfig_2.0.3        htmltools_0.5.4       
 [37] parallelly_1.34.0      limma_3.52.4           fastmap_1.1.0          htmlwidgets_1.6.1      rlang_1.0.6            rstudioapi_0.14       
 [43] shiny_1.7.4            generics_0.1.3         zoo_1.8-11             jsonlite_1.8.3         spatstat.random_3.1-3  ica_1.0-3             
 [49] car_3.1-1              magrittr_2.0.3         patchwork_1.1.2        Matrix_1.5-3           Rcpp_1.0.10            munsell_0.5.0         
 [55] fansi_1.0.4            abind_1.4-5            reticulate_1.28        lifecycle_1.0.3        stringi_1.7.12         carData_3.0-5         
 [61] MASS_7.3-58.2          Rtsne_0.16             plyr_1.8.8             grid_4.2.2             parallel_4.2.2         listenv_0.8.0         
 [67] promises_1.2.0.1       ggrepel_0.9.3          deldir_1.0-6           miniUI_0.1.1.1         lattice_0.20-45        cowplot_1.1.1         
 [73] splines_4.2.2          tensor_1.5             pillar_1.8.1           igraph_1.3.5           spatstat.geom_3.0-6    future.apply_1.10.0   
 [79] reshape2_1.4.4         codetools_0.2-19       leiden_0.4.3           gprofiler2_0.2.1       glue_1.6.2             data.table_1.14.6     
 [85] png_0.1-8              vctrs_0.5.2            httpuv_1.6.9           polyclip_1.10-4        gtable_0.3.1           RANN_2.6.1            
 [91] purrr_1.0.1            tidyr_1.3.0            scattermore_0.8        future_1.31.0          ggplot2_3.4.1          mime_0.12             
 [97] broom_1.0.3            xtable_1.8-4           rstatix_0.7.2          later_1.3.0            survival_3.5-3         viridisLite_0.4.1     
[103] tibble_3.1.8           cluster_2.1.4          globals_0.16.2         fitdistrplus_1.1-8     ellipsis_0.3.2         ROCR_1.0-11  
saketkc commented 1 year ago

Thanks for the bug report and reproducible example @jbeanphd! This is now fixed in the develop branc https://github.com/satijalab/seurat/commit/e08e1084049ee9387591e442e00799645d833591

You can check out the latest updates by installing the develop branch:

remotes::install_github("satijalab/seurat", ref="develop")

If you are still facing issues, please open a new issue.