satijalab / seurat

R toolkit for single cell genomics
http://www.satijalab.org/seurat
Other
2.21k stars 897 forks source link

Error in HTODemux: Cells with zero counts exist as a cluster. (reproducible example) #7254

Closed ms3389 closed 1 year ago

ms3389 commented 1 year ago

Hello,

I have 2 samples from the same 10x run. Both have similar QC and data structure both for counts and hashtags. However one sample throws an error when running HTODemux(). Any suggestions on how to fix this are welcome! Thank you!

Problematic sample1: https://www.dropbox.com/s/f96ysu7kpvwn5bz/sample.tmp.seurat1.rds?dl=0 Normal functionality sample2: https://www.dropbox.com/s/c0fkalhdkjitquu/sample.tmp.seurat2.rds?dl=0

Code to reproduce:

library(Seurat)

sample.tmp.seurat1 = readRDS("sample.tmp.seurat1.rds") sample.tmp.seurat2 = readRDS("sample.tmp.seurat2.rds")

sample.tmp.seurat1 <- HTODemux(sample.tmp.seurat1, assay = "HTO", positive.quantile = 0.99) sample.tmp.seurat2 <- HTODemux(sample.tmp.seurat2, assay = "HTO", positive.quantile = 0.99)

######################################################################## ########################################################################

sample.tmp.seurat1 <- HTODemux(sample.tmp.seurat1, assay = "HTO", positive.quantile = 0.99) Error in HTODemux(sample.tmp.seurat1, assay = "HTO", positive.quantile = 0.99) : Cells with zero counts exist as a cluster. sample.tmp.seurat2 <- HTODemux(sample.tmp.seurat2, assay = "HTO", positive.quantile = 0.99) Cutoff for Tag1-GTCAACTCTTTAGCG : 4 reads Cutoff for Tag2-TGATGGCCTATTGGG : 17 reads Cutoff for unmapped : 21 reads

sessionInfo() R version 4.1.1 (2021-08-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 [4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] SeuratObject_4.1.3 Seurat_4.3.0 Matrix_1.5-4

loaded via a namespace (and not attached): [1] plyr_1.8.8 igraph_1.4.2 lazyeval_0.2.2 sp_1.6-0 splines_4.1.1
[6] listenv_0.9.0 scattermore_0.8 usethis_2.1.6 GenomeInfoDb_1.30.1 ggplot2_3.4.2
[11] TH.data_1.1-1 digest_0.6.31 foreach_1.5.2 htmltools_0.5.5 rsconnect_0.8.29
[16] fansi_1.0.4 magrittr_2.0.3 memoise_2.0.1 tensor_1.5 cluster_2.1.2
[21] doParallel_1.0.17 ROCR_1.0-11 limma_3.50.3 remotes_2.4.2 globals_0.16.2
[26] fastcluster_1.2.3 matrixStats_0.63.0 sandwich_3.0-2 spatstat.sparse_3.0-1 prettyunits_1.1.1
[31] colorspace_2.1-0 ggrepel_0.9.3 xfun_0.39 dplyr_1.1.2 callr_3.7.3
[36] crayon_1.5.2 RCurl_1.98-1.12 jsonlite_1.8.4 libcoin_1.0-9 spatstat.data_3.0-1
[41] progressr_0.13.0 survival_3.2-11 zoo_1.8-12 iterators_1.0.14 ape_5.7-1
[46] glue_1.6.2 polyclip_1.10-4 gtable_0.3.3 zlibbioc_1.40.0 XVector_0.34.0
[51] leiden_0.4.3 DelayedArray_0.20.0 pkgbuild_1.4.0 future.apply_1.10.0 SingleCellExperiment_1.16.0 [56] BiocGenerics_0.40.0 abind_1.4-5 scales_1.2.1 futile.options_1.0.1 mvtnorm_1.1-3
[61] edgeR_3.36.0 spatstat.random_3.1-4 miniUI_0.1.1.1 Rcpp_1.0.10 viridisLite_0.4.1
[66] xtable_1.8-4 reticulate_1.28 stats4_4.1.1 profvis_0.3.7 httr_1.4.5
[71] htmlwidgets_1.6.2 gplots_3.1.3 RColorBrewer_1.1-3 modeltools_0.2-23 ellipsis_0.3.2
[76] ica_1.0-3 urlchecker_1.0.1 pkgconfig_2.0.3 reshape_0.8.9 uwot_0.1.14
[81] deldir_1.0-6 locfit_1.5-9.7 utf8_1.2.3 tidyselect_1.2.0 rlang_1.1.0
[86] reshape2_1.4.4 later_1.3.0 munsell_0.5.0 phyclust_0.1-33 tools_4.1.1
[91] cachem_1.0.7 cli_3.6.1 generics_0.1.3 ggridges_0.5.4 devtools_2.4.5
[96] stringr_1.5.0 fastmap_1.1.1 argparse_2.2.2 goftest_1.2-3 processx_3.8.0
[101] knitr_1.42 fs_1.5.2 fitdistrplus_1.1-11 caTools_1.18.2 purrr_1.0.1
[106] RANN_2.6.1 coin_1.4-2 pbapply_1.7-0 future_1.32.0 nlme_3.1-152
[111] mime_0.12 formatR_1.14 compiler_4.1.1 rstudioapi_0.14 plotly_4.10.1
[116] png_0.1-8 spatstat.utils_3.0-2 tibble_3.2.1 stringi_1.7.12 ps_1.7.2
[121] futile.logger_1.4.3 lattice_0.20-44 vctrs_0.6.1 pillar_1.9.0 lifecycle_1.0.3
[126] BiocManager_1.30.20 spatstat.geom_3.1-0 lmtest_0.9-40 RcppAnnoy_0.0.20 data.table_1.14.8
[131] irlba_2.3.5.1 cowplot_1.1.1 bitops_1.0-7 patchwork_1.1.2 httpuv_1.6.9
[136] GenomicRanges_1.46.1 R6_2.5.1 promises_1.2.0.1 KernSmooth_2.23-20 gridExtra_2.3
[141] IRanges_2.28.0 parallelly_1.35.0 rjags_4-13 sessioninfo_1.2.2 codetools_0.2-18
[146] lambda.r_1.2.4 MASS_7.3-54 gtools_3.9.4 pkgload_1.3.2 SummarizedExperiment_1.24.0 [151] sctransform_0.3.5 infercnv_1.10.1 multcomp_1.4-23 S4Vectors_0.32.4 GenomeInfoDbData_1.2.7
[156] parallel_4.1.1 grid_4.1.1 tidyr_1.3.0 coda_0.19-4 MatrixGenerics_1.6.0
[161] Rtsne_0.16 spatstat.explore_3.1-0 Biobase_2.54.0 shiny_1.7.4

mhkowalski commented 1 year ago

Hi, I downloaded your data and took a quick look. It seems you are trying to use unmapped HTO reads as a feature for HTODemux, which I'm not sure makes sense.

The reason why this is happening is that there are a lower number of unmapped reads in your first Seurat object, so upon clustering, you get a cluster with 0 counts for this feature, which prevents you from accurately estimating a background distribution.

Getting rid of unmapped reads fixed this problem for me.

m <- obj[['HTO']]@counts[1:2,]
obj[['HTOnew']] <- CreateAssayObject(counts = m)
DefaultAssay(obj) <- 'HTOnew'
obj <- NormalizeData(obj, normalization.method = "CLR")
obj <- HTODemux(obj, assay = "HTOnew", positive.quantile = 0.99)