plger / scDblFinder

Methods for detecting doublets in single-cell sequencing data
https://plger.github.io/scDblFinder/
GNU General Public License v3.0
162 stars 17 forks source link

SCTransform example in vignette not working #105

Closed JesseRop closed 3 months ago

JesseRop commented 6 months ago

Describe the bug I'm trying to replicate the example in the vignette to use SCT internally within scdblFinder but I get the error below

Error in if (length(sel_features) > nfeatures) sel_features <- selFeatures(sce[sel_features, : the condition has length > 1

MRE -- Minimal example to reproduce the bug

nfeatures <- 1000
sce <- mockDoubletSCE(dbl.rate=0.1, ngenes=2000 )
# sctransform on real cells:
vst1 <- sctransform::vst(counts(sce), n_cells=min(ncol(sce),5000), verbosity=0)
sce <- sce[row.names(vst1$y),]
logcounts(sce) <- vst1$y
hvg <- row.names(sce)[head(order(vst1$gene_attr$residual_variance, decreasing=TRUE), nfeatures)]

# define a processing function that scDblFinder will use on the real+artificial doublets;
# the input should be a count matrix and the number of dimensions, and the output a PCA matrix

myfun <- function(e, dims){
  # we use the thetas calculated from the first vst on real cells
  e <- e[intersect(row.names(e), row.names(vst1$model_pars_fit[which(!is.na(vst1$model_pars_fit[,"theta"])),])),]
  vst2 <- sctransform::vst(e, n_cells=min(ncol(e),5000), method="nb_theta_given", 
                           theta_given=vst1$model_pars_fit[row.names(e),"theta"],
                           min_cells=1L, verbosity=0)
  scater::calculatePCA(vst2$y, ncomponents=dims)
}

sce <- scDblFinder(sce, processing=myfun, nfeatures=hvg)

Traceback 1: scDblFinder(sce, processing = myfun, nfeatures = hvg)

Session info

R version 4.2.2 (2022-10-31) Platform: x86_64-apple-darwin17.0 (64-bit) Running under: macOS 14.4

Matrix products: default LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages: [1] grid stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] reticulate_1.31 caret_6.0-94 ggsankey_0.0.99999 scmap_1.20.2 ggh4x_0.2.4 MetBrewer_0.2.0
[7] SoupX_1.6.2 conflicted_1.2.0 ggvenn_0.1.9 Platypus_3.4.1 RColorBrewer_1.1-3 rmarkdown_2.24
[13] Hmisc_4.7-2 Formula_1.2-4 survival_3.5-0 lattice_0.20-45 plotly_4.10.2 SeuratDisk_0.0.0.9020
[19] SeuratData_0.2.2 scDblFinder_1.12.0 SingleCellExperiment_1.20.0 SummarizedExperiment_1.28.0 Biobase_2.58.0 GenomicRanges_1.50.2
[25] GenomeInfoDb_1.34.9 IRanges_2.32.0 S4Vectors_0.36.1 BiocGenerics_0.44.0 MatrixGenerics_1.10.0 matrixStats_1.0.0
[31] cowplot_1.1.1 tidyr_1.3.0 tibble_3.2.1 readr_2.1.4 purrr_1.0.2 stringr_1.5.0
[37] dplyr_1.1.2 ggpubr_0.5.0 ggrepel_0.9.3 ggplot2_3.4.3 DoubletFinder_2.0.3 patchwork_1.1.3
[43] rgl_1.0.1 sctransform_0.3.5 SeuratObject_4.1.3 Seurat_4.3.0.1

loaded via a namespace (and not attached): [1] rsvd_1.0.5 ica_1.0-3 class_7.3-21 Rsamtools_2.14.0 foreach_1.5.2 lmtest_0.9-40
[7] crayon_1.5.2 MASS_7.3-58.2 nlme_3.1-161 backports_1.4.1 rlang_1.1.1 XVector_0.38.0
[13] ROCR_1.0-11 readxl_1.4.2 irlba_2.3.5.1 limma_3.54.2 scater_1.26.1 xgboost_1.7.3.1
[19] BiocParallel_1.32.5 rjson_0.2.21 bit64_4.0.5 glue_1.6.2 pheatmap_1.0.12 parallel_4.2.2
[25] vipor_0.4.5 spatstat.sparse_3.0-2 dotCall64_1.0-2 spatstat.geom_3.2-4 tidyselect_1.2.0 fitdistrplus_1.1-11
[31] XML_3.99-0.13 zoo_1.8-12 GenomicAlignments_1.34.0 xtable_1.8-4 magrittr_2.0.3 evaluate_0.21
[37] scuttle_1.8.4 cli_3.6.1 zlibbioc_1.44.0 rstudioapi_0.14 miniUI_0.1.1.1 sp_2.0-0
[43] rpart_4.1.19 treeio_1.22.0 maps_3.4.1 fields_14.1 shiny_1.7.5 BiocSingular_1.14.0
[49] xfun_0.40 clue_0.3-63 cluster_2.1.4 ape_5.6-2 listenv_0.9.0 Biostrings_2.66.0
[55] png_0.1-8 reshape_0.8.9 future_1.33.0 ipred_0.9-14 withr_2.5.0 bitops_1.0-7
[61] ggforce_0.4.1 plyr_1.8.8 cellranger_1.1.0 hardhat_1.3.1 pROC_1.18.5 e1071_1.7-12
[67] dqrng_0.3.0 pillar_1.9.0 GlobalOptions_0.1.2 cachem_1.0.8 hdf5r_1.3.8 GetoptLong_1.0.5
[73] googleVis_0.7.1 DelayedMatrixStats_1.20.0 vctrs_0.6.3 ellipsis_0.3.2 generics_0.1.3 lava_1.8.0
[79] tools_4.2.2 foreign_0.8-84 beeswarm_0.4.0 munsell_0.5.0 tweenr_2.0.2 proxy_0.4-27
[85] DelayedArray_0.24.0 fastmap_1.1.1 compiler_4.2.2 abind_1.4-5 httpuv_1.6.11 rtracklayer_1.58.0
[91] prodlim_2023.08.28 GenomeInfoDbData_1.2.9 gridExtra_2.3 edgeR_3.40.2 deldir_1.0-9 utf8_1.2.3
[97] later_1.3.1 recipes_1.0.10 jsonlite_1.8.7 GGally_2.1.2 scales_1.2.1 ScaledMatrix_1.6.0
[103] tidytree_0.4.2 pbapply_1.7-2 carData_3.0-5 sparseMatrixStats_1.10.0 lazyeval_0.2.2 promises_1.2.1
[109] car_3.1-1 doParallel_1.0.17 R.utils_2.12.2 latticeExtra_0.6-30 goftest_1.2-3 spatstat.utils_3.0-3
[115] checkmate_2.1.0 statmod_1.5.0 Rtsne_0.16 uwot_0.1.16 igraph_1.5.1 yaml_2.3.7
[121] htmltools_0.5.6 memoise_2.0.1 BiocIO_1.8.0 locfit_1.5-9.7 viridisLite_0.4.2 digest_0.6.33
[127] mime_0.12 rappdirs_0.3.3 spam_2.9-1 yulab.utils_0.0.6 future.apply_1.11.0 data.table_1.14.8
[133] R.oo_1.25.0 labeling_0.4.2 splines_4.2.2 RCurl_1.98-1.12 broom_1.0.3 hms_1.1.2
[139] colorspace_2.1-0 base64enc_0.1-3 BiocManager_1.30.22 ggbeeswarm_0.7.1 shape_1.4.6 aplot_0.1.9
[145] nnet_7.3-18 Rcpp_1.0.11 RANN_2.6.1 circlize_0.4.15 fansi_1.0.4 tzdb_0.4.0
[151] ModelMetrics_1.2.2.2 parallelly_1.36.0 R6_2.5.1 ggridges_0.5.4 lifecycle_1.0.3 bluster_1.8.0
[157] ggsignif_0.6.4 leiden_0.4.3 Matrix_1.5-3 RcppAnnoy_0.0.21 iterators_1.0.14 spatstat.explore_3.2-1
[163] gower_1.0.1 htmlwidgets_1.6.2 beachmat_2.14.0 polyclip_1.10-4 timechange_0.2.0 gridGraphics_0.5-1
[169] ComplexHeatmap_2.14.0 globals_0.16.2 htmlTable_2.4.1 spatstat.random_3.1-5 progressr_0.14.0 lubridate_1.9.2
[175] codetools_0.2-18 metapod_1.6.0 randomForest_4.7-1.1 R.methodsS3_1.8.2 gtable_0.3.3 DBI_1.1.3
[181] ggfun_0.0.9 tensor_1.5 httr_1.4.7 KernSmooth_2.23-20 stringi_1.7.12 reshape2_1.4.4
[187] farver_2.1.1 viridis_0.6.2 timeDate_4032.109 ggtree_3.6.2 BiocNeighbors_1.16.0 restfulr_0.0.15
[193] interp_1.1-3 ggplotify_0.1.0 scattermore_1.2 scran_1.26.2 bit_4.0.5 jpeg_0.1-10

plger commented 6 months ago

Hi,

Thanks for the nicely reported issue.

You're using scDblFinder 1.12.0, the current release one is 1.16.0, and the ability to specify which genes to use was added in 1.13.7, which is why you're getting an error. It should work if you update; alternatively you can use the following workaround:

# same as your example up until running scDblFinder, then
# simply subset to the hvg instead of manually specifying them:
sce2 <- scDblFinder(sce[hvg,], processing=myfun, nfeatures=nfeatures)
# then put the results back in the original object:
sce$scDblFinder.score <- sce2$scDblFinder.score
sce$scDblFinder.class <- sce2$scDblFinder.class

Pierre-Luc