twoneu closed this issue 1 year ago
Hi,
20k cells certainly shouldn't be a problem, how many peaks do you have? Do you have mbkmeans installed?
Can you reproduce the error without the aggregation (i.e. just scDblFinder(peak_assay))?
I've never seen this error, so if it'd be possible to share the object (ideally smaller if you can still reproduce the error, genes/samples can obviously be scrambled) it'll make debugging easier.
Pierre-Luc
Hi @twoneu , please respond or I'll close the issue.
I also get the same error. Did you solve it? I also have about 20,000 cells (according to the Cell Ranger web summary). Another dataset of mine also has 20,000 cells, but it ran fine without error.
Then please provide the extra info requested above.
counts <- Read10X_h5("/data/jrgong/AD_multiome/fastq/aggr_2023_0616/aggr_2023_0616/outs/filtered_feature_bc_matrix.h5")
fragpath <- "/data/jrgong/AD_multiome/fastq//aggr_2023_0616/aggr_2023_0616/outs/atac_fragments.tsv.gz"
annotation <- GetGRangesFromEnsDb(ensdb = EnsDb.Hsapiens.v86)
seqlevelsStyle(annotation) <- "UCSC"
AH_Mul <- CreateSeuratObject(counts = counts$`Gene Expression`, assay = "RNA")
AH_Mul[["ATAC"]] <- CreateChromatinAssay(counts = counts$Peaks, sep = c(":","-"), fragments = fragpath, annotation = annotation)
AH.PBMC.7 <- subset(AH_Mul, subset=aggr_number=="7")
sce <- scDblFinder(SingleCellExperiment(list(counts=AH.PBMC.7@assays$RNA@counts)))
AH.PBMC.7$doublet_scores <- sce$scDblFinder.score
AH.PBMC.7$doublet_class <- sce$scDblFinder.class
and then I saw an error like this:
Could you let me know how to solve it?
Can you please report your session info (as one should always do)?
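For anyone unsure how to collect this, a minimal sketch in base R follows. The file name session_info.txt is only an illustrative choice, not something the thread asks for.

```r
# Capture the session info as text so it can be pasted into the issue.
info <- capture.output(sessionInfo())

# Optionally keep a copy on disk (hypothetical file name).
writeLines(info, "session_info.txt")

# Print it to the console for copy/paste.
cat(info, sep = "\n")
```

Running this in the same R session (or the same script) that produced the error matters, since it records the package versions that were actually loaded.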
Hi @plger, sorry for the delay! What is the best way to share the data with you?
If you don't have a drive/platform where you can put it, send me an email at pierre-luc.germain [at ] hest.ethz.ch and I'll give you a link.
Thank you @plger for helping me solve this issue! I was able to successfully run scDblFinder by:
installing mbkmeans
Installing mbkmeans did not solve the issue for me. I'm still getting:
Creating ~25000 artificial doublets...
Dimensional reduction
Evaluating kNN...
Training model...
Error in if (length(expected) > 1 && x > min(expected) && x < max(expected)) return(0) :
missing value where TRUE/FALSE needed
Calls: scDblFinder ... .optimThreshold -> optimize -> <Anonymous> -> f -> .prop.dev
In addition: Warning message:
In scDblFinder(sce) :
You are trying to run scDblFinder on a very large number of cells. If these are from different captures, please specify this using the `samples` argument.TRUE
Execution halted
Please report session info.
R version 4.3.3 (2024-02-29)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: GridOS 22.04.5
Matrix products: default
BLAS/LAPACK: /home/gridsan/lenail/.conda/envs/SoupX/lib/libopenblasp-r0.3.27.so; LAPACK version 3.12.0
locale:
[1] C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] Matrix_1.6-5 scDblFinder_1.16.0
[3] SingleCellExperiment_1.24.0 SummarizedExperiment_1.32.0
[5] Biobase_2.62.0 GenomicRanges_1.54.1
[7] GenomeInfoDb_1.38.1 IRanges_2.36.0
[9] S4Vectors_0.40.2 BiocGenerics_0.48.1
[11] MatrixGenerics_1.14.0 matrixStats_1.4.1
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 viridisLite_0.4.2
[3] dplyr_1.1.4 vipor_0.4.7
[5] viridis_0.6.5 Biostrings_2.70.1
[7] bitops_1.0-9 RCurl_1.98-1.16
[9] bluster_1.12.0 GenomicAlignments_1.38.0
[11] XML_3.99-0.17 rsvd_1.0.5
[13] lifecycle_1.0.4 cluster_2.1.6
[15] statmod_1.5.0 magrittr_2.0.3
[17] compiler_4.3.3 rlang_1.1.4
[19] tools_4.3.3 igraph_2.0.3
[21] utf8_1.2.4 yaml_2.3.10
[23] data.table_1.15.4 rtracklayer_1.62.0
[25] S4Arrays_1.2.0 dqrng_0.3.2
[27] xgboost_2.1.1.1 DelayedArray_0.28.0
[29] abind_1.4-5 BiocParallel_1.36.0
[31] grid_4.3.3 fansi_1.0.6
[33] beachmat_2.18.0 colorspace_2.1-1
[35] edgeR_4.0.16 ggplot2_3.5.1
[37] scales_1.3.0 MASS_7.3-60.0.1
[39] cli_3.6.3 crayon_1.5.3
[41] generics_0.1.3 metapod_1.10.0
[43] rjson_0.2.23 DelayedMatrixStats_1.24.0
[45] scuttle_1.12.0 ggbeeswarm_0.7.2
[47] zlibbioc_1.48.0 parallel_4.3.3
[49] XVector_0.42.0 restfulr_0.0.15
[51] vctrs_0.6.5 jsonlite_1.8.9
[53] BiocSingular_1.18.0 BiocNeighbors_1.20.0
[55] ggrepel_0.9.6 irlba_2.3.5.1
[57] beeswarm_0.4.0 scater_1.30.1
[59] locfit_1.5-9.9 limma_3.58.1
[61] glue_1.8.0 codetools_0.2-20
[63] gtable_0.3.5 BiocIO_1.12.0
[65] ScaledMatrix_1.10.0 munsell_0.5.1
[67] tibble_3.2.1 pillar_1.9.0
[69] GenomeInfoDbData_1.2.11 R6_2.5.1
[71] sparseMatrixStats_1.14.0 lattice_0.22-6
[73] Rsamtools_2.18.0 scran_1.30.0
[75] Rcpp_1.0.13 gridExtra_2.3
[77] SparseArray_1.2.2 pkgconfig_2.0.3
If possible, try using the latest version (e.g. installing from github). I made some changes that fix thresholding issues in some circumstances which could have led to such an error, although I can't say whether it's the case here. If you still encounter the issue, please indicate the exact command you used.
I installed the github version. I'm still getting the same error.
Creating ~25000 artificial doublets...
Dimensional reduction
Evaluating kNN...
Training model...
Error in if (length(expected) > 1 && x > min(expected) && x < max(expected)) return(0) :
missing value where TRUE/FALSE needed
Calls: scDblFinder ... .optimThreshold -> optimize -> <Anonymous> -> f -> .prop.dev
In addition: Warning message:
In scDblFinder(sce) :
You are trying to run scDblFinder on a very large number of cells. If these are from different captures, please specify this using the `samples` argument.TRUE
Execution halted
My script looks like this:
mat <- readMM(gzfile(matrix_file))
barcodes <- read.delim(gzfile(barcodes_file), header = FALSE)
features <- read.delim(gzfile(features_file), header = FALSE)
rownames(mat) <- features[, 1]
colnames(mat) <- barcodes[, 1]
sce <- SingleCellExperiment(assays = list(counts = mat))
sce <- scDblFinder(sce)
scDblFinder_out_dir <- file.path(sample_dir, "scDblFinder_soupx_results")
dir.create(scDblFinder_out_dir, showWarnings = FALSE)
doublet_results_file <- file.path(scDblFinder_out_dir, "doublet_results.tsv")
write.table(colData(sce), doublet_results_file, sep = "\t", quote = FALSE)
This script worked for 94/96 samples, but failed for samples F10 and G10. I suspect there's something about the data in those samples which is making scDblFinder crash. In what circumstances would scDblFinder crash this way?
Thanks for trying it.
The known reasons so far were:
1) some issues related to feature aggregation with ATAC-seq data (solved by installing mbkmeans, above); not your problem.
2) no significant anti-correlation between genes, leading to flat cxds scores, which are normally used to bootstrap the iterative procedure. I've only seen this happen once, in data where all cells looked the same, but anyway there's now a failsafe for that.
3) too large a fraction of doublets detected in the first training iteration, which was also solved in recent versions.
If you can share with me the matrix of one of the two samples (can be without the cells/feature IDs), I'll try to figure out what's happening in your case.
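Before sharing a matrix, it can help to compare a few basic statistics of a failing sample against one that works. The sketch below is only an assumption about what might be worth looking at: summarise_counts is a hypothetical helper, it assumes a sparse matrix from the Matrix package (as returned by readMM), and none of these checks is a confirmed cause of the error.

```r
library(Matrix)

# Hypothetical helper: basic sanity statistics for a sparse count matrix
# (features in rows, cells in columns).
summarise_counts <- function(mat) {
  list(
    n_features     = nrow(mat),
    n_cells        = ncol(mat),
    # Assumes a sparse Matrix class with an @x slot (e.g. dgTMatrix, dgCMatrix).
    any_na         = anyNA(mat@x),
    # Cells (columns) with no counts at all.
    empty_cells    = sum(Matrix::colSums(mat) == 0),
    # Features (rows) never detected in any cell.
    empty_features = sum(Matrix::rowSums(mat) == 0)
  )
}

# Tiny illustrative matrix: 3 features x 2 cells, second cell empty.
toy <- sparseMatrix(i = c(1, 2), j = c(1, 1), x = c(3, 4), dims = c(3, 2))
toy_summary <- summarise_counts(toy)
str(toy_summary)
```

Calling summarise_counts(mat) on the matrix from the script above, for both a working and a failing sample, gives a quick side-by-side comparison.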
@plger I emailed my matrix to the address you provided above. Let me know if you do not receive it.
Hi,
I got your data, thanks, and finally got to try it: I could run scDblFinder on it without problem. The scores have a nice bimodal distribution and all.
One thing occurred to me: are you sure that you did in fact install the latest scDblFinder? I.e. does packageVersion("scDblFinder") give you 1.19.7?
I realized that if you only tried to install the github latest version without updating to bioc devel, it would fail because of some changes that were introduced in BiocNeighbors and which you don't have.
So to get the bug fixes without getting the changes that require the upcoming Bioc version, you need to pull a specific commit, e.g. doing this:
remotes::install_github("plger/scDblFinder", ref="6f82238ea97f393f6ef4c475f3a19ccb3a88898f")
Let me know if that fixes your issue,
Pierre-Luc
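A version check like this can also be placed at the top of a batch script so that it stops early when an old build is loaded. This is a rough sketch: has_min_version is a hypothetical helper, and the threshold is an assumption, since the thread does not say which version number the pinned commit reports.

```r
# Hypothetical helper: TRUE only if `pkg` is installed and at least `min_version`.
has_min_version <- function(pkg, min_version) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(FALSE)
  }
  # packageVersion() returns a version object that compares correctly
  # against a version string such as "1.19.7".
  packageVersion(pkg) >= min_version
}

# Self-contained demonstration with a package that ships with R.
stats_ok <- has_min_version("stats", "1.0.0")
missing_ok <- has_min_version("notARealPackageXyz", "1.0.0")
print(c(stats = stats_ok, missing = missing_ok))
```

In a script, something like if (!has_min_version("scDblFinder", "1.19.7")) stop("scDblFinder is too old") would do the job, with the threshold adjusted to whatever version the installed commit actually reports.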
Ah!
> packageVersion("scDblFinder")
[1] '1.16.0'
Let me fix this and see what happens.
Hi! Thank you for this great tool. I am encountering the error in the title when running scDblFinder on a large dataset (CellRanger estimated ~20,000 cells):
I have not encountered this error in several other (much smaller) samples I have tried, so is this related to the dataset being too large?
Session info