Where to run scDblFinder in SCT workflow

Hi,

I wanted to double-check where I should run scDblFinder in my workflow (Seurat v5). For context, I have 4 conditions with 3, 3, 2 and 2 samples respectively (10 samples in total), and I'd like to use the cluster-based approach.

1. Import 10X Cell Ranger outputs
2. Filter out barcodes with low coverage (e.g. UMI counts < 500)
3. Perform additional filtering based on QC
4. Normalise each sample individually (SCTransform)
5. Merge all samples into a single Seurat object
6. Run SelectIntegrationFeatures, GetResiduals, VariableFeatures
7. Dimensionality reduction and visualisation (PCA, UMAP)
8. Run PrepSCTFindMarkers
9. Other downstream analyses (e.g. differential gene expression analysis)

I'd like to use the clusters I've already obtained, but I'm unsure whether I should run scDblFinder after step 7 or after step 8 of the workflow above. I tried both and got the following:

After step 7:
singlet doublet
  47951     364

After step 8:
singlet doublet
  48017     298

The difference isn't large, and I intend to keep the identified doublets so I can visualise them in the clusters, etc. However, I'd like to know which option is more appropriate mathematically, and why. Thank you! I've included my scDblFinder code below.

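library(Seurat)
library(scDblFinder)
library(BiocParallel)  # provides MulticoreParam

allsamples <- readRDS('RDS Files/allsamples_clusters_prepsctfindmarkers.RDS')

# Counts from the SCT assay of the object saved after PrepSCTFindMarkers
layers <- allsamples@assays$SCT@counts
bp <- MulticoreParam(2, progressbar = TRUE, RNGseed = 123)

# Cluster-based doublet detection
allsamples_doublets <- scDblFinder(layers, clusters = allsamples$seurat_clusters,
                                   allsamples = allsamples$orig.ident, BPPARAM = bp)

# Number of cells called singlet / doublet
table(allsamples_doublets$scDblFinder.class)

# Store the scores and calls in the Seurat object's metadata
allsamples$scDblFinder.score <- allsamples_doublets$scDblFinder.score
allsamples$scDblFinder.class <- allsamples_doublets$scDblFinder.class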
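The totals alone don't show whether the two runs flag the same cells, so one option is to cross-tabulate the two sets of calls. This is a minimal base-R sketch: compare_doublet_calls is a hypothetical helper, and it assumes both runs were done on the same cells in the same order.

```r
# Hypothetical helper: compare two sets of scDblFinder calls for the same cells.
# class_a / class_b are the scDblFinder.class vectors from the two runs,
# assumed to be in the same cell order.
compare_doublet_calls <- function(class_a, class_b) {
  stopifnot(length(class_a) == length(class_b))
  a <- as.character(class_a)
  b <- as.character(class_b)
  list(
    # full singlet/doublet contingency table between the two runs
    crosstab = table(step7 = a, step8 = b),
    # cells called doublet in at least one run
    n_doublet_either = sum(a == "doublet" | b == "doublet"),
    # cells called doublet in both runs
    n_doublet_both = sum(a == "doublet" & b == "doublet")
  )
}

# Usage (dbl_step7 / dbl_step8 are hypothetical names for the two results):
# res <- compare_doublet_calls(dbl_step7$scDblFinder.class,
#                              dbl_step8$scDblFinder.class)
# res$crosstab
```

A large off-diagonal count in the table would mean the two runs disagree on many individual cells even if their totals are similar.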

Hi,

Apologies for the delay.

First, I just want to note that your example uses an argument name (allsamples) that doesn't exist; the argument is called samples. I assume that's a mistake made when posting it on GitHub, not in what you actually ran.

Unless I'm mistaken, PrepSCTFindMarkers replaces the counts in O@assays$SCT@counts with corrected ones. I haven't really tested this, but I would advise against using those corrected counts for doublet detection. The method was designed for raw counts (library-size effects and all), and it depends on the artificial doublets being comparable to real ones. So I'd say either don't use the SCT assay for this purpose (use the raw counts, e.g. from the RNA assay, instead), or run it after step 7 (because, if I remember correctly, SCTransform does not itself change the counts slot).
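The point about artificial doublets can be illustrated with a toy simulation in base R. This is a sketch under simplifying assumptions, not scDblFinder's or Seurat's actual code: the "correction" below just rescales each cell to a common total depth, as a crude stand-in for depth-corrected counts (SCT derives them from a regularised negative binomial model, not simple rescaling), and artificial doublets are modelled as the plain sum of two cells' counts. With raw counts, a real doublet and an artificial one both have roughly twice a singlet's library size; after correction, the real doublet is pulled back to singlet depth while the artificial doublet, summed from two corrected cells, still has about twice that depth, so the two no longer look alike.

```r
# Toy illustration (base R only). Assumptions: depth correction is
# approximated by rescaling to a common total, and artificial doublets
# are the sum of two cells' counts.
set.seed(123)
n_genes <- 200

# Two hypothetical cell types with different expression profiles
prof_a <- rgamma(n_genes, shape = 2)
prof_b <- rgamma(n_genes, shape = 2)
prof_a <- prof_a / sum(prof_a)
prof_b <- prof_b / sum(prof_b)

# Raw UMI counts: singlets have ~2000 UMIs; a real doublet captures the
# RNA of two cells, so its library size is roughly doubled
cell_a <- rpois(n_genes, 2000 * prof_a)
cell_b <- rpois(n_genes, 2000 * prof_b)
real_doublet <- rpois(n_genes, 2000 * prof_a + 2000 * prof_b)

# Crude stand-in for depth correction (NOT the real SCT model):
# rescale every cell to the same target depth
target_depth <- 2000
correct_depth <- function(x) round(x / sum(x) * target_depth)

corr_a <- correct_depth(cell_a)
corr_b <- correct_depth(cell_b)
corr_real_doublet <- correct_depth(real_doublet)

# Artificial doublets: sum the counts of two cells
artificial_raw <- cell_a + cell_b
artificial_corr <- corr_a + corr_b

libsizes <- c(
  raw_real_doublet       = sum(real_doublet),
  raw_artificial         = sum(artificial_raw),
  corrected_real_doublet = sum(corr_real_doublet),
  corrected_artificial   = sum(artificial_corr)
)
print(libsizes)
```

In raw space the two doublet library sizes are close; in corrected space the artificial doublet has about twice the depth of the real one.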

(plger/scDblFinder, issue #107: Where to run scDblFinder in SCT workflow)