plger / scDblFinder

Methods for detecting doublets in single-cell sequencing data
https://plger.github.io/scDblFinder/
GNU General Public License v3.0

Where to run scDblFinder in SCT workflow #107

Open shaln opened 3 months ago

shaln commented 3 months ago

Hi,

I wanted to double-check where I should run scDblFinder in my workflow (Seurat v5). For context, I have 4 conditions, with 3, 3, 2 and 2 samples each (total 10 samples), and I'd like to use the cluster-based approach.

  1. Import 10X Cellranger outputs
  2. Filter out barcodes with low coverage (e.g. UMI counts < 500)
  3. Perform additional filtering based on QC
  4. Normalise each sample individually (SCTransform)
  5. Merge all samples into one single Seurat object
  6. Run SelectIntegrationFeatures, GetResiduals, VariableFeatures
  7. Dimensionality reduction and visualization (PCA, UMAP)
  8. Run PrepSCTFindMarkers
  9. Other downstream analyses (e.g. differential gene expression analysis)

I wanted to use the clusters I have obtained, but I'm unsure if I should run it after step 7 or step 8 in the workflow above. I tried both and got the following:

After step 7:
singlet doublet
  47951     364

After step 8:
singlet doublet
  48017     298

The difference isn't that big and I intend to keep the identified doublets to visualise them in clusters etc. However, I'd like to know which would be more appropriate mathematically (?) and why. Thank you! Included my scDblFinder code below.

allsamples <- readRDS('RDS Files/allsamples_clusters_prepsctfindmarkers.RDS')
layers <- samples@assays$SCT@counts
bp <- MulticoreParam(2, progressbar = TRUE, RNGseed=123)

allsamples_doublets <- scDblFinder(layers, clusters=allsamples$seurat_clusters, 
                               allsamples=allsamples$orig.ident, BPPARAM= bp)

table(allsamples_doublets$scDblFinder.class)

allsamples$scDblFinder.score <- allsamples_doublets$scDblFinder.score
allsamples$scDblFinder.class <- allsamples_doublets$scDblFinder.class

plger commented 2 months ago

Hi,

Apologies for the delay.

First, I just want to note that your example uses an argument name (allsamples) that doesn't exist; the argument is called samples. I assume that's a mistake from posting it on GitHub, not in what you actually ran.

Unless I'm mistaken, PrepSCTFindMarkers replaces the counts in O@assays$SCT@counts with corrected ones. I haven't really tested this, but I would advise against using those corrected counts for doublet detection. The whole method was designed for raw counts (library size effects and all), and it depends on the artificial doublets being comparable to real ones. So I'd say either don't use the SCT assay for that purpose, or run it after step 7 (because, if I remember correctly, SCTransform does not itself change the counts slot).
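
For the first option (not using the SCT assay), something along these lines might work. This is only a minimal sketch and has not been run on this dataset. It assumes that allsamples still carries the raw UMI counts in an assay named "RNA", that the merged Seurat v5 object keeps one counts layer per sample (hence the JoinLayers call), and that seurat_clusters and orig.ident hold the cluster and sample labels as in the code above. Taking the counts from the RNA assay also sidesteps any doubt about what the SCT counts slot contains at a given step.

```r
library(Seurat)
library(scDblFinder)
library(BiocParallel)

# Assumption: the clustered object from the post above, with an untouched "RNA" assay.
allsamples <- readRDS('RDS Files/allsamples_clusters_prepsctfindmarkers.RDS')

# In Seurat v5 a merged object typically stores one counts layer per sample,
# so join them into a single layer before extracting the matrix.
rna <- JoinLayers(allsamples[["RNA"]])
raw_counts <- LayerData(rna, layer = "counts")

# Make sure the columns are in the same order as the cell metadata.
raw_counts <- raw_counts[, colnames(allsamples)]

bp <- MulticoreParam(2, progressbar = TRUE, RNGseed = 123)

# Cluster-based doublet generation on raw counts, processed per sample.
# Note the argument is `samples`, not `allsamples`.
allsamples_doublets <- scDblFinder(raw_counts,
                                   clusters = allsamples$seurat_clusters,
                                   samples = allsamples$orig.ident,
                                   BPPARAM = bp)

table(allsamples_doublets$scDblFinder.class)

# Copy the calls back to the Seurat object for plotting on the UMAP.
allsamples$scDblFinder.score <- allsamples_doublets$scDblFinder.score
allsamples$scDblFinder.class <- allsamples_doublets$scDblFinder.class
```

If the RNA assay was dropped or already has a single counts layer, the JoinLayers step would need to be adapted or skipped.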