wheaton5 / souporcell

Clustering scRNAseq by genotypes
MIT License

Understanding the barcodes command #215

Open Thapeachydude opened 9 months ago

Thapeachydude commented 9 months ago

Hi,

I'm trying to demultiplex a series of pools (9-11 donors) with known genotypes based on WES and WGS data. Some of the pools work really well, while others struggle a bit. One issue I see is that the doublet rate in some of the pools is relatively high (there are quite obvious cell blobs in the middle of a UMAP). This seems to "confuse" souporcell during the clustering, as some SNP clusters are assigned only to these blobs.

Since these are quite obvious doublets, I would like to remove them before running souporcell. Is there a way to subset the barcodes souporcell runs on? That is, does the provided barcodes.tsv file already restrict which barcodes are processed, or would I have to manually subset the BAM file before providing it to souporcell?

Cheers and many thanks, M

wheaton5 commented 9 months ago

Yeah, just make a new barcodes.tsv and it will only run on those.
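
For anyone landing here later, a minimal sketch of building that new barcodes.tsv in Python. It assumes one barcode per line in both files, and that the doublet list (a hypothetical `doublets.txt`, e.g. exported from whatever was used to flag the UMAP blobs) uses exactly the same barcode strings as barcodes.tsv, including any `-1` suffix. The function and file names are illustrative and not part of souporcell.

```python
import gzip
from pathlib import Path


def _open_text(path, mode="rt"):
    """Open a plain or gzip-compressed text file, chosen by the .gz suffix."""
    path = Path(path)
    return gzip.open(path, mode) if path.suffix == ".gz" else open(path, mode)


def read_barcodes(path):
    """Read one barcode per line, skipping blank lines."""
    with _open_text(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def filter_barcodes(barcodes, doublets):
    """Return the barcodes that are not in `doublets`, keeping input order."""
    exclude = set(doublets)
    return [bc for bc in barcodes if bc not in exclude]


def write_filtered_barcodes(barcodes_path, doublets_path, out_path):
    """Write a new barcodes file without the flagged doublets.

    Returns the number of barcodes kept.
    """
    kept = filter_barcodes(read_barcodes(barcodes_path),
                           read_barcodes(doublets_path))
    with _open_text(out_path, "wt") as handle:
        handle.writelines(f"{bc}\n" for bc in kept)
    return len(kept)


if __name__ == "__main__":
    # Hypothetical paths; adjust to your own data.
    if Path("barcodes.tsv").exists() and Path("doublets.txt").exists():
        n = write_filtered_barcodes("barcodes.tsv", "doublets.txt",
                                    "barcodes.filtered.tsv")
        print(f"kept {n} barcodes")
```

The resulting `barcodes.filtered.tsv` would then be given to souporcell in place of the original barcodes.tsv; per the answer above, the BAM file itself should not need subsetting.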