Hi,
I'm trying to demultiplex a series of pools (9-11 donors) with known genotypes based on WES and WGS data. Some of the pools work really well, while others struggle a bit. One issue I see is that the doublet rate in some of the pools is relatively high (quite obvious cell blobs in the middle of a UMAP). This seems to "confuse" souporcell during the clustering, as some SNP clusters are assigned only to these blobs.
Since these are quite obvious doublets, I would like to remove them before running souporcell. Is there a way to subset the barcodes souporcell runs on? Specifically, does the provided `barcodes.tsv` file already restrict which barcodes are processed, or would I have to subset the BAM file manually before providing it to souporcell?
Cheers and many thanks, M
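For concreteness, the barcode filtering step I have in mind could look roughly like the sketch below. It only drops a list of suspected doublet barcodes from a barcode file. The file names (`doublets.txt`, `barcodes.filtered.tsv`) and the helper functions are hypothetical, and it assumes an uncompressed file with one barcode per line. Whether souporcell actually restricts its processing to the barcodes in this file is exactly the open question, so this is not a confirmed workflow.

```python
from pathlib import Path


def filter_barcodes(barcodes, doublets):
    """Return the barcodes that are not flagged as doublets, preserving order."""
    # Strip whitespace and ignore empty lines in the doublet list.
    doublet_set = {b.strip() for b in doublets if b.strip()}
    kept = []
    for barcode in barcodes:
        barcode = barcode.strip()
        if barcode and barcode not in doublet_set:
            kept.append(barcode)
    return kept


def write_filtered(barcodes_path, doublets_path, out_path):
    """Read both files (one barcode per line) and write the filtered list.

    All three paths are hypothetical placeholders, not souporcell conventions.
    """
    barcodes = Path(barcodes_path).read_text().splitlines()
    doublets = Path(doublets_path).read_text().splitlines()
    kept = filter_barcodes(barcodes, doublets)
    Path(out_path).write_text("\n".join(kept) + "\n")
    return len(kept)
```

Usage would be something like `write_filtered("barcodes.tsv", "doublets.txt", "barcodes.filtered.tsv")`, with the barcode strings (including any `-1` suffix) matching between the two input files.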