Pentayouth closed this issue 2 years ago.
I can confirm this is a bug: extra characters are added to the read group (RG) entries, not to the cell barcode tag. A workaround is to run with a single core: -p 1
Recent versions of samtools can extract all reads carrying any barcode from a given list:
samtools view -@ 4 -b -o subset.bam -D XC:subset_barcodes.txt input.bam
where subset_barcodes.txt contains one barcode per line:
ACCGTCAGCGAT
GTTCAGAATAGC
GCAACACGAGTG
GCTTCACCCTTA
TCGATCCACGAG
CACGCCAATTAG
CGACCGGGAAAA
CAAGCATATGCA
CTCATGTTGTAG
TCCTCCGACCCA
...
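For samtools builds too old to have -D, roughly the same subset can be produced by filtering SAM text. This is a minimal sketch, assuming the barcode is stored as a string (Z-type) tag such as XC:Z: and that the barcode file has one barcode per line; filter_by_tag is a hypothetical helper, not part of samtools.

```shell
# Hypothetical helper: keep SAM header lines plus records whose TAG:Z: value
# appears in BARCODE_FILE (one barcode per line; the file must be non-empty).
# Reads SAM text on stdin and writes SAM text on stdout.
filter_by_tag() {
  local barcode_file="$1"
  local tag="${2:-XC}"
  awk -F '\t' -v tag="$tag" '
    # First input (the barcode file): load barcodes into a lookup table,
    # dropping any Windows carriage returns.
    NR == FNR { sub(/\r$/, ""); if ($1 != "") keep[$1] = 1; next }
    # SAM header lines are passed through unchanged.
    /^@/ { print; next }
    {
      # Optional tags start at column 12; stop at the first TAG:Z: field.
      prefix = tag ":Z:"
      for (i = 12; i <= NF; i++) {
        if (substr($i, 1, length(prefix)) == prefix) {
          value = substr($i, length(prefix) + 1)
          if (value in keep) print
          break
        }
      }
    }
  ' "$barcode_file" -
}
```

Usage, under the same assumptions: samtools view -h input.bam | filter_by_tag subset_barcodes.txt XC | samtools view -b -o subset.bam -
When -D is available it remains the simpler option, since it skips the round trip through SAM text.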
Thank you so much 👍
This should now be fixed in the latest release.
Similar to, but different from, issue #15. My original single-cell BAM file has 7 read groups (RG), CH4-LN_2_L001 through CH4-LN_2_L007.
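To see which read groups the reads actually carry, one option is to tally the RG:Z: values in the SAM text. A minimal sketch; rg_read_counts is a hypothetical helper, and samtools is assumed to be on PATH for the usage line.

```shell
# Hypothetical helper: count records per read group (RG:Z: tag) in SAM text
# read from stdin. Prints "<RG ID><TAB><count>", sorted by ID; records with
# no RG tag are counted under "(none)".
rg_read_counts() {
  awk -F '\t' '
    /^@/ { next }   # skip header lines
    {
      rg = "(none)"
      for (i = 12; i <= NF; i++)
        if (substr($i, 1, 5) == "RG:Z:") { rg = substr($i, 6); break }
      n[rg]++
    }
    END { for (rg in n) print rg "\t" n[rg] }
  ' | LC_ALL=C sort
}
```

Usage: samtools view star_gene_exon_tagged.bam | rg_read_counts
For this file one would expect only the seven CH4-LN_2_L00x IDs; running the same on an output BAM makes any extra or altered IDs easy to spot.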
I want to extract the reads tagged with 100 cell barcodes. Here is my command:
sinto filterbarcodes -b star_gene_exon_tagged.bam -c xcForTest.txt --barcodetag XC -p 16
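As far as I understand sinto's cells file, it is tab-separated with the cell barcode in the first column and a group name in the second, and reads for each group are written to <group>.bam (the output name subset.bam here fits that). Treat that format as an assumption. Under it, a plain one-barcode-per-line list could be turned into a cells file like this; make_cells_file and barcodes.txt are hypothetical names.

```shell
# Hypothetical helper: build a two-column, tab-separated cells file
# (barcode<TAB>group) from a plain list with one barcode per line.
# Every barcode is assigned to the same group, so all selected reads
# would go to a single "<group>.bam" output.
make_cells_file() {
  local barcodes="$1"
  local group="${2:-subset}"
  # Strip Windows line endings, skip blank lines, then write one row per barcode.
  awk -v group="$group" 'BEGIN { OFS = "\t" } { sub(/\r$/, "") } NF { print $1, group }' "$barcodes"
}
```

Usage: make_cells_file barcodes.txt subset > xcForTest.txt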
xcForTest.txt contains 100 cell barcodes, like this:
When I checked the output BAM (subset.bam) with:
samtools view -h ./subset.bam | grep "CH4-LN_2_L007-" | less
I found unexpected suffixes, such as '-364D2CCB', added to the cell barcode tag, and they affect my downstream processing. Is this a bug?
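The grep above matches values such as CH4-LN_2_L007-364D2CCB, i.e. read-group IDs with a suffix appended. Rather than grepping for one read group at a time, a minimal sketch can report every read-group ID in the output (header @RG lines and per-read RG:Z: tags) that is not in a list of expected IDs taken from the original BAM. find_unexpected_rg and expected_rg.txt are hypothetical names.

```shell
# Hypothetical helper: print each distinct read-group ID found in SAM text on
# stdin that is NOT listed in EXPECTED_IDS_FILE (one ID per line, non-empty).
# Checks both header "@RG ... ID:" fields and per-record "RG:Z:" tags.
find_unexpected_rg() {
  awk -F '\t' '
    NR == FNR { sub(/\r$/, ""); if ($1 != "") ok[$1] = 1; next }
    {
      for (i = 2; i <= NF; i++) {
        id = ""
        if ($1 == "@RG" && substr($i, 1, 3) == "ID:")
          id = substr($i, 4)
        else if ($1 !~ /^@/ && i >= 12 && substr($i, 1, 5) == "RG:Z:")
          id = substr($i, 6)
        if (id != "" && !(id in ok) && !(id in seen)) { seen[id] = 1; print id }
      }
    }
  ' "$1" -
}

# Example usage (assumes samtools on PATH):
# samtools view -H star_gene_exon_tagged.bam \
#   | awk -F '\t' '$1 == "@RG" { for (i = 2; i <= NF; i++) if ($i ~ /^ID:/) print substr($i, 4) }' > expected_rg.txt
# samtools view -h subset.bam | find_unexpected_rg expected_rg.txt
```

An empty result means every read group in the output matches one from the original header.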