10x Barcodes

Simon-Coetzee commented 7 years ago

10x barcodes with v2 chemistry work as shown in the examples below. Judging from merge_barcodefiles_10x(), there may be some confusion about which file does what.

I1 = sample barcode (SB) (8 bp)

@ST-K00126:307:HFM3NBBXX:1:1101:3772:1244 1:N:0:NTCGCCCT
NTCGCCCT
+
#AAAFJ-J

python regex: (@.*)\n(?P<SB>.*)\n\+(.*)\n(.*)\n

R1 = cellular barcode (CB) (16 bp) + molecular barcode (MB, the UMI) (10 bp)

@ST-K00126:307:HFM3NBBXX:1:1101:3772:1244 1:N:0:NTCGCCCT
NCATTTGAGTAACCCTGATGTCATAA
+
#AAFFJJJJJJJJJJJJJJJFJJJJJ

python regex: (@.*)\n(?P<CB>.{16})(?P<MB>.{10})\n\+(.*)\n(.*)\n
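
A minimal sketch of how the R1 pattern splits a record into cell barcode and UMI, using the example read above. The `+` separator line is escaped as `\+` so it matches literally, and the variable names (`R1_RECORD`, `cell_barcode`, `umi`) are illustrative, not taken from sircel:

```python
import re

# R1 pattern: 16 bp cell barcode (CB) followed by 10 bp UMI (MB).
# The "+" separator is escaped so it is matched as a literal character.
R1_RECORD = re.compile(r"(@.*)\n(?P<CB>.{16})(?P<MB>.{10})\n\+(.*)\n(.*)\n")

# The R1 example record from this comment.
record = (
    "@ST-K00126:307:HFM3NBBXX:1:1101:3772:1244 1:N:0:NTCGCCCT\n"
    "NCATTTGAGTAACCCTGATGTCATAA\n"
    "+\n"
    "#AAFFJJJJJJJJJJJJJJJFJJJJJ\n"
)

match = R1_RECORD.match(record)
cell_barcode = match.group("CB")  # first 16 bases
umi = match.group("MB")           # next 10 bases
print(cell_barcode, umi)
```

Because `.` does not match a newline, the pattern only matches records whose sequence line is exactly 26 bases long.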

R2 = rna reads (98 bp)

@ST-K00126:307:HFM3NBBXX:1:1101:3772:1244 1:N:0:NTCGCCCT
NCATTTGAGTAACCCTGATGTCATAA
+
#AAFFJJJJJJJJJJJJJJJFJJJJJ

python regex: (?P<name>@.*) .*\n(?P<seq>.*)\n\+(.*)\n(?P<qual>.*)\n

Simon-Coetzee commented 7 years ago

@ST-K00126:307:HFM3NBBXX:1:1101:3772:1244 2:N:0:NTCGCCCT
NAAGCCAGTTGTGAATCATGCACATCAGCTCCTTCTGAAATGTGTTTATGGCCTAGGACACAGGGACCCTGGAGACTATGGTGCTGCAGTGCATTATG
+
#<<A<FJJJFJFJJJJJJJJJFJFJJJJJJJJJJJJJJJFJJFFJJJJJAFJJFJF7JJJJFJAJJJ<J<7-A<FFFFJ-F<FJJJJJJJJJJ7FJJA

is what I meant for R2.

jnotwell commented 6 years ago

@Simon-Coetzee, this is a correct description of the 10X V2 chemistry. I believe concatenating the sample and cellular barcodes, however, is incorrect (merge_barcodefiles_10x(), args['barcode_start'] = 0, args['barcode_end'] = 26).

This is because 10X uses four 8 bp oligonucleotides per sample index to address sequencing biases. This can be easily observed with any sample barcode file:

zcat SAMPLE_I1_001.fastq.gz | awk '{if(NR % 4 == 2) {a[$1] += 1}} END {for(x in a) {print x "\t" a[x]}}' | sort -k2,2gr
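
For anyone who prefers Python, a rough equivalent of that one-liner might look like the sketch below. The function name `count_index_sequences` is made up for illustration, and it assumes a well-formed gzipped FASTQ with exactly four lines per record:

```python
import gzip
from collections import Counter
from itertools import islice


def count_index_sequences(fastq_gz_path):
    """Count how often each distinct sequence occurs in a gzipped FASTQ file."""
    counts = Counter()
    with gzip.open(fastq_gz_path, "rt") as handle:
        # The sequence is the 2nd line of every 4-line FASTQ record,
        # i.e. zero-based line indices 1, 5, 9, ...
        for sequence in islice(handle, 1, None, 4):
            counts[sequence.strip()] += 1
    return counts
```

Calling `count_index_sequences("SAMPLE_I1_001.fastq.gz").most_common()` would then list the sequences from most to least frequent, which is what the `sort -k2,2gr` step does in the shell version.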

Concatenating the sample and cellular barcodes will (I think) result in reads for a given cell being associated with four different barcodes. Using just the 16 bp cellular barcode should avoid these issues.
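
To make the concern concrete, here is a toy sketch. The four sample-index sequences are invented placeholders, not real 10x index oligos, and the grouping logic is only an illustration of the idea, not how sircel is implemented:

```python
from collections import defaultdict

# Hypothetical toy reads: (sample index from I1, cell barcode from R1).
# All four reads come from one cell but carry different sample-index oligos.
reads = [
    ("AAAACCCC", "NCATTTGAGTAACCCT"),
    ("CCCCGGGG", "NCATTTGAGTAACCCT"),
    ("GGGGTTTT", "NCATTTGAGTAACCCT"),
    ("TTTTAAAA", "NCATTTGAGTAACCCT"),
]

by_concatenated = defaultdict(int)  # key = sample index + cell barcode
by_cell_barcode = defaultdict(int)  # key = cell barcode only
for sample_index, cell_barcode in reads:
    by_concatenated[sample_index + cell_barcode] += 1
    by_cell_barcode[cell_barcode] += 1

print(len(by_concatenated))  # 4 distinct keys for a single cell
print(len(by_cell_barcode))  # 1 key for the same cell
```

With the concatenated key, the single cell shows up as four separate barcodes with one read each. Keying on the 16 bp cellular barcode alone keeps all four reads together.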

pachterlab / sircel

10x Barcodes #2