jeremymsimon closed 4 years ago
Hi Jeremy,
Thanks for your message. I think there may be some confusion here due to the inconsistent order of the barcodes: as you point out, they appear in the read as UMI-BC3-BC2-BC1, but in the demultiplexed intermediate filenames generated after step1 they appear as BC1-BC2-BC3. Because the 8 bp barcodes we are using have distinct overhangs, BC1 (AGCATTCGNNNNNNNN), BC2 (NNNNNNNNATCCA), and BC3 (NNNNNNNNGTGGCC), it is possible to identify which barcode is which at all stages of demultiplexing.
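To make the ordering concrete, here is a minimal Python sketch (not the actual demultiplexing script) of how a barcode read could be split using the overhangs quoted above. It assumes a 10-base UMI followed directly by BC3, BC2, and BC1 with nothing in between; that contiguous layout and the function name `parse_barcode_read` are illustrative assumptions, not taken from the pipeline.

```python
# Sketch only: the fixed, contiguous layout below is an assumption.
UMI_LEN = 10             # UMI is bases 1-10 of the barcode read
BC_LEN = 8               # each barcode is 8 bp
BC3_SUFFIX = "GTGGCC"    # BC3 = NNNNNNNNGTGGCC
BC2_SUFFIX = "ATCCA"     # BC2 = NNNNNNNNATCCA
BC1_PREFIX = "AGCATTCG"  # BC1 = AGCATTCGNNNNNNNN


def parse_barcode_read(read):
    """Return (umi, cell_id), with cell_id written in BC1-BC2-BC3 order."""
    pos = 0
    umi = read[pos:pos + UMI_LEN]
    pos += UMI_LEN

    # BC3 is sequenced first, followed by its overhang.
    bc3 = read[pos:pos + BC_LEN]
    pos += BC_LEN
    if read[pos:pos + len(BC3_SUFFIX)] != BC3_SUFFIX:
        raise ValueError("BC3 overhang not found")
    pos += len(BC3_SUFFIX)

    # BC2 comes next, again followed by its overhang.
    bc2 = read[pos:pos + BC_LEN]
    pos += BC_LEN
    if read[pos:pos + len(BC2_SUFFIX)] != BC2_SUFFIX:
        raise ValueError("BC2 overhang not found")
    pos += len(BC2_SUFFIX)

    # BC1 is sequenced last; its overhang precedes the barcode.
    if read[pos:pos + len(BC1_PREFIX)] != BC1_PREFIX:
        raise ValueError("BC1 overhang not found")
    pos += len(BC1_PREFIX)
    bc1 = read[pos:pos + BC_LEN]
    if len(bc1) != BC_LEN:
        raise ValueError("read too short for BC1")

    # Report the barcodes in round1-round2-round3 order, as in the filenames.
    return umi, "-".join((bc1, bc2, bc3))
```

The point of the sketch is that the read order (UMI-BC3-BC2-BC1) and the reported order (BC1-BC2-BC3) can differ without any ambiguity, because each overhang pins down which round a barcode belongs to.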
The collapse script merges two intermediate files when it identifies a BC1-RandomHex and BC1-OligoDT pair. If you open the RanHex.txt and OligoDT.txt files, you will notice that they contain only barcodes with the BC1 overhang format (AGCATTCGNNNNNNNN), so files are collapsed based on their round 1 barcode sequence only.
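As a rough illustration of collapsing on the round 1 barcode only, the sketch below rewrites a BC1-BC2-BC3 cell ID so that a random-hexamer BC1 is replaced by its paired oligo-dT BC1. The in-memory dictionary, the function name `collapse_cell_id`, and the example barcodes are all hypothetical; the real script works on the intermediate files and on the pairing given by RanHex.txt and OligoDT.txt, whose exact format is not assumed here.

```python
def collapse_cell_id(cell_id, ranhex_to_oligodt):
    """Map a cell ID onto its oligo-dT partner using only the round 1 barcode.

    cell_id is assumed to be a "BC1-BC2-BC3" string; ranhex_to_oligodt is a
    hypothetical dict from random-hexamer BC1 to the paired oligo-dT BC1.
    """
    bc1, bc2, bc3 = cell_id.split("-")
    # Only BC1 is looked up; BC2 and BC3 pass through unchanged.
    bc1 = ranhex_to_oligodt.get(bc1, bc1)
    return "-".join((bc1, bc2, bc3))


# Made-up pairing for illustration only.
pairs = {"GGGGGGGG": "AAAAAAAA"}
collapsed = collapse_cell_id("GGGGGGGG-CCCCCCCC-TTTTTTTT", pairs)
```

Under these assumptions, a random-hexamer cell ID and its oligo-dT counterpart end up with the same ID after collapsing, while the round 2 and round 3 barcodes are never consulted.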
I hope that makes sense. Let me know if further clarification would be helpful!
Ah, okay, so the intermediate FASTQs, and thus the cell IDs, list the barcodes in the reverse of the read order, which puts them in round1-round2-round3 order. Is that correct?
Yes exactly
Okay thanks for that clarification!
If the UMI is bases 1-10 of the barcode read, then according to this schematic, the barcode corresponding to oligo-dT/random hexamers should be the round 1 barcode, but it should be the third barcode sequenced, since sequencing proceeds outside-in (i.e., UMI-BC3-BC2-BC1). If I'm understanding the collapse script (both versions) correctly, it looks like we're collapsing based on the first barcode rather than the third.
Maybe we're misunderstanding something about the amplification or direction of sequencing, or perhaps the demultiplexing Python script is (correctly) reading the barcodes from right to left? Either way, can you please confirm or clarify this?
Thanks!