jfjlaros / demultiplex

Versatile FASTA/FASTQ demultiplexer.
MIT License

dual indexes. #1

Closed raw937 closed 6 years ago

raw937 commented 6 years ago

Hello,

For example, with the dual barcode in bold:

R1 read

@SOLEXA1_0069_FC:3:1:1673:948#ACAGTG/1
GACTAACCGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAATGTTAGCCGTCGGGCAGTATACTGTTCGG
+
BMMQNTWSWWb_b_bb__Y_____YYYYY[[[Y[__XXRWXVVVVTYYYYYT

R2 read

@SOLEXA1_0069_FC:3:1:1673:948#ACAGTG/2
CTGAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAGCCATGCAGCACCTGT
+
ghgaggfghhhhhhhhhhghhhhhhhhhhfhhhghfWffch[hhgahhedffddR[^W\^Zc^\_cac[Wb]^W^

The 16 possible barcodes are: TCAGTCAG CTGACTGA TCAGGACT GACTGACT AGTCAGTC GACTTCAG GACTAGTC GACTCTGA TCAGCTGA AGTCTCAG AGTCGACT CTGAAGTC CTGAGACT AGTCCTGA TCAGAGTC CTGATCAG

jfjlaros commented 6 years ago

What is the actual question?

raw937 commented 6 years ago

Hello, I need to use dual indexes, if possible.

For example, with the dual barcode in bold:

R1 read

@SOLEXA1_0069_FC:3:1:1673:948#ACAGTG/1
GACTAACCGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAATGTTAGCCGTCGGGCAGTATACTGTTCGG
+
BMMQNTWSWWb_b_bb__Y_____YYYYY[[[Y[__XXRWXVVVVTYYYYYT

R2 read

@SOLEXA1_0069_FC:3:1:1673:948#ACAGTG/2
CTGAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAGCCATGCAGCACCTGT
+
ghgaggfghhhhhhhhhhghhhhhhhhhhfhhhghfWffch[hhgahhedffddR[^W^Zc^_cac[Wb]^W^

Here are the 16 possible barcode pairs in the file I am working on: TCAG-TCAG CTGA-CTGA TCAG-GACT GACT-GACT AGTC-AGTC GACT-TCAG GACT-AGTC GACT-CTGA TCAG-CTGA AGTC-TCAG AGTC-GACT CTGA-AGTC CTGA-GACT AGTC-CTGA TCAG-AGTC CTGA-TCAG

The first four nts of each read are the barcode, so for the example above the output files would be: GACT-CTGA_R1.fq GACT-CTGA_R2.fq

But you would need both reads to tell that it is GACT-CTGA and not something else. What would the command look like for this? Does this demux script support dual barcoding?
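To make the request concrete, here is a minimal Python sketch of the assignment being asked for. It is not part of demultiplex; the names `dual_barcode` and `assign_pairs` and the `UNKNOWN` bin are assumptions made for this illustration. It takes the first four bases of each read in a pair, joins them into a label, and checks the label against the 16 pairs listed above.

```python
from collections import defaultdict

# The 16 barcode pairs listed in the question.
KNOWN_PAIRS = {
    "TCAG-TCAG", "CTGA-CTGA", "TCAG-GACT", "GACT-GACT",
    "AGTC-AGTC", "GACT-TCAG", "GACT-AGTC", "GACT-CTGA",
    "TCAG-CTGA", "AGTC-TCAG", "AGTC-GACT", "CTGA-AGTC",
    "CTGA-GACT", "AGTC-CTGA", "TCAG-AGTC", "CTGA-TCAG",
}


def dual_barcode(r1_seq, r2_seq, length=4):
    """Build an 'XXXX-YYYY' label from the first `length` bases of each read."""
    return f"{r1_seq[:length]}-{r2_seq[:length]}"


def assign_pairs(read_pairs, known=KNOWN_PAIRS):
    """Group (R1 sequence, R2 sequence) pairs by their dual-barcode label.

    Pairs whose label is not in `known` go into a hypothetical 'UNKNOWN' bin.
    """
    bins = defaultdict(list)
    for r1_seq, r2_seq in read_pairs:
        label = dual_barcode(r1_seq, r2_seq)
        bins[label if label in known else "UNKNOWN"].append((r1_seq, r2_seq))
    return dict(bins)
```

For the example reads, R1 starts with GACT and R2 starts with CTGA, so the pair is assigned to GACT-CTGA.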

jfjlaros commented 6 years ago

Not directly. Whenever I need dual or triple barcoding, I simply use this program multiple times.

raw937 commented 6 years ago

How would that work? I feel I don't have the right barcode file. Does it only work with Python 3?

jfjlaros commented 6 years ago

Suppose we have two files containing barcodes that are used for dual indexing:

A.txt for barcodes 1 and 2.
B.txt for barcodes 3 and 4.

Furthermore, suppose that the first barcode can be found in the header of read 1 and the second one in the header of read 2.

We can then demultiplex in two steps:

demultiplex demux A.txt first_read.fq second_read.fq

This will result in two new pairs of files:

first_read_1.fq second_read_1.fq
first_read_2.fq second_read_2.fq

We can now demultiplex each of these pairs as follows:

demultiplex demux B.txt second_read_1.fq first_read_1.fq
demultiplex demux B.txt second_read_2.fq first_read_2.fq

This results in the final set of pairs, one for each combination of a barcode from A.txt with a barcode from B.txt.
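The two-step procedure above can be sketched in Python as an in-memory model. This is only an illustration under stated assumptions, not how demultiplex is implemented: reads are modelled as `(header, sequence)` tuples, the barcode is assumed to sit between `#` and `/` in the header as in the example reads, and `header_barcode`, `split_on_first`, `two_pass` and the `UNKNOWN` group are hypothetical names. The point it shows is the swap: each pass splits on the first read of a pair, so the pairs are reversed before the second pass.

```python
from collections import defaultdict


def header_barcode(read):
    """Extract the barcode from a header such as '@id#ACAGTG/1' (assumed layout)."""
    header, _sequence = read
    return header.split("#", 1)[1].split("/", 1)[0]


def split_on_first(pairs, names):
    """Group pairs by the barcode of the FIRST read in each pair.

    `names` maps barcode sequence -> name, playing the role of one barcode file.
    """
    groups = defaultdict(list)
    for first, second in pairs:
        name = names.get(header_barcode(first), "UNKNOWN")
        groups[name].append((first, second))
    return dict(groups)


def two_pass(pairs, names_a, names_b):
    """Split on read 1 with barcode set A, then on read 2 with barcode set B."""
    result = {}
    for name_a, group in split_on_first(pairs, names_a).items():
        # Swap each pair so that the second pass keys on read 2.
        swapped = [(second, first) for first, second in group]
        for name_b, sub in split_on_first(swapped, names_b).items():
            # Swap back so every stored pair is (read 1, read 2) again.
            result[(name_a, name_b)] = [(first, second) for second, first in sub]
    return result
```

Each key of the returned dictionary is a (name from A, name from B) combination, which corresponds to one final pair of output files in the command-line workflow.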

jfjlaros commented 6 years ago

See the -r, -e and -s options for barcodes that are located in the reads instead of the header.

Indeed, this package currently only works with Python 3 (some dependencies no longer support Python 2).