yhwu / idemp

Barcode demultiplex for Illumina I1, R1, R2 fastq.gz files
GNU General Public License v2.0

Unequal size output fastq files #6

Open cheng0712 opened 7 years ago

cheng0712 commented 7 years ago

Hi,

I have some paired-end data, and I produced the demultiplexed fastq files with the following command: idemp -b map.txt -I1 lane1_NoIndex_L001_R2_001.fastq.gz -R1 lane1_NoIndex_L001_R1_001.fastq.gz -R2 lane1_NoIndex_L001_R3_001.fastq.gz -m 2 -o lane1/

But I am surprised to see that, for some samples, the output fastq files are of unequal size. Are you matching the barcodes against the forward and reverse reads separately or simultaneously?

Thanks, Cheng

yhwu commented 7 years ago

Mapping is done only once, on the index read; each read in R1 and R2 is then assigned to a file according to that mapping. So I don't know why you got this error. It could be a bug. I'd try

idemp -b map.txt -I1 lane1_NoIndex_L001_R2_001.fastq.gz -R1 lane1_NoIndex_L001_R1_001.fastq.gz -m 2 -o lane1/
idemp -b map.txt -I1 lane1_NoIndex_L001_R2_001.fastq.gz -R2 lane1_NoIndex_L001_R3_001.fastq.gz -m 2 -o lane1/

to see whether you get correct results. It seems you have HiSeq data. I haven't tested idemp on that platform, so the results could be totally wrong.
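To illustrate why a single mapping on the index read should give equally sized R1 and R2 outputs, here is a minimal Python sketch. It is not idemp's actual code: the function `demultiplex`, the in-memory lists, the exact-match lookup, and the `"unassigned"` label are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import zip_longest


def demultiplex(index_reads, r1_reads, r2_reads, barcode_to_sample):
    """Assign each R1/R2 pair to a sample using only its index read.

    Hypothetical sketch: exact barcode matching, reads held in memory.
    """
    r1_out = defaultdict(list)
    r2_out = defaultdict(list)
    # Walk the three inputs in lockstep; record n of I1, R1 and R2
    # is assumed to come from the same cluster.
    for idx, r1, r2 in zip_longest(index_reads, r1_reads, r2_reads):
        if idx is None or r1 is None or r2 is None:
            raise ValueError("I1, R1 and R2 contain different numbers of reads")
        # One lookup per cluster decides the destination of both mates.
        sample = barcode_to_sample.get(idx, "unassigned")
        r1_out[sample].append(r1)
        r2_out[sample].append(r2)
    return r1_out, r2_out


r1_by_sample, r2_by_sample = demultiplex(
    ["AAAA", "CCCC", "AAAA", "GGGG"],
    ["r1_a", "r1_b", "r1_c", "r1_d"],
    ["r2_a", "r2_b", "r2_c", "r2_d"],
    {"AAAA": "sample1", "CCCC": "sample2"},
)
```

Under this scheme every sample receives the same number of R1 and R2 reads by construction, so unequal outputs would point to inputs that are not in the same order or length, or to a bug.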

cheng0712 commented 7 years ago

It is ATAC-seq data. There is no difference between your commands and mine. BTW, what is your definition of a mismatch? A partial match?

yhwu commented 7 years ago

It's hard to know what went wrong without a piece of your data to reproduce the error. A mismatch can be a base mismatch, a deletion, or an insertion.
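Counting base mismatches, deletions, and insertions together corresponds to the usual edit (Levenshtein) distance. The Python sketch below shows that measure; it is an illustration only, the function name `edit_distance` is hypothetical, and whether idemp computes it this way is an assumption.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of substitutions,
    insertions and deletions needed to turn string a into string b."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]
```

With a threshold of 2, as presumably set by `-m 2` in the command above, an observed index read would be accepted for a barcode when `edit_distance(observed, barcode) <= 2`.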

J-Sabino commented 5 years ago

Hi, I have HiSeq data and I am having the same problem. Have you tested idemp with HiSeq data yet? If not, are you planning to?

yhwu commented 5 years ago

MiSeq only. It won't work on HiSeq.