tecangenomics / nudup

NuDup -- Marks/removes duplicate molecules based on the molecular tagging technology used in Tecan products.
http://www.tecangenomics.com
GNU Lesser General Public License v3.0

SAM Error #6

Closed bimbam23 closed 7 years ago

bimbam23 commented 7 years ago

Hi, I'm using nudup.py and I'm not sure whether this is a bug or whether I'm using the program incorrectly:

~/bins/nudup/nudup.py -f p_S_14_R2.fastq -o output Galaxy263-HISAT2_on_data_191.bam
2016-12-20 14:38:06,726 [ INFO] - Deduplicating NuGEN single end reads...
2016-12-20 14:38:12,280 [ INFO] - Using molecular tag sequence from Index FASTQ read
2016-12-20 14:38:12,280 [ INFO] - Appending molecular tag sequence to SAM/BAM read name

2016-12-20 14:49:05,807 [ ERROR] - SAM read names did not match read names in Index
D00418:106:CA65BANXX:7:1101:10000:26923 16 chr1 199694595 1 121M 0 0 CGAACTTAGTGCGGACACCCGATCGGCATAGCGCACTGCAGCCCAGAACTCCTGGGCTCAAGCGATCCTCCAGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCGCGCGCCACCGCGCCC GGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG AS:i:0 ZS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:121 YT:Z:UU NH:i:2
D00418:106:CA65BANXX:7:1101:10000:26923 256 chr1 200101917 1 121M 0 0 GGGCGCGGTGGCGCGCGCCTGTAGTCCCAGCTACTCGGGAGGCTGAGGCTGGAGGATCGCTTGAGCCCAGGAGTTCTGGGCTGCAGTGCGCTATGCCGATCGGGTGTCCGCACTAAGTTCG GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCGG AS:i:0 ZS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:121 YT:Z:UU NH:i:2
D00418:106:CA65BANXX:7:1101:10001:36341 16 chr1 126986022 255 59M * 0 0 ACAAGTACAAATTTTCTTCTCAGATAAAATCTTCTCAAAATATTTTTTGAAAAAAAATC GGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDCGGGGGGGGGGG AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:28C30 Y
2016-12-20 14:49:06,662 [ ERROR] - SAM and Index do not match check failed

Cheers,
Jochen

shuelga commented 7 years ago

This is not a bug. Please ensure that the names/headers of your FASTQ reads exactly match the read names in your BAM; otherwise the tool cannot match the index to the mapped read. I.e., you should be able to grep "D00418:106:CA65BANXX:7:1101:10000:26923" from your FASTQ file exactly.
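
If grepping for one read at a time is impractical, the same check can be done in bulk. Below is a minimal sketch, not part of nudup, that lists the read names found in SAM text but missing from the index FASTQ. The function names are hypothetical. It also assumes that a FASTQ read name is the text after `@` up to the first whitespace; nudup's own matching may be stricter.

```python
def fastq_read_names(fastq_lines):
    """Yield the read name from each FASTQ header (every 4th line, starting at the 1st)."""
    for i, line in enumerate(fastq_lines):
        if i % 4 != 0:
            continue
        header = line.rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"line {i + 1} is not a FASTQ header: {header!r}")
        # Assumption: the name ends at the first whitespace; the rest is a comment.
        parts = header[1:].split()
        yield parts[0] if parts else ""


def sam_read_names(sam_lines):
    """Yield QNAME (first tab-separated column) from SAM text, skipping '@' header lines."""
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue
        yield line.rstrip("\n").split("\t", 1)[0]


def unmatched_sam_names(fastq_lines, sam_lines):
    """Return the sorted SAM read names that do not occur in the FASTQ."""
    known = set(fastq_read_names(fastq_lines))
    return sorted({name for name in sam_read_names(sam_lines) if name not in known})


if __name__ == "__main__":
    # Tiny in-memory demo; real input would be open file handles.
    fastq = ["@read1 1:N:0\n", "ACGT\n", "+\n", "IIII\n"]
    sam = ["read1\t16\tchr1\t100\n", "read2\t0\tchr1\t200\n"]
    print(unmatched_sam_names(fastq, sam))
```

For real data, the SAM text could come from `samtools view` run on the BAM file, with both inputs passed as open file handles. An empty result means every mapped read name was found in the FASTQ under the assumption above.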