seqan / slimm

Species Level Identification of Microbes from Metagenomes
Other
27 stars 3 forks source link

Problem with bowtie2 example #9

Closed mikesioda closed 7 years ago

mikesioda commented 7 years ago

In the bowtie2 example, the bowtie2 -k 60 switch causes a corrupt sam files to be created - it took me a long time to track this problem down. If you are able to see the same problem with the software versions I have listed below, you may want to remove this from your example.

Do you have a recommendation for how to update bowtie2 command in the absence of the -k 60 switch? Do you recommend any change since I need to run without this included?

I am following instructions posted: https://github.com/seqan/slimm/wiki/Example-Bowtie2 using the already indexed reference genomes: http://ftp.mi.fu-berlin.de/pub/dadi/slimm/

I have installed SLIMM and the AB_5K & AB_13K index edon my Macbook Pro (OS X 10.11.6) and on my university's Redhat Linux HPC (High Performance Computing) cluster. Both environments have the same problem creating the alignment bam file with either index. The problem is the sam file piped into samtools view is corrupt due to the bowtie2 -k 60 switch. It is not clear why this error occurs but it happens with both 5K & 13K precompiled index.

Samtools view error

(also seen if bowtie2 creates a sam file with -S and passes it to SLIMM) Samtools [W::sam_read1] parse error at line 4548 [main_samview] truncated file.

If I use bowtie2 -S to output a sam file, the same file will fail validation when checked with the Picard Tools ValidateSamFile:

Picard Tools ValidateSamFile htsjdk.samtools.SAMFormatException

Caused by: htsjdk.samtools.SAMFormatException: Error parsing text SAM file. Not enough fields; File test.sam; Line 13197

command line input

bowtie2 -x /Users/msioda/slimm/AB_5K_indexed_ref_genomes_bowtie2/AB_5K \ -U /Applications/bowtie2-2.3.1/example/reads/longreads.fq \ -q --no-unal --very-fast --mm -k 60 -p 4 \ 2> /Users/msioda/slimm/test/lambdaErrors2.txt | samtools view -bS ->
/Users/msioda/slimm/test/lambda2.bam

This statement works fine if the -k 60 is removed. Do you know why this happens?

Software versions:

bowtie2/2.3.1
samtools/1.3.1

Also, bowtie v 2.2.0 or above is required to handle .bt2l large index files. You may want to list this as a dependancy.

temehi commented 7 years ago

Hey,

Thanks for reporting this issue. I don't really know why this happens. Maybe related to this issue of bowtie2. Could you please attach the SAM file you produced using bowtie2 with -S? If the SAM file is too big the first 1M lines will be enough.

mikesioda commented 7 years ago

Thank you for your help, it does look like it is the very same issue you link to above. Note, I have verified the same error occurs on the latest version of samtools/1.4. I ran a few more tests and see that SLIMM also works fine if -k 1 is passed to Bowtie2. If -k 2 or higher is passed, the sam file output includes multiple alignments followed by the FASTQ sequence header of the aligned read. The error points to the header row of the first multiple alignment in test_k2_sam.txt that starts with @77.TS5_2 E91NIX001BY1NI.

So the error appears to be caused by the existence of the multiple alignment header rows, which are output anytime multiple alignments are found, which requires -k 2 or higher to be passed to Bowtie2.

Do you think my analysis will suffer much if I skip the -k Bowtie2 param and just use the default option (Bowtie2 looks for multiple alignments, reports best, with MAPQ)?

If I pass test_k2_sam.txt to SLIMM, I get this error: `/home/mi/dadi/workspace/development/builds/slimm-src/include/seqan/include/seqan/basic/basic_exception.h:363 FAILED! (Uncaught exception of type seqan::ParseError: Unexpected SAM header encountered.)

stack trace: 0 [0x40147f] 1 [0x4245f8] 2 [0x46f486] 3 [0x46f4d1] 4 [0x46f198] 5 [0x42bffa] 6 [0x4057a0] 7 [0x50f0f0] 8 [0x408cb7] `

If I pass test_k2_sam.txt to picardTools ValidateSamFile, I get this error: SAMFormatException on record 21 [Thu Apr 20 11:58:12 EDT 2017] picard.sam.ValidateSamFile done. Elapsed time: 0.00 minutes. Runtime.totalMemory()=1007157248 To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp Exception in thread "main" htsjdk.samtools.SAMException: SAMFormatException on record 21 at htsjdk.samtools.SamFileValidator.validateSamRecordsAndQualityFormat(SamFileValidator.java:340) at htsjdk.samtools.SamFileValidator.validateSamFile(SamFileValidator.java:215) at htsjdk.samtools.SamFileValidator.validateSamFileSummary(SamFileValidator.java:143) at picard.sam.ValidateSamFile.doWork(ValidateSamFile.java:196) at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:205) at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:94) at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:104) Caused by: htsjdk.samtools.SAMFormatException: Error parsing text SAM file. Not enough fields; File bt_k2_test.sam; Line 4543 Line: @77.TS5_2 E91NIX001BY1NI orig_bc=CGAGAGTTACGC new_bc=CGAGAGTTACGC bc_diffs=0%0ATTGGGCCGTGTCTCAGTCCCAATGTGGCCGGTCACCCTCTCAGGTCGGCTATGGATCGTCGCCTTGGTGGGCCGTTACCTCACCAACTAGCTAATCAGACGCGAGTCCATCTCATACCACCGGAGTTTTTCACACTGAGCCATGCAGCTCTGTGCGCTTATGCGGTATTAGCAGCCGTTTCCAGCTGTTATCCCCCGGCATGGGGNAGGTCACCCACG%0A+%0A</<<0>:=====<<<==<<0=9=<<=:>;>:=<=<<0<=<<<=:==>:<<=<>;;=<<===>:>;>;<>=0>;;>;;>;;<;>;>:<<<<<<>;;<<;;<<<<<<:>;<<<:<:<;=:<>;=;<;;;70':;9:;:<9<>::<<<<8:<9:;<:;:<=8<8<:>;<<>;97;:<=;5>=2:8<98;:<:<7;;70'>;2<<=<7+!<>;8:<>=0<06%0A at htsjdk.samtools.SAMLineParser.reportFatalErrorParsingLine(SAMLineParser.java:446) at htsjdk.samtools.SAMLineParser.parseLine(SAMLineParser.java:231) at htsjdk.samtools.SAMTextReader$RecordIterator.parseLine(SAMTextReader.java:248) at htsjdk.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:236) at htsjdk.samtools.SAMTextReader$RecordIterator.next(SAMTextReader.java:212) at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:569) at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:543) at htsjdk.samtools.SamFileValidator.validateSamRecordsAndQualityFormat(SamFileValidator.java:284) ... 6 more

I have attached an example fastQ & two sam files with -k 1 (successful) & -k 2 (failed) passed as params so you can compare the sam files in successful vs failed tests. Both were generated by Bowtie2 using this same fastQ file.

File extensions changed to .txt so this form will accept them for upload.

test_k1_sam.txt test_k2_sam.txt test.fq.txt

temehi commented 7 years ago

Thanks again for spotting this problem. I have updated the tutorial accordingly.

The error is definitely related to the issue I have linked above. And since this issue is not related to SLIMM I will close it.

The latest version of bowtie2 (v2.3.1) produces a SAM file that is invalid when reporting secondary alignments (k >= 2). Specifically, there are fastQ like records that start with @ in the non-header part of the SAM file. A line that starts with @ is a header record for sam files and all the header records should be at the beginning. Which makes the output SAM files, including your example test_k2_sam.txt, invalid.

In your example file, the sam header records end at line 4540 then you have 2 alignment records. But at line 4543 you have a line that starts like @77.TS31_18 E91NI ... which invalidates the SAM file.

Do you think my analysis will suffer much if I skip the -k Bowtie2 param and just use the default option (Bowtie2 looks for multiple alignments, reports best, with MAPQ)?

Yes! If you turn off secondary alignments Bowtie2 will choose one alignment out of the best alignments randomly. And SLIMM's job includes identifying which alignments are the more likely ones. And an alignment with lower quality might become the more likely one.

I recommend using older versions bowtie2. (I used v2.2.9 without a problem) and the guy who reported the issue said v2.3.0 also have the expected output.