snewhouse / glu-genetics

Automatically exported from code.google.com/p/glu-genetics

Newbler2SAM sorts the similar reads in the wrong order #11

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Take a 454PairAlign.txt file that contains reads mapped to multiple
locations, and use it to create a SAM file with Newbler2SAM (1.0b2-dev):
glu seq.Newbler2SAM --unaligned=drop -o chr3.sam 454PairAlign.txt ../sff/LOTOFREADS.sff
2. If a read is mapped to multiple locations, with no other read in between
in the 454PairAlign.txt file, Newbler2SAM sorts those alignments from longest
to shortest.
3. Some programs, such as the IGV viewer, cannot load the resulting SAM file
because they need the reads sorted on start position.

What is the expected output? What do you see instead?
I expect the data to be sorted on start position. Instead, the order of the
reads is swapped at some locations.
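
To check whether a given SAM file is affected, something like the following sketch may help. It is not part of glu: the function name find_unsorted is made up for illustration, and it assumes standard tab-separated SAM lines in which column 3 is the reference name and column 4 is the 1-based start position.

```python
def find_unsorted(lines):
    """Return (line_number, read_name) for every alignment that starts
    before the previous alignment on the same reference.

    Hypothetical helper, not part of glu. Assumes tab-separated SAM
    lines: QNAME, FLAG, RNAME, POS, ...
    """
    out_of_order = []
    prev_ref, prev_pos = None, 0
    for number, line in enumerate(lines, 1):
        # Skip blank lines and header lines (those starting with '@').
        if not line.strip() or line.startswith('@'):
            continue
        fields = line.rstrip('\n').split('\t')
        ref, pos = fields[2], int(fields[3])
        if ref == prev_ref and pos < prev_pos:
            out_of_order.append((number, fields[0]))
        prev_ref, prev_pos = ref, pos
    return out_of_order
```

Calling it as find_unsorted(open('chr3.sam')) should return an empty list for a file that is sorted on start position within each reference.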

What version of the product are you using? On what operating system?
glu-genetics 1.0b2-dev on ubuntu 10.04

Please provide any additional information below.
To fix this, I built in an additional sorting step to get the data back in
its original order (see patch; I also added some additional output to report
activity when handling large files). However, I don't know whether people
prefer to have the longest read of a set of consecutive multiply mapped reads
in front.
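
For readers without the patch, here is a rough sketch of one way to get position-sorted output. It is not the attached patch and does not use glu internals. Instead of restoring the original order inside each group of multiply mapped reads, it applies a plain stable sort on (reference, start position) to the finished SAM lines. The function name sort_sam_by_position is made up for illustration, and references are assumed to be ordered by first appearance in the file.

```python
def sort_sam_by_position(lines):
    """Return header lines followed by alignment lines sorted by
    reference and then by start position.

    Hypothetical helper, not the patch from this issue. Assumes
    tab-separated SAM lines: QNAME, FLAG, RNAME, POS, ...
    """
    header = []
    records = []
    ref_rank = {}  # reference name -> order of first appearance
    for line in lines:
        if not line.strip():
            continue
        if line.startswith('@'):
            header.append(line)
            continue
        fields = line.split('\t')
        rank = ref_rank.setdefault(fields[2], len(ref_rank))
        records.append((rank, int(fields[3]), line))
    # list.sort is stable, so alignments with the same reference and
    # start position keep their original relative order.
    records.sort(key=lambda record: (record[0], record[1]))
    return header + [record[2] for record in records]
```

This holds the whole file in memory, so for very large files an external tool that sorts SAM files on coordinate is probably the better choice.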

Original issue reported on code.google.com by bratdak...@gmail.com on 11 Oct 2010 at 3:10

Attachments: