isovic / graphmap

GraphMap - A highly sensitive and accurate mapper for long, error-prone reads http://www.nature.com/ncomms/2016/160415/ncomms11307/full/ncomms11307.html Note: This was the original repository which will no longer be officially maintained. Please use the new official repository here:
https://github.com/lbcb-sci/graphmap2
MIT License

invalid sam fields #60

Closed robegan21 closed 7 years ago

robegan21 commented 7 years ago

Hi,

While trying to validate whether issue #43 is fixed, I tried the latest version, v0.3.0-128-g25f1eeb. When converting the output to BAM via samtools, I discovered that many fields are incorrectly formatted: negative positions, invalid and non-ASCII characters in the aux fields, etc.

samtools view -Sbu all.sam | samtools sort -@ 8 - all
[bam_header_read] EOF marker is absent. The input is probably truncated.
[samopen] SAM header is present: 7 sequences.
Parse error at line 83566: missing colon in auxiliary data


b6ad69e3-437c-45c8-a8e8-ec5845e04668_Basecall_Alignment_template        0       Arabidop_Ch2    -8002   40     3S3M1I2M2I10M1I17....M2I13M1I1M

And sometimes the aux fields look like this:
NM:i:-1 AS:i:-262       H0:i:0  ZE:f:inf     
robegan21 commented 7 years ago

Here is the sam file that was produced: http://portal.nersc.gov/dna/RD/Adv-Seq/ONT/all.sam

If you'd like to try to reproduce, here are my input files and the command line:

http://portal.nersc.gov/dna/RD/Adv-Seq/ONT/Arabidop-ref.fa
http://portal.nersc.gov/dna/RD/Adv-Seq/ONT/Arabidop-ref.fa.gmidx
http://portal.nersc.gov/dna/RD/Adv-Seq/ONT/all.fastq

graphmap align --ref Arabidop-ref.fa --index Arabidop-ref.fa.gmidx --reads all.fastq

laura-oikkonen commented 7 years ago

I am also seeing negative starting positions for some of my reads. I am trying to map cDNA data and am not using circular mapping, so I have no idea how the negative values should be interpreted. At the moment I am ignoring them, as that is what samtools does.

isovic commented 7 years ago

Hi Rob, Laura!

Thank you for reporting this! This was an unfortunate bug introduced in the last release. The negative values were caused by an incorrect bounds check on the reference coordinates. It should be fixed now: I tested on the data above and the coordinates look fine. I also generated several synthetic tests locally, which confirm this. The graphmap code is in dire need of thorough refactoring, which unfortunately is going incredibly slowly due to other obligations.

I will mark this issue as closed, but please check for yourselves and reopen the issue if there is still a problem!

Thanks again, Best regards, Ivan.
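
For anyone who wants to check a regenerated SAM file for the symptoms reported in this thread, here is a minimal sketch in Python. It is not part of GraphMap or samtools; the function names `check_sam_line` and `check_sam_file` are hypothetical. It flags negative POS values, aux fields that do not have the TAG:TYPE:VALUE shape, non-ASCII characters, and `i`/`f` values that do not look like plain numbers (so `ZE:f:inf` is reported). Treating a negative `NM` as a problem is an assumption on my part, since NM is conventionally an edit distance.

```python
import re

# Patterns for integer and float values, modeled on the SAM optional-field syntax.
INT_RE = re.compile(r"^[-+]?[0-9]+$")
FLOAT_RE = re.compile(r"^[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?$")
# Optional field shape: two-character tag, one-character type, value.
AUX_RE = re.compile(r"^([A-Za-z][A-Za-z0-9]):([AifZHB]):(.*)$")


def check_sam_line(line):
    """Return a list of problem descriptions for one SAM alignment line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        return ["fewer than 11 mandatory fields"]
    problems = []

    # POS is the 4th column; 0 means unmapped, negative values are invalid.
    pos = fields[3]
    if not INT_RE.match(pos):
        problems.append("POS is not an integer: %r" % pos)
    elif int(pos) < 0:
        problems.append("negative POS: %s" % pos)

    # Everything after the 11 mandatory columns is an optional (aux) field.
    for aux in fields[11:]:
        if not (aux.isascii() and aux.isprintable()):
            problems.append("non-printable or non-ASCII aux field: %r" % aux)
            continue
        match = AUX_RE.match(aux)
        if match is None:
            problems.append("malformed aux field: %r" % aux)
            continue
        tag, typ, value = match.groups()
        if typ == "i" and not INT_RE.match(value):
            problems.append("bad integer value in %s" % aux)
        elif typ == "f" and not FLOAT_RE.match(value):
            problems.append("bad float value in %s" % aux)
        elif tag == "NM" and typ == "i" and int(value) < 0:
            # Assumption: a negative edit distance is treated as suspicious.
            problems.append("negative NM in %s" % aux)
    return problems


def check_sam_file(path):
    """Print every problem found in the alignment lines of a SAM file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if line.startswith("@"):
                continue  # skip header lines
            for problem in check_sam_line(line):
                print("line %d: %s" % (number, problem))
```

Calling `check_sam_file("all.sam")` prints one line per problem found; an empty output means none of these particular checks fired, not that the file is fully valid.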