Closed robegan21 closed 7 years ago
Here is the sam file that was produced: http://portal.nersc.gov/dna/RD/Adv-Seq/ONT/all.sam
If you'd like to try to reproduce, here are my input files and the command line:
http://portal.nersc.gov/dna/RD/Adv-Seq/ONT/Arabidop-ref.fa http://portal.nersc.gov/dna/RD/Adv-Seq/ONT/Arabidop-ref.fa.gmidx http://portal.nersc.gov/dna/RD/Adv-Seq/ONT/all.fastq
graphmap align --ref Arabidop-ref.fa --index Arabidop-ref.fa.gmidx --reads all.fastq
I am also seeing negative starting positions for some of my reads. I am trying to map cDNA data and am not using circular mapping, so I have no idea how the negative values should be interpreted. At the moment I am ignoring them, as that's what samtools does.
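For anyone who wants to see how many reads are affected before deciding to ignore them, here is a minimal sketch in Python. It assumes a plain-text SAM file and relies only on the SAM convention that POS is the fourth tab-separated column and is never negative (0 means unmapped). The function name `split_by_pos` is made up for illustration and is not part of graphmap or samtools.

```python
def split_by_pos(sam_lines):
    """Return (ok, negative): read names grouped by whether POS is negative.

    Header lines (starting with '@') and blank lines are skipped.
    """
    ok, negative = [], []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        pos = int(fields[3])  # POS is the 4th mandatory SAM column
        if pos < 0:
            negative.append(fields[0])
        else:
            ok.append(fields[0])
    return ok, negative


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python split_by_pos.py all.sam
    with open(sys.argv[1]) as handle:
        ok, negative = split_by_pos(handle)
    print("records with valid POS:", len(ok))
    print("records with negative POS:", len(negative))
```

This only counts the affected records; it does not try to guess what the negative coordinates were meant to be.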
Hi Rob, Laura!
Thank you for reporting this! This was an unfortunate bug introduced in the last release. The negative values were caused by an incorrect bounds check on the reference coordinates. It should be fixed now: I tested on the data above and the coordinates look fine. I also generated several synthetic tests locally, which confirm this. The graphmap code is in dire need of thorough refactoring, which unfortunately is going incredibly slowly due to other obligations.
I will mark this issue as closed, but please check for yourselves and let me know if there is a problem by reopening the issue!
Thanks again, Best regards, Ivan.
Hi,
While trying to validate whether issue #43 is fixed, I tried the latest version: v0.3.0-128-g25f1eeb. When converting the output to BAM via samtools, I discovered that many fields are incorrectly formatted: negative positions, invalid and non-ASCII characters in the aux fields, etc.
samtools view -Sbu all.sam | samtools sort -@ 8 - all
[bam_header_read] EOF marker is absent. The input is probably truncated.
[samopen] SAM header is present: 7 sequences.
Parse error at line 83566: missing colon in auxiliary data
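To locate the offending records without waiting for samtools to stop at the first parse error, a rough check of the optional fields can be sketched in Python. It assumes the usual TAG:TYPE:VALUE layout for optional fields (two-character tag, one-letter type from A, i, f, Z, H, B) and only tests that the value is printable ASCII, which is a simplification of the per-type rules. The names `AUX_RE`, `bad_aux_fields` and `scan` are illustrative only.

```python
import re

# Simplified pattern for a SAM optional field: TAG:TYPE:VALUE.
# The value check (printable ASCII) is looser than the real per-type rules.
AUX_RE = re.compile(r"[A-Za-z][A-Za-z0-9]:[AifZHB]:[ -~]*")


def bad_aux_fields(line):
    """Return the optional fields of one SAM record that fail the pattern."""
    fields = line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; everything after is optional.
    return [aux for aux in fields[11:] if not AUX_RE.fullmatch(aux)]


def scan(sam_lines):
    """Yield (line_number, bad_field) for every malformed optional field."""
    for number, line in enumerate(sam_lines, start=1):
        if line.startswith("@"):
            continue  # skip header lines
        for aux in bad_aux_fields(line):
            yield number, aux


if __name__ == "__main__":
    import sys

    # errors="replace" keeps the scan going past undecodable bytes;
    # the replacement character then fails the printable-ASCII check.
    with open(sys.argv[1], errors="replace") as handle:
        for number, aux in scan(handle):
            print("line %d: malformed aux field %r" % (number, aux))
```

Line numbers are counted from the top of the file including the header, so they should be comparable to the line number in the samtools message.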