vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

inject crashes on RNAME=="*" #3955

Closed esrice closed 1 year ago

esrice commented 1 year ago

1. What were you trying to do? Inject a sam file into gam format. The sam file is the output of minimap2 mapping against the output of vg paths -F so the paths in the sam file ought to be exactly the same as the paths in the graph.

2. What did you want to happen? For the inject command to output the alignments in gam format.

3. What actually happened? Inject crashed with the error

[vg::alignment.cpp] error: alignment references path not present in graph: 

(no text after the ":" giving the path name)

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

N/A

5. What data and command can the vg dev team use to make the problem happen?

I made two toy example sam files from the actual sam file I was trying to run through. Each of them had a single non-header line from the original sam file. One of them was a mapped read (i.e., RNAME != "*") and the other was an unmapped read (i.e., RNAME == "*"). The sam file containing a single mapped read converted to gam successfully, but the sam file containing a single unmapped read gave the same error as running on the whole file.

Therefore, it appears that the problem is that vg inject doesn't know how to handle unmapped reads, at least in sam format.

To be more exact, here are the commands I used:

$ vg paths -F -Q '[genomeName]#0#' -x pangenome.gbz > [genomeName].paths.fa
$ minimap2 -ax sr [genomeName].paths.fa reads.fastq.gz > aligned.sam
$ vg inject -x pangenome.gbz aligned.sam > aligned.gam
[vg::alignment.cpp] error: alignment references path not present in graph: 
$ samtools view -H aligned.sam > header.sam
$ samtools view aligned.sam | head -n 1 > one_read_mapped.sam
$ awk '!/^@/ && $3=="*"' aligned.sam | head -n 1 > one_read_unmapped.sam
$ cat header.sam one_read_mapped.sam > test_mapped.sam
$ cat header.sam one_read_unmapped.sam > test_unmapped.sam
$ vg inject -x pangenome.gbz test_mapped.sam > aligned.gam
[worked]
$ vg inject -x pangenome.gbz test_unmapped.sam > aligned.gam
[vg::alignment.cpp] error: alignment references path not present in graph: 

6. What does running vg version say?

vg version v1.47.0 "Ostuni"
Compiled with g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0 on Linux
Linked against libstd++ 20210601
Built by anovak@octagon

jeizenga commented 1 year ago

It was an easy fix. Should be resolved in the master branch now. Thanks for the bug report!
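
For anyone stuck on a build that predates the fix, one possible workaround is to drop the unmapped records before calling vg inject. The sketch below is not part of vg: drop_unmapped is a hypothetical helper name, and it assumes that discarding unmapped reads is acceptable for your analysis. It keeps header lines and any record whose RNAME (column 3) is not "*", the same test used in the awk command in the report.

```shell
# Hypothetical helper (not part of vg): keep SAM header lines and mapped records.
drop_unmapped() {
    # SAM columns are tab-separated; field 3 is RNAME, and "*" means no reference.
    # Header lines start with "@" and are passed through unchanged.
    awk -F '\t' '/^@/ || $3 != "*"'
}

# Usage sketch, reusing the file names from the report (aligned.mapped.sam is an
# illustrative name):
# drop_unmapped < aligned.sam > aligned.mapped.sam
# vg inject -x pangenome.gbz aligned.mapped.sam > aligned.gam
```

If samtools is available, samtools view -h -F 4 aligned.sam does something similar, but it filters on the unmapped flag rather than on RNAME, so the two approaches do not always select exactly the same records.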