Closed tanger-code closed 1 year ago
Did you use -Q
with vg pack
? If so, you might want to try again without this option.
No, I only use -d
to write the result to .txt file.
You need to write the pack file with -o
.
vg pack -x chr21.gfa -g graphaligner_aln.gam -t 88 -o graphaligner_aln.pack
Then you can read it into a table with -i -d
vg pack -x chr21.gfa -g graphaligner_aln.gam -t 88 -i graphaligner_aln.pack -d
I've tried, but the result is the same.
The graph used is from: https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-grch38/hprc-v1.1-mc-grch38.chroms/chr21.vg. Could you try it? Using GraphAligner to map long reads from anywhere, then use vg pack to get the coverage information.
I'll try it if you send me a few read mappings from your gam. You can do something like
vg view -a GAM-FILE | head -10
and paste the results here.
Because I used the long reads, it's a little bit too much. Here I put them in a file. head10.txt
Seems like there's some kind of name / node_id mixup in the GAM. You see it right away with
vg view -JaG head10.txt > head10.gam
vg validate chr21.vg -a head10.gam
-> error after error like
Node 677029 not found in graph
I'd think vg pack
would give the same sort of error, but I guess it doesn't check (probably something to fix).
But if you look at the json, you see stuff like
"position": {"is_reverse": true, "name": "43503065", "node_id": "676997", "offset": "56"}}
and while 676997
is not a node in the graph, 43503065
is a node in the graph.
I know we have some code in vg
that can associate node_id's
with names from original GFA files, but I'm not sure how that could get carried over to GraphAligner. What input exactly were you giving GraphAligner?
When I use .vg
graph in GraphAligner's mapping command, there is something error:
So I convert .vg
graph to .gfa
graph using vg view
. Then mapping reads to the GFA graph using GraphAligner:
graphGFA="/data/home/tangen/data/HPRC/grch38_based/chromosome/chr21/chr21.gfa" reads="/data/home/tangen/data/HPRC/grch38_based/chromosome/chr21/chr21-reads/chr21.fq"
$GraphAligner -g $graphGFA -f $reads -t 88 -a graphaligner_aln.gam -x vg
Going back to this bit from your GAM
"position": {"is_reverse": true, "name": "43503065", "node_id": "676997", "offset": "56"}}
If I do vg view chr21.vg
, there is no node 676997
in that graph. I don't understand why GraphAligner
is setting the node_id
field to that (and putting what seems like a legit node id in the name
field). So as far as I can tell this seems like an issue with GraphAligner and not vg, that you should take up on GraphAligner's github instead.
One other thing you might try is switching to GAF format output with GraphAligner
and running vg pack
on that instead (using -a
instead of -g
, with gzipped gaf supported too).
I have tried to use .gaf
format, and it works. Thank you for your advice.
PLEASE DO NOT MAKE SUPPORT REQUESTS HERE
Please the Biostars forum instead:
https://www.biostars.org/new/post/?tag_val=vg
I use GraphAligner to align "chr21 reads" to the "hprc-chr21.gfa": GraphAligner -g chr21.gfa -f chr21.reads -t 88 -a graphaligner_aln.gam -x vg. Then I use "vg pack" to get the coverage information about each node: vg pack -x chr21.gfa -g graphaligner_aln.gam -t 88 -d > aln.pack2.txt. But I find the coverage is all zero. Why is that?How do I get the correct coverage information?