vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/
Other
1.09k stars 193 forks source link

The coverage information of "vg pack" is all 0. #4052

Closed tanger-code closed 1 year ago

tanger-code commented 1 year ago

PLEASE DO NOT MAKE SUPPORT REQUESTS HERE

Please the Biostars forum instead:

https://www.biostars.org/new/post/?tag_val=vg

I use GraphAligner to align "chr21 reads" to the "hprc-chr21.gfa": GraphAligner -g chr21.gfa -f chr21.reads -t 88 -a graphaligner_aln.gam -x vg. Then I use "vg pack" to get the coverage information about each node: vg pack -x chr21.gfa -g graphaligner_aln.gam -t 88 -d > aln.pack2.txt. But I find the coverage is all zero. 1691936745146 Why is that?How do I get the correct coverage information?

glennhickey commented 1 year ago

Did you use -Q with vg pack? If so, you might want to try again without this option.

tanger-code commented 1 year ago

No, I only use -d to write the result to .txt file.

glennhickey commented 1 year ago

You need to write the pack file with -o.

vg pack -x chr21.gfa -g graphaligner_aln.gam -t 88 -o graphaligner_aln.pack

Then you can read it into a table with -i -d

vg pack -x chr21.gfa -g graphaligner_aln.gam -t 88 -i graphaligner_aln.pack -d
tanger-code commented 1 year ago

I've tried, but the result is the same.

tanger-code commented 1 year ago

The graph used is from: https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.1-mc-grch38/hprc-v1.1-mc-grch38.chroms/chr21.vg. Could you try it? Using GraphAligner to map long reads from anywhere, then use vg pack to get the coverage information.

glennhickey commented 1 year ago

I'll try it if you send me a few read mappings from your gam. You can do something like

vg view -a GAM-FILE | head -10

and paste the results here.

tanger-code commented 1 year ago

Because I used the long reads, it's a little bit too much. Here I put them in a file. head10.txt

glennhickey commented 1 year ago

Seems like there's some kind of name / node_id mixup in the GAM. You see it right away with

vg view -JaG head10.txt > head10.gam
vg validate chr21.vg -a head10.gam

-> error after error like
Node 677029 not found in graph

I'd think vg pack would give the same sort of error, but I guess it doesn't check (probably something to fix).

But if you look at the json, you see stuff like

 "position": {"is_reverse": true, "name": "43503065", "node_id": "676997", "offset": "56"}}

and while 676997 is not a node in the graph, 43503065 is a node in the graph.

I know we have some code in vg that can associate node_id's with names from original GFA files, but I'm not sure how that could get carried over to GraphAligner. What input exactly were you giving GraphAligner?

tanger-code commented 1 year ago

When I use .vg graph in GraphAligner's mapping command, there is something error: image So I convert .vg graph to .gfa graph using vg view. Then mapping reads to the GFA graph using GraphAligner: graphGFA="/data/home/tangen/data/HPRC/grch38_based/chromosome/chr21/chr21.gfa" reads="/data/home/tangen/data/HPRC/grch38_based/chromosome/chr21/chr21-reads/chr21.fq" $GraphAligner -g $graphGFA -f $reads -t 88 -a graphaligner_aln.gam -x vg

glennhickey commented 1 year ago

Going back to this bit from your GAM

 "position": {"is_reverse": true, "name": "43503065", "node_id": "676997", "offset": "56"}}

If I do vg view chr21.vg, there is no node 676997 in that graph. I don't understand why GraphAligner is setting the node_id field to that (and putting what seems like a legit node id in the name field). So as far as I can tell this seems like an issue with GraphAligner and not vg, that you should take up on GraphAligner's github instead.

One other thing you might try is switching to GAF format output with GraphAligner and running vg pack on that instead (using -a instead of -g, with gzipped gaf supported too).

tanger-code commented 1 year ago

I have tried to use .gaf format, and it works. Thank you for your advice.