vgteam / vg

tools for working with genome variation graphs

vg haplotypes generates a personalized graph in which some nodes have no ID #4016

Closed hahahafeifeifei closed 1 year ago

hahahafeifeifei commented 1 year ago

1. What were you trying to do? I wanted to test the newly released vg haplotypes function.

2. What did you want to happen? I wanted to run haplotype sampling on the HPRC MC graph and generate a personalized graph.

3. What actually happened? Some segments in the personalized graph have no node ID, like this:

S               TAACCCTAA
S               T
S               CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
S               AGG
S               ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTA
S               AC
S               CCTAACCCTAACCCTAACCC
S               A
S               AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
S               C
S               TAACCCTAACCCTAACCC
S               A
S               A

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Place stacktrace here.

5. What data and command can the vg dev team use to make the problem happen?

wget https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph-cactus/hprc-v1.0-mc-chm13.gfa.gz
gunzip hprc-v1.0-mc-chm13.gfa.gz
vg gbwt -G hprc-v1.0-mc-chm13.gfa --gbz-format -g hprc-v1.0-mc-chm13.gbz -p
vg index -j hprc-v1.0-mc-chm13.dist hprc-v1.0-mc-chm13.gbz
vg gbwt -r hprc-v1.0-mc-chm13.ri -Z hprc-v1.0-mc-chm13.gbz
vg haplotypes -v 2 -t 16 -H hprc-v1.0-mc-chm13.hapl hprc-v1.0-mc-chm13.gbz
vg haplotypes -v 2 -t 16 --include-reference -i hprc-v1.0-mc-chm13.hapl -k test_sample.kmer.kff -g test_sample.gbz hprc-v1.0-mc-chm13.gbz
vg convert -f test_sample.gbz > test_sample.gfa

6. What does running vg version say?

vg: variation graph tool, version v1.49.0 "Peschici"

jltsiren commented 1 year ago

This was an error in how the translation between GFA segment names and vg node identifiers is stored in the sampled graph. The sampled graph itself is fine, and it can be used with vg tools, but any attempt to translate it (or mappings to it) into the GFA segment space will fail.
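
To check whether a converted GFA file shows this symptom, one option is to count the S lines whose name field is empty. The following is only a minimal sketch: it assumes tab-separated GFA records in which the second column of an S line is the segment name, and the function name `count_unnamed_segments` is made up for illustration.

```shell
#!/usr/bin/env bash
# Count GFA segment (S) lines whose name field (column 2) is empty.
# Assumption: records are tab-separated, laid out as S <name> <sequence> ...
count_unnamed_segments() {
    awk -F'\t' '$1 == "S" && $2 == "" { n++ } END { print n + 0 }' "$1"
}

# Usage (file name taken from the report above):
#   count_unnamed_segments test_sample.gfa
```

A count of 0 would suggest that every segment got a name during conversion. Any other count points to the missing translation described above.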

I have fixed this in my development branch at https://github.com/jltsiren/vg. The fix will be included in vg 1.50.0, which should be released in early August.
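
Since the fix is tied to a release number, a script that depends on it could first check the installed version. This is a rough sketch, not part of vg: it assumes the version banner contains a token of the form vMAJOR.MINOR.PATCH (as in the report above) and that `sort` supports `-V`, as GNU coreutils does. The helper name `vg_version_at_least` is hypothetical.

```shell
#!/usr/bin/env bash
# Succeed if the version found in a vg banner string is >= a minimum version.
# Assumption: the banner contains a token such as v1.49.0, as in
#   vg: variation graph tool, version v1.49.0 "Peschici"
# Assumption: `sort -V` (version sort) is available.
vg_version_at_least() {
    local banner="$1" minimum="$2" found
    # Take the first vMAJOR.MINOR.PATCH token and drop the leading "v".
    found=$(printf '%s\n' "$banner" | grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
    found=${found#v}
    # Return 2 when no version token could be found.
    [ -n "$found" ] || return 2
    # If the minimum sorts first (or ties), the found version is new enough.
    [ "$(printf '%s\n%s\n' "$minimum" "$found" | sort -V | head -n 1)" = "$minimum" ]
}

# Usage:
#   vg_version_at_least "$(vg version)" 1.50.0 && echo "fix should be included"
```

With the banner from the report (v1.49.0) and a minimum of 1.50.0, the function returns a non-zero status.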

hahahafeifeifei commented 1 year ago

Hi @jltsiren,

Thank you for your prompt response! I have been attempting to compile the source code from your development branch, but I have encountered difficulties. Could you kindly provide me with the binary vg file? Thanks a lot!

jltsiren commented 1 year ago

Unfortunately, I'm on vacation now and don't have access to any system that can produce portable binaries.