vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

How to translate the Minigraph-Cactus result to SequenceTubeMaps through vg #4213

Closed ld9866 closed 7 months ago

ld9866 commented 8 months ago

Dear developers,

We built a graph pangenome with Minigraph-Cactus and wanted to convert it with vg so that we can visualize it with sequenceTubeMap. We used the following command to get the vg file (chr2.vcf.gz is from Minigraph-Cactus):

vg construct -r reference.fa -v chr2.vcf.gz > chr2.vg

But when we run sequenceTubeMap/script/prepare_vg.sh, we get this warning:

warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for bd73c23e6ebc2deb2c1d5e5be95374f2b2ce5367 at chr2:9 missing/empty! Was the variant skipped during construction?

We do not know how to solve this problem. We also tried the example in the vg folder and got the same kind of error as with the Minigraph-Cactus result, at this step:

vg index x.vg -v x.vcf.gz -x x.vg.xg --gbwt-name x.gbwt

Error:

warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for bd73c23e6ebc2deb2c1d5e5be95374f2b2ce5367 at x:9 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 5b70e6015d6cc1755fe821abb642a7ac72055833 at x:10 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 63b24cce9605adfadaa2d1168646b3ac722d2833 at x:14 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 58140fbc706f4076680095c8dcf6cbe1e5d80509 at x:34 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 8c38474f679b54ca250ffd521a9844eadb09642e at x:39 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 55df8b817a480aa353fbcccdd4207cc45483bc63 at x:52 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 15cff0c46efeb562a3125ed4186e5249820cf62b at x:58 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for dabba4a16c6a43642e90ae10769b046a9c4aa4eb at x:100 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for a94cd8cc4e3b353aaf29161418962f96c40399de at x:103 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] alt and ref paths for 5899689e351a42b8d9fff229b1e71ed199296de3 at x:122 missing/empty! Was the variant skipped during construction?
warning: [HaplotypeIndexer::parse_vcf] suppressing further missing variant warnings
warning: [HaplotypeIndexer::parse_vcf] Found 75/0 variants in phasing VCF but not in graph! Do your graph and VCF match?

ld9866 commented 8 months ago

By the way, we can run sequenceTubeMap successfully, but the graph is not visualized as expected and we cannot see the variation of the other individuals.

jltsiren commented 7 months ago

The prepare_vg.sh script in sequenceTubeMap is very old and uses obsolete commands. If you want to use a Minigraph-Cactus graph with sequenceTubeMap, you can replace it with

vg convert -x graph.gfa > graph.xg

The above assumes that the graph you got from Minigraph-Cactus is in GFA format. You can also tell Minigraph-Cactus to create the XG graph directly.
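If the conversion needs to be scripted, the command above could be wrapped roughly as follows. This is only a sketch: the function names convert_command and gfa_to_xg are made up for illustration, and it assumes that vg is on PATH and that the file names match the example above.

```python
import shutil
import subprocess
from pathlib import Path


def convert_command(gfa_path):
    """Build the argv for `vg convert -x <gfa>`; vg writes the XG to stdout."""
    return ["vg", "convert", "-x", str(gfa_path)]


def gfa_to_xg(gfa_path, xg_path):
    """Run vg convert and redirect its stdout into xg_path.

    Equivalent to: vg convert -x graph.gfa > graph.xg
    """
    if shutil.which("vg") is None:
        # Assumption: vg is installed as a command-line tool named "vg".
        raise RuntimeError("vg was not found on PATH")
    with open(xg_path, "wb") as out:
        subprocess.run(convert_command(gfa_path), stdout=out, check=True)
    return Path(xg_path)
```

Calling gfa_to_xg("graph.gfa", "graph.xg") should then produce the file that sequenceTubeMap can load.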

VCF is a lossy and ambiguous format, and converting a graph to VCF and back may not produce anything sensible. If you want to use that approach anyway, you need to add option -a to the vg construct command to include the variants from the VCF file as paths in the VG graph.
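For the VCF route, a small helper like the one below might make it easier to check whether adding -a actually removed the warnings. It is a sketch under assumptions: the helper names are hypothetical, the regular expression is written against the warning text quoted in the question, and other vg versions may word the message differently.

```python
import re

# Pattern for the HaplotypeIndexer warning as quoted in this thread:
# "... alt and ref paths for <variant id> at <contig>:<position> missing/empty! ..."
MISSING = re.compile(
    r"alt and ref paths for ([0-9a-f]+) at (\S+):(\d+) missing/empty!"
)


def construct_command(reference_fa, vcf_gz, alt_paths=True):
    """Build the argv for vg construct; the graph is written to stdout.

    With alt_paths=True the -a option is added, so the variants from the
    VCF file are included as paths in the graph.
    """
    cmd = ["vg", "construct", "-r", reference_fa, "-v", vcf_gz]
    if alt_paths:
        cmd.append("-a")
    return cmd


def missing_variants(stderr_text):
    """Return (variant_id, contig, position) for each missing-variant warning."""
    return [
        (m.group(1), m.group(2), int(m.group(3)))
        for m in MISSING.finditer(stderr_text)
    ]
```

Feeding the stderr of the indexing step to missing_variants gives a list of the variants that were not found in the graph; an empty list would suggest that the graph and the VCF now match.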