Open ekg opened 5 years ago
It's possible to build the GBWT from alignments (which represent paths through the graph). Here's a simple example of how to do so from the automated tests:
cd vg/test
vg construct -r small/x.fa -v small/x.vcf.gz > x.vg
vg index -x x.xg x.vg
vg sim -n 100 -l 100 -x x.xg -a >sim.gam
vg index -G x_gam.gbwt -M sim.gam -x x_gam.xg x.vg
At this point we have a GBWT that includes the simulated reads. To build custom GBWTs, we only need to write the desired haplotypes in GAM format, which avoids possibly complicated conversion of the haplotypes to VCF.
We can visualize the haplotypes using two methods. In both cases, we need to convert the "threads" in the GBWT to "paths" in the VG graph. We do this with vg paths:
(cat x.vg ; vg paths -g x_gam.gbwt -x x_gam.xg -T -V ) >x+.vg
# method one, graphviz
vg view -dp x+.vg | dot -Tpdf -o x+.pdf
# method two, native linear visualization
vg index -x x+.xg x+.vg
vg viz -x x+.vg -o x+.svg
This is a thread to discuss how to build genome graphs with haplotypes that preserve the privacy of the genomes that the graphs were built from.
As a test data set, I downloaded a piece of chr20 for the 1000GP:
And I made a mini-reference file for the same region:
Now we need to build the graph and haplotype indexes for this subset, following @jltsiren 's documentation https://github.com/vgteam/vg/wiki/Index-Construction