vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

vg call as a kind of deconstruction algorithm #2454

Open ekg opened 5 years ago

ekg commented 5 years ago

In CPANG19, many students wanted to project a complex graph (not even alignments to it, but the paths in it) out to a VCF file.

It seems like vg call should be able to do something like this, provided we also manage to handle cyclic graphs. The problem appears to be that the "alignments" made by paths through the graph don't have qualities, so call ignores them. I'm not completely sure, though.

glennhickey commented 5 years ago

I spruced up vg deconstruct to do exactly this. It shares a lot of call's internals. It's important to use -e. You pick a reference path with -p, and it will project every other path onto it and make a VCF. It should support cycles.
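To make the invocation concrete, here is a minimal sketch that only assembles the command line described above; it does not run vg. The -p and -e flags come from the comment. The file names (graph.xg, paths.vcf), the reference path name "ref", and the order of the arguments are assumptions for illustration, so check them against vg deconstruct's own help output.

```python
import shlex


def deconstruct_command(graph, ref_path, out_vcf):
    """Build a shell command that projects the graph's paths onto ref_path as a VCF."""
    # -p selects the reference path and -e is the flag the comment says to use;
    # the argument order here is an assumption, not taken from vg's documentation.
    argv = ["vg", "deconstruct", "-p", ref_path, "-e", graph]
    return " ".join(shlex.quote(a) for a in argv) + " > " + shlex.quote(out_vcf)


# Hypothetical file and path names.
print(deconstruct_command("graph.xg", "ref", "paths.vcf"))
```

The resulting string can be pasted into a shell or handed to subprocess.run(..., shell=True) once the names match a real graph.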


glennhickey commented 5 years ago

I actually have a deconstruct blurb and example half written for the variant calling part of the README, but forgot about it. Will try to get that merged in today. I've used it with success on some pretty funky seqwish and cactus graphs.

glennhickey commented 5 years ago

Another thing I need to add to the README that applies to both vg call and vg deconstruct: they compute snarls on the fly, and this dominates their running times. For larger graphs, I find it best to pre-compute the snarls once with vg snarls -r graph.vg/xg > graph.snarls, then pass that file to call or deconstruct with -r graph.snarls. For human-sized genomes, you probably need to compute the snarls for each chromosome in parallel, then cat them together into a whole-genome snarl index.
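The workflow above could be sketched roughly as follows. This is a dry-run outline, not a tested pipeline: the vg snarls and -r graph.snarls invocations are copied verbatim from the comment and should be confirmed against the help output of your vg version, while the chromosome names, file names, argument order, and the idea that per-chromosome graphs share node IDs with the whole-genome graph are all assumptions. The concatenate helper is a plain byte-wise stand-in for cat.

```python
import shlex
import shutil


def snarls_command(graph, out_snarls):
    # Flags copied verbatim from the comment above; confirm with `vg snarls --help`.
    return f"vg snarls -r {shlex.quote(graph)} > {shlex.quote(out_snarls)}"


def deconstruct_with_snarls(graph, ref_path, snarls, out_vcf):
    # -r passes the pre-computed snarls file; the argument order is an assumption.
    argv = ["vg", "deconstruct", "-p", ref_path, "-e", "-r", snarls, graph]
    return " ".join(shlex.quote(a) for a in argv) + " > " + shlex.quote(out_vcf)


def concatenate(parts, merged):
    """Byte-wise equivalent of `cat part1 part2 ... > merged`."""
    with open(merged, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)


chroms = ["chr1", "chr2"]  # hypothetical chromosome names
for c in chroms:
    # Each of these is independent, so they could be launched in parallel.
    print(snarls_command(f"{c}.xg", f"{c}.snarls"))
print(deconstruct_with_snarls("graph.xg", "ref", "graph.snarls", "paths.vcf"))
```

After the per-chromosome commands finish, concatenate(["chr1.snarls", "chr2.snarls"], "graph.snarls") would produce the merged file that the final command reads.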

glennhickey commented 5 years ago

The variant calling bit in the README is updated with some more details, as well as how to use deconstruct to make VCFs out of path information.

ekg commented 5 years ago

Great, thanks. We should start testing deconstruct and variant calling on difficult graphs to make sure this is all stable.
