vgteam / vg

tools for working with genome variation graphs

haplotype extraction in VG format no longer works #2428

Open ekg opened 5 years ago

ekg commented 5 years ago

Today I was trying to extract threads from a GBWT as paths. I had to pull them out as GAM and augment the graph with them, using many gigabytes of RAM for a single gene (p53) and 5000 haplotypes.

In this case, it would have been much simpler to extract the paths directly in .vg format, but this is no longer usable: because of the VGPKG wrapper format, .vg files can no longer be concatenated with cat.

This experience suggests to me that it would be simpler to use GFA or other text formats for interchange. There are many problems with the .vg file format, and I would be happy to stop using it. We should consider adapting the W (walk) line format suggested by @lh3 to meet the requirements of subpath description. Subpath description is really the only thing that the .vg format has that we can't get in GFA.
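To make the text-interchange idea concrete, here is a minimal Python sketch of how a haplotype walk could be written as a GFA P line, or as a W line in the layout later adopted in GFA 1.1 (sample, haplotype index, sequence id, start, end, walk). The helper names `p_line` and `w_line`, the sample and sequence names, and the segment ids are all made up for illustration; this is not vg's API, and the start/end fields are left as `*` because the way subpaths should be described is exactly what remains to be decided.

```python
from typing import List, Tuple

# A step is (segment_id, is_reverse). Illustrative type, not part of vg.
Step = Tuple[str, bool]


def p_line(name: str, steps: List[Step]) -> str:
    """Format a path as a GFA 1.0 P line, leaving the overlaps field as '*'."""
    segs = ",".join(f"{seg}{'-' if rev else '+'}" for seg, rev in steps)
    return f"P\t{name}\t{segs}\t*"


def w_line(sample: str, hap_index: int, seq_id: str, steps: List[Step],
           start: str = "*", end: str = "*") -> str:
    """Format a walk as a W line: sample, haplotype index, sequence id,
    start, end, then the walk as '>' (forward) / '<' (reverse) steps."""
    walk = "".join(f"{'<' if rev else '>'}{seg}" for seg, rev in steps)
    return f"W\t{sample}\t{hap_index}\t{seq_id}\t{start}\t{end}\t{walk}"


# Hypothetical haplotype visiting segments 1 and 3 forward, then 4 in reverse.
haplotype = [("1", False), ("3", False), ("4", True)]

print(p_line("hap_0", haplotype))
print(w_line("sample1", 1, "chr17", haplotype))
```

Because every record is a single text line, the output for many haplotypes can be appended to an existing GFA file with cat, which is what the wrapped .vg files no longer allow.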

cademirch commented 5 years ago

Hi Erik, does this mean that the instructions on simulating from a particular haplotype (https://github.com/vgteam/vg/wiki/Simulating-reads-with-vg-sim) will not work as expected?

Thanks.

ekg commented 5 years ago

I think you'd have to change this particular step to use GAM format and vg augment.