vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/
Other

vg deconstruct #800

Open mahulchak opened 7 years ago

mahulchak commented 7 years ago

Hello,

I am trying to run vg deconstruct on a vg file that I created from whole genome alignments. However, the command did not generate any output. None of the subcommands worked for deconstruct (a few other vg commands I tried are working). Any idea why it might not be working?

Thanks. Mahul

edawson commented 7 years ago

Hey Mahul,

I broke vg deconstruct a few months back when we moved from one bubble detection algorithm to another. I haven't taken the time to fix it, but it has been on my todo list.

I will try to get to it this week, as you are not the first to attempt using it. In the meantime, call is a viable variant caller, and snarls can produce some useful output for looking at bubbles in the graph.

Best, Eric

mahulchak commented 7 years ago

Hi Eric,

Thanks for replying quickly. I don't have a read alignment or .gam file, so I was not sure if I could use the call subcommand. Deconstruct might be the most appropriate subcommand for me, but I could be wrong. I'll play with call and snarls and see what I get. Hopefully you'll have time to reactivate deconstruct ☺

Thanks, Mahul

edawson commented 7 years ago

GitHub automatically closed this after my PR merged. I've reopened it until we get some verification that it at least partially works.

@mahulchak, if the vg file you've created (I assume with MSGA) has paths, then deconstruct will be able to pull out one of these paths at a time (specified on the command line) and give you a VCF containing variation relative to that path. Is that what you'd like?

I have merged in a new version of deconstruct that should work for SNPs and smallish events. Complex structural variation will quite possibly break it.

Mind giving it a try?

P.S.: If you don't have any paths things get a bit more complicated - orienting on a graph without paths is weird. We might still be able to do it but it won't make a proper VCF.

mahulchak commented 7 years ago

Hi Eric, Thanks again. I'm working on a test dataset of 1 Mb chunks from 15 genomes. I'll test on it later today or early tomorrow and let you know. Mahul

mahulchak commented 7 years ago

Naive question: does the --path option in deconstruct refer to genome/sequence names? I tried the name of one of the genomes as well as one of the ancestral genomes (Anc0), but I get:

vg: src/path_index.cpp:254: vg::PathIndex::PathIndex(vg::VG&, const string&, bool): Assertion `vg.paths.has_path(path_name)' failed.

Am I missing an argument?

edawson commented 7 years ago

The -p / --path option should refer to a path in the graph, yes. You can list all the paths labeled in the graph with vg paths -L <my_graph>.vg. If your path isn't listed, then it isn't labeled in the graph for some reason.
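
A minimal sketch of that check, for anyone who wants to confirm a path name before passing it to deconstruct. The `vg paths -L <graph>` invocation is copied from the comment above and may differ in other vg versions; the function names here are made up for illustration and are not part of vg.

```python
import subprocess


def parse_path_listing(listing_text):
    """Split a path listing into names, dropping blank lines and stray whitespace."""
    return [line.strip() for line in listing_text.splitlines() if line.strip()]


def list_graph_paths(graph_file):
    """Return the path names printed by `vg paths -L <graph_file>`.

    Assumption: the invocation matches the one quoted in this thread and
    prints one path name per line.
    """
    result = subprocess.run(
        ["vg", "paths", "-L", graph_file],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_path_listing(result.stdout)


def check_path_name(path_name, available):
    """Return None if path_name is labeled in the graph, else a readable message."""
    if path_name in available:
        return None
    listed = ", ".join(available) or "(no paths)"
    return "path {!r} not found; graph has: {}".format(path_name, listed)
```

Calling `check_path_name(name, list_graph_paths("my_graph.vg"))` before deconstruct would turn the has_path assertion failure into a message that shows which names the graph actually carries.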

edawson commented 7 years ago

@mahulchak I've confirmed with the team that vg msga should incorporate and label your paths; they should show up in vg paths -L and should work for deconstruct.

mahulchak commented 7 years ago

Yes, the paths are named after the chromosomes, and those names were changed during the hal-to-vg conversion. I now have the path names and am testing vg deconstruct on one of them.

mahulchak commented 7 years ago

Hi Eric, I got a

terminate called after throwing an instance of 'std::bad_alloc' what(): std::bad_alloc

error. The vcf file only has headers so nothing was written to it.
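
For anyone reproducing this, a small sketch that tells a header-only VCF from one with records. It relies only on the VCF convention that meta and header lines start with `#`; the function name is hypothetical.

```python
def count_vcf_records(vcf_text):
    """Count data lines in VCF text; meta and header lines start with '#'."""
    return sum(
        1
        for line in vcf_text.splitlines()
        if line.strip() and not line.startswith("#")
    )
```

A count of zero on the deconstruct output corresponds to the situation described above, where only the header was written before the crash.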

edawson commented 7 years ago

I'm thinking that's a bug on my end, but it's hard to debug without a test case. If you want to share data via email (mine's on my Github profile) that would help; otherwise I can try to generate one.

mahulchak commented 7 years ago

I will share the vg file with you shortly.

mahulchak commented 7 years ago

Did you get the vg file?

mahulchak commented 7 years ago

It is crashing due to excessive memory use. For example, my vg file holds an alignment of fifteen 1 Mb chunks, and vg deconstruct uses more than 300 GB of memory.
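
One way to put a number on memory use like this in a bug report is to read the peak resident set size of the child process after it exits. The sketch below is an assumption-laden helper, not something vg provides: it uses Python's `resource` module (Unix only), and `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS. The value is a high-water mark over all child processes waited for so far, so it is best used from a fresh interpreter.

```python
import resource
import subprocess


def run_and_report_peak(command):
    """Run command, then return (exit_code, peak_rss_of_children).

    peak_rss_of_children is ru_maxrss for RUSAGE_CHILDREN: the largest
    resident set size among child processes this interpreter has waited for.
    Units are platform dependent (kilobytes on Linux, bytes on macOS).
    """
    completed = subprocess.run(command, check=False)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return completed.returncode, usage.ru_maxrss
```

The command list would be whatever deconstruct invocation is being tested; a crash shows up as a non-zero exit code alongside the peak figure.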

mahulchak commented 7 years ago

I aligned the same 1 Mb chunks from the 15 genomes using vg msga and then ran vg deconstruct on the resulting vg file. After 15 lines were written to the VCF file, I got: terminate called after throwing an instance of 'std::runtime_error' what(): No node 0 in graph

ekg commented 7 years ago

Have we moved over to snarls for deconstruct yet?

edawson commented 7 years ago

Yes, but it's having memory usage issues apparently. It works on the test cases in vg.

mahulchak commented 7 years ago

If you want, I have another vg file based on even smaller (100 kb) and simpler sequences. I have one vg file made with vg msga and another created from a hal file. vg deconstruct crashes due to excessive memory usage on both.

edawson commented 7 years ago

@mahulchak could you send your smaller test case? My guess is that I'm caching something that I've underestimated the size of.

ChriKub commented 7 years ago

@edawson I get the std::bad_alloc as well. Is it a new bug, or does the old one persist? I could provide you with a small test case if it helps. Thanks, Chris