vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

convert gfa file into vg #3980

Open GeorgeBGM opened 1 year ago

GeorgeBGM commented 1 year ago

1. What were you trying to do?

 I would like to convert gfa file into vg format.

2. What did you want to happen?

 I expected this to be easy to accomplish.

3. What actually happened?

 The process was killed or terminated.

4. If you got a line like Stack trace path: /somewhere/on/your/computer/stacktrace.txt, please copy-paste the contents of that file here:

Place stacktrace here.

5. What data and command can the vg dev team use to make the problem happen?

 _vg view -F hprc-v1.0-mc-grch38.gfa --threads 3 --vg > hprc-v1.0-mc-grch38.gfa.vg_

 node 1:
 ![image](https://github.com/vgteam/vg/assets/26595839/adc2e4d5-c022-4220-bb59-363000b727a3)
 node 2:
 ![image](https://github.com/vgteam/vg/assets/26595839/cc1c6f03-3c47-498d-93cd-5f6e73420692)

The _vg convert -t 10 -g CPC.HPRC.Phase1.CHM13v2.gfa -v > CPC.HPRC.Phase1.CHM13v2.vg_ command reports a similar error message.

6. What does running vg version say?

vg: variation graph tool, version v1.48.0 "Gallipoli"

jeizenga commented 1 year ago

It looks to me like those are warnings, not errors, so they shouldn't have caused VG to crash. However, the fact that your GFA has duplicate paths is a cause for concern. You might want to look into the pipeline that's creating the GFA to figure out why it's adding multiple paths with the same name.
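One way to check for the duplicate path names mentioned above is to scan the GFA directly. The following is a minimal sketch, not part of vg: the function name `duplicate_path_names` is hypothetical, and it assumes paths are stored as GFA P-lines with the path name in the second tab-separated column. Haplotypes stored as W-lines are not examined.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_path_names(gfa_lines: Iterable[str]) -> List[str]:
    """Return, in sorted order, path names that occur on more than one P-line."""
    counts: Counter = Counter()
    for line in gfa_lines:
        # Only P-lines are considered (assumption: W-lines are ignored).
        if line.startswith("P\t"):
            fields = line.rstrip("\n").split("\t")
            if len(fields) > 1:
                counts[fields[1]] += 1
    return sorted(name for name, n in counts.items() if n > 1)


# Tiny in-memory example; the second "chr1" P-line is the duplicate.
example = [
    "H\tVN:Z:1.0\n",
    "S\t1\tACGT\n",
    "P\tchr1\t1+\t*\n",
    "P\tchr1\t1+\t*\n",
    "P\tchr2\t1+\t*\n",
]
print(duplicate_path_names(example))  # ['chr1']
```

For a real file, passing an open file handle (for example `with open(path) as gfa: duplicate_path_names(gfa)`) streams the GFA line by line, so only the path names are held in memory.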

glennhickey commented 1 year ago

You're running out of memory. You're best off using the xg and gbz files released alongside that gfa file in order to use vg.

If you really want to convert that graph into a .vg file (for editing?) then you can try

vg convert -f hprc-v1.0-mc-grch38.gfa >  hprc-v1.0-mc-grch38.vg

Or if that takes too much memory (but this will drop haplotypes)

vg convert -f hprc-v1.0-mc-grch38.gfa -H >  hprc-v1.0-mc-grch38.vg

vg view --vg converts the graph to vg Protobuf, which is deprecated, and should be avoided. We should deprecate this option.

GeorgeBGM commented 1 year ago

Thanks for your reply.

Yes, I need to use vg tools for downstream analysis (vg mpmap; vg augment; vg deconstruct), and the gfa file cannot be imported directly. I am curious how much memory the process will consume, and whether 1T of memory is enough to convert the HPRC gfa into a .vg file.

GeorgeBGM commented 1 year ago

By the way, is it possible to use multiple CPUs together, or GPUs, to complete the process?

GeorgeBGM commented 1 year ago

Hi, I'm using version v1.48.0 of vg, and I noticed that the -f parameter means output in GFA format, so I'm a bit confused about that command (vg convert -f hprc-v1.0-mc-grch38.gfa > hprc-v1.0-mc-grch38.vg). I used 1T of memory to convert the GFA file into a format that vg can recognize, but I still got a similar error. I'm curious whether I can split the GFA by chromosome first, convert each chromosome's GFA into a vg file, and then merge them together. So how can I split the GFA file by chromosome? Are there any other suggestions on this issue? Thanks!

jeizenga commented 1 year ago

I think it should be -g instead of -f
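Combining this correction with the earlier suggestion, the intended invocation would use `-g` to mark the input as GFA, optionally with `-H` to drop haplotypes. Below is a minimal sketch of a wrapper, assuming `vg` is on the PATH; the helper names `convert_args` and `gfa_to_vg` are hypothetical and only illustrate how the flags from this thread fit together.

```python
import subprocess
from typing import List


def convert_args(gfa_path: str, drop_haplotypes: bool = False) -> List[str]:
    """Build the argument list for `vg convert` with GFA input (-g)."""
    args = ["vg", "convert", "-g", gfa_path]
    if drop_haplotypes:
        # -H is the memory-saving option suggested earlier; it drops haplotypes.
        args.append("-H")
    return args


def gfa_to_vg(gfa_path: str, out_path: str, drop_haplotypes: bool = False) -> None:
    """Run the conversion and write vg's standard output to out_path."""
    with open(out_path, "wb") as out:
        subprocess.run(convert_args(gfa_path, drop_haplotypes), stdout=out, check=True)


# Show the command line without running vg.
print(" ".join(convert_args("hprc-v1.0-mc-grch38.gfa")))
```

Calling `gfa_to_vg("hprc-v1.0-mc-grch38.gfa", "hprc-v1.0-mc-grch38.vg", drop_haplotypes=True)` would correspond to the second command suggested earlier, with `-g` in place of `-f`. Whether it fits in the available memory is not something this sketch can answer.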