vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

vg genotype Segmentation fault #1138

Open ChriKub opened 6 years ago

ChriKub commented 6 years ago

Hi guys, I use a valid read mapping on a valid vg graph for genotyping:

vg genotype -v -r path1 -q -S -p -t 8 graph.vg mapping.gam.index/ > calls.vcf

The genotyping fails after augmenting the graph:

Reading input graph...
Loading reads...
Loaded 3 alignments
Calling against path path1
Augmented graph; got 1901750 translations
Converted 3 alignments to embedded paths
Looking at graph of 950875 nodes
Segmentation fault

Alignment:

Total alignments: 3
Total primary: 1
Total secondary: 2
Total aligned: 1
Insertions: 0 bp in 0 read events
Deletions: 1 bp in 1 read events
Substitutions: 1 bp in 1 read events
Softclips: 58 bp in 1 read events
Unvisited nodes: 10683873/10683927 (265375945 bp)
Single-visited nodes: 54/10683927 (612 bp)
Significantly biased heterozygous sites: 0/0

What is going wrong? I tried several different command configurations, but the segfault always occurs. Thanks.

ChriKub commented 6 years ago

I uploaded the graph and mapping here.

edawson commented 6 years ago

I should have caught this right off the bat, but you'll need to add the -C option to your command to use Cactus for bubble finding, as we've silently deprecated the default supbub bubble-finding algorithm. The other issue I hit was that the path in your graph has a different name than the one in your example command.

My working command line was:

./bin/vg genotype -p -C -t 4 -r TAIR_Chr1 -q -v TAIR_chr1_flat.vg TAIR_chr1_flat.gam.index/ > calls.vcf

Have you tried vg call? Genotype has languished over the past year or so as most of the vgteam has worked on other things. The documentation is clearly out of date.

Some notes:

I downloaded your graph / gam and ran your command line through gdb, and indeed hit your segfault. It's in a step that finds strongly connected components in the supbub (findScc) before unrolling.

I added the -C flag and got an error indicating the path wasn't found in the graph. I did a vg paths -L TAIR_chr1_flat.vg, found the only path name, and used that instead of path1. This produced an empty header in calls.vcf.

- Cactus should be made the default and only bubble-finding algo, and I should update the docs to reflect these changes.
- The output VCF with this tiny gam should be just a header, as it'll fail a minimal support requirement. Hopefully a gam with sufficient coverage will yield VCF calls, but I honestly can't guarantee it will.
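The path-name mismatch described above can be caught before running the genotyper. The following is a minimal sketch, not part of vg: `has_path` is a hypothetical helper, and it assumes that `vg paths -L` prints one path name per line. It only checks whether the name you intend to pass to -r appears exactly in that list.

```shell
#!/bin/sh
# has_path NAME
# Hypothetical helper (not part of vg): reads path names on stdin,
# one per line, and succeeds only if NAME matches a whole line exactly.
has_path() {
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq -- "$1"
}

# Intended use, assuming `vg paths -L` lists one path name per line:
#
#   if vg paths -L TAIR_chr1_flat.vg | has_path path1; then
#       echo "path1 found, safe to pass to -r"
#   else
#       echo "path1 is not a path in this graph" >&2
#   fi
```

With the graph from this thread, the check for path1 would be expected to fail, since the only path name found in the graph was a different one.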