vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/

Problem constructing graph from vcf file: Unsupported variant allele "<DEL>" #2857

Open egoltsman opened 4 years ago

egoltsman commented 4 years ago

Hi, I'm trying to induce a variation graph with 'vg construct -v' using a vcf file (v4.2) and the corresponding reference fasta file. I don't see anything obviously wrong with the vcf file, although it has undergone some filtering from its original form. The data was narrowed down to a single reference chromosome (~72 Mb) in both the fasta file and the vcf. Vg spits out these warnings but continues on to build a graph. When I inspect the graph, however, every segment is exactly 32 bp long, so it looks like the variant info got ignored and everything got chopped to some default block size.

warning:[vg::Constructor] Unsupported variant allele "<DEL>"; Skipping variant(s) Bd1   61293   8203    N   <DEL>   6975.37 .   AC=59;ALG=PROD;AN=132;CIEND=-1,2;CIEND95=0,0;CIPOS=-2,2;CIPOS95=0,0;END=63258;GCF=0.494405;PE=514;SNAME=Nvk1:8,AL2F:5,INZS:5,G33i6:6,BNT8:6,IUPW:5,IOHW:4,INZT:7,SLZ2:6,Sig2:6,HZSW:2,HYYO:3,IHOX:4,HZTA:2,IHOW:6,Foz1:6,GPIT:3,S6D5:3,AL2D:6,ICHH:5,Bd29-1:8,G33i4:5,IOGT:6,GWWC:3,INZY:5,IFWU:2,IAZO:5,HZHA:3,ISSN:4,BNT4:7,INZX:5,GPIU:2,G30i2:5,GR64:4,HZHF:7,HZZN:5,HZSY:4,ICUU:4,ABR9:6,IOAB:6,GPZC:4,IGFS:1,GWWB:3,IBBB:2,ICWB:4;SR=194;STRANDS=+-:708;SU=708;SVLEN=-1965;SVTYPE=DEL !
warning:[vg::Constructor] Unsupported variant allele "<DUP>"; Skipping variant(s) Bd1   500421  39632   N   <DUP>   11924   .   AC=54;ALG=PROD;AN=136;CIEND=0,0;CIEND95=0,0;CIPOS=-1,0;CIPOS95=0,0;END=504654;GCF=0.532829;PE=4444;SNAME=G33i4:25,BNT4:40,AL2E:32,BNT8:37,Nvk1:39,IHOX:20,CSR6:34,SLZ2:36,Tek9:30,G33i6:28,S6D5:15,ABR6:28,IAZO:20,G30i2:27,GR64:24,BNT3:28,LPA32:30,IHOX:11099,IAZI:24,IGFP:26,INZY:23,Bd29-1:27952,AL2D:36,INZY:12563,IAZT:12,IGFP:6042,Arn1a:20,Bd29-1:65,277_1000561:13982,IAZO:6534,AL2F:34,Arn1a:10538,INZP:28,GPFZ:13;SR=412;STRANDS=-+:4856;SU=4856;SVLEN=4233;SVTYPE=DUP !
warning:[vg::Constructor] Unsupported variant allele "<INV>"; Skipping variant(s) Bd1   796111  45820   N   <INV>   1273.12 .   AC=22;ALG=PROD;AN=138;CIEND=-71,24;CIEND95=-4,4;CIPOS=-141,29;CIPOS95=-8,8;END=827676;GCF=0.481467;PE=205;SNAME=IUPW:66_1,GPZC:23_1,IAZI:31_1,IOGT:19_1,ABR9:19_1,IOAB:19_1,HZSU:12,Nvk1:49,IOGS:22_1,AL2D:49_1,1293_1021402:12_1,Tek9:41;SR=0;STRANDS=++:9,--:196;SU=205;SVLEN=31565;SVTYPE=INV;IMPRECISE !

The warnings refer to the first instances of each of the three variant types that it's complaining about, and you can pretty much see the entire vcf line quoted. I could provide the full file, of course.

vg version v1.23.0 "Lavello"
Compiled with g++ (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0 on Linux
Linked against libstd++ 20191114
Built by anovak@octagon

Thanks in advance for your help!

egoltsman commented 4 years ago

p.s. The vcf file contains only SVs and no SNPs.
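
A quick way to confirm which ALT allele types a VCF actually contains is to tally them. The following is a minimal Python sketch, not part of vg: the function name `tally_alt_alleles` is hypothetical, and the example records are abbreviated versions of the three lines quoted in the warnings above (only a few INFO keys are kept).

```python
from collections import Counter


def tally_alt_alleles(vcf_lines):
    """Count ALT alleles in VCF text lines.

    Symbolic alleles such as <DEL> are counted under their own name;
    every other ALT value is counted under the key "sequence".
    """
    counts = Counter()
    for line in vcf_lines:
        # Skip blank lines, meta lines (##...) and the column header (#CHROM...).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        # Column 5 (index 4) is ALT; multiple alleles are comma-separated.
        for alt in fields[4].split(","):
            if alt.startswith("<") and alt.endswith(">"):
                counts[alt] += 1
            else:
                counts["sequence"] += 1
    return counts


# Abbreviated records based on the warnings quoted in this thread.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "Bd1\t61293\t8203\tN\t<DEL>\t6975.37\t.\tEND=63258;SVLEN=-1965;SVTYPE=DEL",
    "Bd1\t500421\t39632\tN\t<DUP>\t11924\t.\tEND=504654;SVLEN=4233;SVTYPE=DUP",
    "Bd1\t796111\t45820\tN\t<INV>\t1273.12\t.\tEND=827676;SVLEN=31565;SVTYPE=INV;IMPRECISE",
]

if __name__ == "__main__":
    for allele, n in tally_alt_alleles(example).items():
        print(allele, n)
```

For a bgzip-compressed file, the lines can be read with Python's standard `gzip.open(path, "rt")`, since bgzip output is gzip-compatible. If the tally shows only symbolic alleles, every record in the file depends on symbolic allele handling.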

ekg commented 4 years ago

What command did you use with vg construct?

egoltsman commented 4 years ago

Here's the command: vg construct -r ref.fa -v vars.vcf.gz

Prior to that I used bgzip and tabix to compress and index the vcf.

ekg commented 4 years ago

I think you may need to use another flag to enable these symbolic alleles to be used. Glenn and Jean have been working with SV graphs and it'd be good to confirm how they build them.

glennhickey commented 4 years ago

construct -S will enable symbolic allele support. The VCF format is fairly open-ended for this, so it doesn't work on every file, but I think your "END" tags should be enough.
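
Since symbolic allele handling relies on INFO tags such as END, it can help to check those tags before running construct. The sketch below is a hedged illustration only: `parse_info` and `check_symbolic_record` are hypothetical helper names, the END minus POS equals |SVLEN| rule is simply the convention the three records quoted in this thread follow, and nothing here is claimed to mirror what vg or vcflib check internally.

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict; flag-only keys map to True."""
    out = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def check_symbolic_record(line):
    """Return a list of findings for one tab-separated VCF record.

    An empty list means END is present and, if SVLEN is given,
    END - POS matches |SVLEN| (an assumed convention, see lead-in).
    """
    fields = line.rstrip("\n").split("\t")
    pos = int(fields[1])
    info = parse_info(fields[7])
    findings = []
    if "END" not in info:
        findings.append("missing END")
    else:
        span = int(info["END"]) - pos
        if "SVLEN" in info:
            # SVLEN may list one value per ALT allele; use the first.
            svlen = abs(int(info["SVLEN"].split(",")[0]))
            if svlen != span:
                findings.append(f"END-POS={span} but |SVLEN|={svlen}")
    if "IMPRECISE" in info:
        # Informational only: the thread does not say how vg treats this flag.
        findings.append("flagged IMPRECISE")
    return findings


# Abbreviated versions of the records quoted in the warnings.
records = [
    "Bd1\t61293\t8203\tN\t<DEL>\t6975.37\t.\tEND=63258;SVLEN=-1965;SVTYPE=DEL",
    "Bd1\t500421\t39632\tN\t<DUP>\t11924\t.\tEND=504654;SVLEN=4233;SVTYPE=DUP",
    "Bd1\t796111\t45820\tN\t<INV>\t1273.12\t.\tEND=827676;SVLEN=31565;SVTYPE=INV;IMPRECISE",
]

if __name__ == "__main__":
    for rec in records:
        print(rec.split("\t")[4], check_symbolic_record(rec))
```

Records that come back with findings are reasonable candidates to inspect first if construct rejects the file.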

It should support (with accompanying fasta for sequence), and . I don't think is supported yet, though.

egoltsman commented 4 years ago

With the -S flag it crashes with

vg: src/Variant.cpp:349: bool vcflib::Variant::canonicalize(FastaReference&, std::vector<FastaReference*>, bool, int): Assertion `canonicalizable()' failed.

Here is the stacktrace. Note that I'm running the binary that I downloaded as I couldn't get the package to build from source.

Crash report for vg v1.23.0 "Lavello"
Stack trace (most recent call last):
#12   Object "/global/u1/e/eugeneg/utils/vgteam/vgtools/vg", at 0x4dd279, in _start
#11   Object "/global/u1/e/eugeneg/utils/vgteam/vgtools/vg", at 0x1bb8ec8, in __libc_start_main
#10   Object "/global/u1/e/eugeneg/utils/vgteam/vgtools/vg", at 0x40aed7, in main
#9    Object "/global/u1/e/eugeneg/utils/vgteam/vgtools/vg", at 0x9e77c7, in vg::subcommand::Subcommand::operator()(int, char**) const
#8    Object "/global/u1/e/eugeneg/utils/vgteam/vgtools/vg", at 0xa2a0e4, in main_construct(int, char**)
#7    Object "/global/u1/e/eugeneg/utils/vgteam/vgtools/vg", at 0xd10719, in vg::Constructor::construct_graph(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::function<void (vg::Graph&)> const&)
#6    Object "/global/u1/e/eugeneg/utils/vgteam/vgtools/vg", at 0xd0f88e, in vg::Constructor::construct_graph(std::vector<FastaReference*, std::allocator<FastaReference*> > const&, std::vector<vcflib::VariantCallFile*, std::allocator<vcflib::VariantCallFile*> > const&, std::vector<FastaReference*, std::allocator<FastaReference*> > const&, std::function<void (vg::Graph&)> const&)
#5    Object "/global/u1/e/eugeneg/utils/vgteam/vgtools/vg", at 0xd0dcf2, in vg::Constructor::construct_graph(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, FastaReference&, vg::VcfBuffer&, std::vector<FastaReference*, std::allocator<FastaReference*> > const&, std::function<void (vg::Graph&)> const&)
#4    Object "/global/u1/e/eugeneg/utils/vgteam/vgtools/vg", at 0xf5f8fc, in vcflib::Variant::canonicalize(FastaReference&, std::vector<FastaReference*, std::allocator<FastaReference*> >, bool, int)
#3    Object "/global/u1/e/eugeneg/utils/vgteam/vgtools/vg", at 0x1bbcde1, in __assert_fail
#2    Object "/global/u1/e/eugeneg/utils/vgteam/vgtools/vg", at 0x1bbcd6b, in __assert_fail_base
#1    Object "/global/u1/e/eugeneg/utils/vgteam/vgtools/vg", at 0x1bc9790, in abort
#0    Object "/global/u1/e/eugeneg/utils/vgteam/vgtools/vg", at 0x117ba67, in raise

egoltsman commented 4 years ago

Including the data to reproduce this: vg_construct_crash.data_to_reproduce.zip