vgteam / vg

tools for working with genome variation graphs
https://biostars.org/tag/vg/
Other
1.09k stars 193 forks source link

Inversions #2349

Open fergsc opened 5 years ago

fergsc commented 5 years ago

Hi, I am trying to incorporate SV called with sniffles (https://github.com/fritzsedlazeck/Sniffles) into a genome and am having some issues.

I have broken my SV down into three separate vcf for insertions, deletions and inversions and constructed a variant graph for each. vg construct -f -S -a -r genome.fasta -v SVTYPE.vcf > SVTYPE.vg where SVTYPE = the insertion, deletion or inversion.

Construct works fine with insertions and deletions, but with inversions gives this error, multiple times, but not for all variants.

Warning: inversion SVLEN specifies nonzero length change (complex inversions not canonicalizeable) [canonicalize] Superscaffold_5   2087628 166 N   <INV>   0   PASS    CHR2=Superscaffold_5;END=2089826;Kurtosis_quant_start=7.790077;Kurtosis_quant_stop=10.164848;RE=28;SPAN=2198;STD_quant_start=0.000000;STD_quant_stop=0.534522;STRANDS=--;SUPTYPE=SR;SVLEN=2198;SVMETHOD=Snifflesv1.0.10;SVTYPE=INV;PRECISE
SVLEN: 2198
Warning: VCF writer incorrecty produced END = POS + SVLEN for an inversion. Fixing SVLEN to 0.

I have attached my inversion.vcf and hoping someone can help.

thanks

inversion.zip

glennhickey commented 5 years ago

That warning seems fairly harmless. I tried the first inversion in your graph

Superscaffold_5 2087628 166 N   <INV>   .   PASS    PRECISE;SVMETHOD=Snifflesv1.0.10;CHR2=Superscaffold_5;END=2089826;STD_quant_start=0;STD_quant_stop=0.534522;Kurtosis_quant_start=7.79008;Kurtosis_quant_stop=10.1648;SVTYPE=INV;SUPTYPE=SR;SVLEN=2198;STRANDS=--;RE=28  GT:DR:DV    ./.:.:28

(using a vcf file I made of that one variant subbing in my own fasta)

vg construct -v inversion_case.vcf.gz -r Superscaffold_5.fa -R Superscaffold_5:2087500-2090000 -m 100000 -S | vg view - | grep -v S

gives

P   Superscaffold_5 1+,2+,3+    129M,2198M,174M 
L   1   +   2   +   0M
L   1   +   2   -   0M
L   2   +   3   +   0M
L   2   -   3   +   0M

Which looks correct, as it contains the reference path plus the two inversion edges.

(note I'm using -m 100000 in the construct command above to simplify the output to print here, and don't recommend it when building a graph you want to index)

edawson commented 5 years ago

Warning: VCF writer incorrecty produced END = POS + SVLEN for an inversion. Fixing SVLEN to 0. We warn on this because of an inconsistency in the VCF spec, where the example and the spec disagree on what the END of an inversion is. It does get triggered inconsistently; I'm hoping to figure out why as I'm going back through the construction logic, but it should be safe to ignore, as Glenn says. Your inversions might be off-by-one depending on how Sniffles outputs END/SVLEN but it shouldn't cause too much trouble. I also think the spec needs updating given that the majority of callers do this.

vg construct -f -S -a -r genome.fasta -v SVTYPE.vcf > SVTYPE.vg
P   Superscaffold_5 1+,2+,3+    129M,2198M,174M 
L   1   +   2   +   0M
L   1   +   2   -   0M
L   2   +   3   +   0M
L   2   -   3   +   0M

This is definitely the right call and output structure. Your VCF looks perfect with the <INV> tag in the alt field; make sure not to include sequences in the ref/alt fields if you're going to need proper inversion handling, as it overrides these nice graphical SVs and produces flat ones instead.

I'm not sure if the alt_paths get handled correctly for inversions but I'm really interested to find out. For insertions and deletions it should be fine.

fergsc commented 5 years ago

Thanks, I shall just ignore those warning messages.

Next problem: When I take my inversion.vg and try to create a gfa from it using view I get the following error.

vg view inversion.vg > inversion.gfa
terminate called after throwing an instance of 'std::runtime_error'
  what():  [vg::io::MessageIterator] obsolete, invalid, or corrupt input at message 2857027429357 group 2857027429353
ERROR: Signal 6 occurred. VG has crashed. Run 'vg bugs --new' to report a bug.
Stack trace path: /tmp/vg_crash_FWaFrl/stacktrace.txt

stacktrace.txt

Crash report for vg v1.17.0 "Candida"
Stack trace (most recent call last):
#17   Object "/usr/bin/vg", at 0x4d6269, in _start
#16   Object "/usr/bin/vg", at 0x1a3f1f8, in __libc_start_main
#15   Object "/usr/bin/vg", at 0x409832, in main
#14   Object "/usr/bin/vg", at 0x97f927, in vg::subcommand::Subcommand::operator()(int, char**) const
#13   Object "/usr/bin/vg", at 0x98635a, in main_view(int, char**)
#12   Object "/usr/bin/vg", at 0xa7deec, in vg::get_input_file(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void (std::istream&)>)
#11   Object "/usr/bin/vg", at 0x982660, in std::_Function_handler<void (std::istream&), main_view(int, char**)::{lambda(std::istream&)#6}>::_M_invoke(std::_Any_data const&, std::istream&)
#10   Object "/usr/bin/vg", at 0xb04be4, in vg::VG::VG(std::istream&, bool, bool)
#9    Object "/usr/bin/vg", at 0xb0462d, in vg::VG::from_istream(std::istream&, bool, bool)
#8    Object "/usr/bin/vg", at 0x6bf74c, in void vg::io::for_each<vg::Graph>(std::istream&, std::function<void (long, vg::Graph&)> const&)
#7    Object "/usr/bin/vg", at 0xe8180a, in vg::io::MessageIterator::operator++()
#6    Object "/usr/bin/vg", at 0xe810bc, in vg::io::MessageIterator::handle(bool, long, long)
#5    Object "/usr/bin/vg", at 0x197f273, in __cxa_throw
#4    Object "/usr/bin/vg", at 0x197df50, in std::terminate()
#3    Object "/usr/bin/vg", at 0x197df05, in __cxxabiv1::__terminate(void (*)())
#2    Object "/usr/bin/vg", at 0x1a18794, in __gnu_cxx::__verbose_terminate_handler()
#1    Object "/usr/bin/vg", at 0x1a4fac0, in abort
#0    Object "/usr/bin/vg", at 0x10b47a7, in raise