pangenome / vcfbub

use variant nesting information to flter overlapping sites from vg deconstruct output
MIT License
23 stars 1 forks source link

InvalidInput, error: "invalid gzip header" #4

Closed LiaOb21 closed 1 year ago

LiaOb21 commented 1 year ago

Dear developers,

I wanted to try vcfbub using a pangenome of A. thaliana built using pggb.

This is the vg deconstruct (vg version v1.40.0 "Suardi") command stored in my .log file:

vg deconstruct -P Arabidopsis_thaliana -H # -e -a -t 16 c0_output/Arabidopsis_input.fasta.gz.a7c61d0.community.0.fa.7659a9d.417fcdf.171f02d.smooth.final.gfa

This is how the .decomposed.vcf looks like:

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=CONFLICT,Number=.,Type=String,Description="Sample names for which there are multiple paths in the graph with conflicting alleles">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of alternate alleles in called genotypes">
##INFO=<ID=AF,Number=A,Type=Float,Description="Estimated allele frequency in the range (0,1]">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of samples with data">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=LV,Number=1,Type=Integer,Description="Level in the snarl tree (0=top level)">##INFO=<ID=PS,Number=1,Type=String,Description="ID of variant corresponding to parent snarl">
##INFO=<ID=AT,Number=R,Type=String,Description="Allele Traversal as path in graph">##contig=<ID=Arabidopsis_thaliana#1#1,length=30427671>
##INFO=<ID=LEN,Number=A,Type=Integer,Description="allele length">##INFO=<ID=ORIGIN,Number=1,Type=String,Description="Decomposed from a complex record using vcflib vcfwave and alignment with WFA2-lib.">
##INFO=<ID=INV,Number=A,Type=String,Description="Count of haplotypes which are aligned in the inverted orientation using vcflib vcfwave.">
##bcftools_annotateVersion=1.16+htslib-1.16
##bcftools_annotateCommand=annotate -x INFO/TYPE c0_output/Arabidopsis_input.fasta.gz.a7c61d0.community.0.fa.7659a9d.417fcdf.171f02d.smooth.final.Arabidopsis_thaliana.decomposed.vcf.tmp.vcf; Date=Tue Dec 13 00:21:36 2022
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  GCA_020911765   GCA_023115395   GCA_900660825   GCA_902460265   GCA_902460275   GCA_902460285   GCA_902460295   GCA_902460305   GCA_902460315   GCA_902825305   GCA_903064285   GCA_903064295   GCA_903064325   GCA_904420315   GCA_933208065   MyArabidopsis
Arabidopsis_thaliana#1#1        171     >123>125        T       TA      60      .       AC=1;AF=0.166667;AN=6;AT=>123>125,>123>124>125;LV=0;NS=6        GT      0       0       .       .       .       .       .       .       0       0       0       .       .       1       .       .
Arabidopsis_thaliana#1#1        177     >125>128        C       T       60      .       AC=1;AF=0.166667;AN=6;AT=>125>127>128,>125>126>128;LV=0;NS=6    GT      0       0       .       .       .       .       .       .       1       0       0       .       .       0       .       .
Arabidopsis_thaliana#1#1        219     >133>136        C       T       60      .       AC=1;AF=0.142857;AN=7;AT=>133>135>136,>133>134>136;LV=0;NS=7    GT      0       0       .       .       .       .       .       .       1       0       0       .       .       0       .       0
Arabidopsis_thaliana#1#1        253     >136>139        T       C       60      .       AC=2;AF=0.285714;AN=7;AT=>136>138>139,>136>137>139;LV=0;NS=7    GT      0       0       .       .       .       .       .       .       1       0       0       .       .       1       .       0
Arabidopsis_thaliana#1#1        270     >139>142        G       A       60      .       AC=1;AF=0.142857;AN=7;AT=>139>141>142,>139>140>142;LV=0;NS=7    GT      0       0       .       .       .       .       .       .       1       0       0       .       .       0       .       0
Arabidopsis_thaliana#1#1        276     >142>145        T       G       60      .       AC=1;AF=0.142857;AN=7;AT=>142>144>145,>142>143>145;LV=0;NS=7    GT      0       0       .       .       .       .       .       .       1       0       0       .       .       0       .       0
Arabidopsis_thaliana#1#1        291     >145>148        T       A       60      .       AC=1;AF=0.142857;AN=7;AT=>145>147>148,>145>146>148;LV=0;NS=7    GT      0       0       .       .       .       .       .       .       0       0       0       .       .       1       .       0

I tried to run vcfbub with the following command:

vcfbub -l 0 -r 10000 -i Arabidopsis_input.fasta.gz.a7c61d0.community.0.fa.7659a9d.417fcdf.171f02d.smooth.final.Arabidopsis_thaliana.decomposed.vcf > Arabidopsis_vcfbub_l0_r10000.vcf

And I am getting the following error:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: IoError(Custom { kind: InvalidInput, error: "invalid gzip header" })', src/main.rs:14:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Rust full backtrace:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: IoError(Custom { kind: InvalidInput, error: "invalid gzip header" })', src/main.rs:14:6
stack backtrace:
   0:     0x556630e01100 - std::backtrace_rs::backtrace::libunwind::trace::h32eb3e08e874dd27
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:     0x556630e01100 - std::backtrace_rs::backtrace::trace_unsynchronized::haa3f451d27bc11a5
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x556630e01100 - std::sys_common::backtrace::_print_fmt::h5b94a01bb4289bb5
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/sys_common/backtrace.rs:66:5
   3:     0x556630e01100 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hb070b7fa7e3175df
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/sys_common/backtrace.rs:45:22
   4:     0x556630e20e8e - core::fmt::write::hd5207aebbb9a86e9
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/fmt/mod.rs:1202:17
   5:     0x556630dfe935 - std::io::Write::write_fmt::h3bd699bbd129ab8a
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/io/mod.rs:1679:15
   6:     0x556630e02783 - std::sys_common::backtrace::_print::h7a21be552fdf58da
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/sys_common/backtrace.rs:48:5
   7:     0x556630e02783 - std::sys_common::backtrace::print::ha85c41fe4dd80b13
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/sys_common/backtrace.rs:35:9
   8:     0x556630e02783 - std::panicking::default_hook::{{closure}}::h04cca40023d0eeca
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:295:22
   9:     0x556630e0246f - std::panicking::default_hook::haa3ca8c310ed5402
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:314:9
  10:     0x556630e02e2a - std::panicking::rust_panic_with_hook::h7b190ce1a948faac
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:698:17
  11:     0x556630e02d27 - std::panicking::begin_panic_handler::{{closure}}::hbafbfdc3e1b97f68
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:588:13
  12:     0x556630e015ac - std::sys_common::backtrace::__rust_end_short_backtrace::hda93e5fef243b4c0
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/sys_common/backtrace.rs:138:18
  13:     0x556630e02a42 - rust_begin_unwind
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:584:5
  14:     0x556630d7ed33 - core::panicking::panic_fmt::h8d17ca1073d9a733
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/panicking.rs:142:14
  15:     0x556630d7ee23 - core::result::unwrap_failed::hfaddf24b248137d3
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/result.rs:1785:5
  16:     0x556630d8728c - vcfbub::main::h5c2271e241dd4124
  17:     0x556630d87b23 - std::sys_common::backtrace::__rust_begin_short_backtrace::h73a2075f4446c9c0
  18:     0x556630d8f419 - std::rt::lang_start::{{closure}}::h5399f682a4a2dd89
  19:     0x556630dfb44f - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::hb69be6e0857c6cfb
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/core/src/ops/function.rs:283:13
  20:     0x556630dfb44f - std::panicking::try::do_call::h396dfc441ee9c786
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:492:40
  21:     0x556630dfb44f - std::panicking::try::h6cdda972d28b3a4f
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:456:19
  22:     0x556630dfb44f - std::panic::catch_unwind::h376039ec264e8ef9
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panic.rs:137:14
  23:     0x556630dfb44f - std::rt::lang_start_internal::{{closure}}::hc94720ca3d4cb727
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/rt.rs:148:48
  24:     0x556630dfb44f - std::panicking::try::do_call::h2422fb95933fa2d5
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:492:40
  25:     0x556630dfb44f - std::panicking::try::h488286b5ec8333ff
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panicking.rs:456:19
  26:     0x556630dfb44f - std::panic::catch_unwind::h81636549836d2a25
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/panic.rs:137:14
  27:     0x556630dfb44f - std::rt::lang_start_internal::h6ba1bb743c1e9df9
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library/std/src/rt.rs:148:20
  28:     0x556630d87a28 - main
  29:     0x7fa6c21c7083 - __libc_start_main
                               at /build/glibc-SzIz7B/glibc-2.31/csu/../csu/libc-start.c:308:16
  30:     0x556630d7efb5 - <unknown>

Is there anything wrong with my decomposed.vcf file or is there any conflict between libraries/packages? I'm not familiar with Rust so I apologise if it's a basic question.

Thank you in advance.

Best regards,

Lia

AndreaGuarracino commented 1 year ago

vcfbub requires a gzipped VCF file as input. Run:

gzip Arabidopsis_input.fasta.gz.a7c61d0.community.0.fa.7659a9d.417fcdf.171f02d.smooth.final.Arabidopsis_thaliana.decomposed.vcf
vcfbub -l 0 -r 10000 -i Arabidopsis_input.fasta.gz.a7c61d0.community.0.fa.7659a9d.417fcdf.171f02d.smooth.final.Arabidopsis_thaliana.decomposed.vcf.gz > Arabidopsis_vcfbub_l0_r10000.vcf