lh3 / bgt

Flexible genotype query among 30,000+ samples whole-genome
MIT License
96 stars 10 forks source link

freebayes vcfs (1.0-r282 and 1.0-r265) #6

Closed ml4wc closed 8 years ago

ml4wc commented 8 years ago

I cannot get freebayes vcf to import on 1.0-r282 at all

I get bgt: atomic.c:111: bcf_atomize: Assertion `i < b->n_info' failed when using bcfs that generated by freebayes --report-monomorphic and a seg fault on default vcfs of reporting only polymorphic (non-reference) sites.

In the release version of 1.0-r265, it works but not for monomorphic

The default freebayes of only polymorphic sites works in this version, but once again I get bgt: atomic.c:111: bcf_atomize: Assertion `i < b->n_info' using --report-monomorphic.

Let me know if you want backtraces, variable calls, or example files.

lh3 commented 8 years ago

1) as I remember, freebayes --report-monomorphic does not conform to the VCF spec; 2) bgt is designed to work with polymorphic sites. Even if it worked with freebayes --report-monomorphic, it probably would not work well.

ml4wc commented 8 years ago

Thanks, Heng. Would you recommend just keeping these in BCF then?

I still get a segfault on the default (polymorphic) freebayes bcf in r282. It gets through the header and dies on the first record. Here is the gdb Segfault

Program received signal SIGSEGV, Segmentation fault. 0x0000000000406324 in update_loff (idx=idx@entry=0x6342b0, i=i@entry=0, free_lidx=1) at hts.c:277 277 kh_val(bidx, k).loff = kh_key(bidx, k) < idx->n_bins? lidx->offset[hts_bin_bot(kh_key(bidx, k), idx->n_lvls)] : 0;

backtrace

0 0x0000000000406324 in update_loff (idx=idx@entry=0x6342b0, i=i@entry=0, free_lidx=1) at hts.c:277

1 0x000000000040902e in hts_idx_finish (idx=idx@entry=0x6342b0, final_offset=) at hts.c:342

2 0x000000000041794f in bcf_index (fp=fp@entry=0x67a9b0, min_shift=min_shift@entry=14) at vcf.c:1023

3 0x0000000000417b27 in bcf_index_build (fn=fn@entry=0x631010 "40.reg.bgt.bcf", min_shift=min_shift@entry=14) at vcf.c:1033

4 0x00000000004025c3 in main_import (argc=3, argv=) at import.c:117

5 0x00007ffff72f9ec5 in __libc_start_main (main=0x401d40
, argc=4, argv=0x7fffffffe588, init=,

fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe578) at libc-start.c:287

6 0x0000000000401fac in _start ()

locals: bidx = 0x67a950 lidx = 0x67a980 k = 0 l =

offset0 =

lh3 commented 8 years ago

It seems that something added after r265 breaks bgt. Do you have a small example for me to try? Thank you.

ml4wc commented 8 years ago

Here is dummy zipped bcf that works with r265 and not r282

fake.zip

lh3 commented 8 years ago

Thanks. This is actually caused by the inconsistency between htslib and the bgt BCF parser. I have not figured out why r265 does not segfault, but anyway, the output of r265 is incorrect. The "CHROM" is missing in the r265 output; the "bcftools view" does not work, either.

The solution is to import as VCF files. The BCF spec has been changed a few times. It will take time to keep my BCF reader/writer up to date.

lh3 commented 8 years ago

Wait a moment. bgt-r282 has other problems. I am looking at that. Sorry.

lh3 commented 8 years ago

The import command of r282 does not work at all. See 7c66cc8 for the fix. However, the parser problem remains. To import your file, you need to:

bcftools view fake.bcf | grep -v ^##contig | bgt import -t ref.fa.fai prefix -

where ref.fa.fai is the faidx index of your reference genome in use.

I will close the issue and may come back to the parser issue later. Thank you very much for raising this issue.