zeeev / vcflib

a simple C++ library for parsing and manipulating VCF files, + many command-line utilities
https://github.com/ekg/vcflib#vcflib
MIT License

gl-xpehh specifying regions #1

Closed alextjc closed 10 years ago

alextjc commented 10 years ago

Hi developers,

When I specify a large region (generally anything over 2 Mb), I get the error message below. I am not sure what is causing this. It works for smaller region sizes, so I know that the region is valid. Thanks for your help.

bin/gl-XPEHH --target 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19 --background 20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39 --file chr20genomedata.vcf.gz --region chr20
INFO: there are 20 individuals in the target
INFO: target ids: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19
INFO: there are 20 individuals in the background
INFO: background ids: 20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39
INFO: file: chr20genomedata.vcf.gz
INFO: set seqid region to : chr20
INFO: there are 50 individuals in the VCF
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
Aborted (core dumped)
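For readers unfamiliar with this message: 'std::out_of_range' with what(): basic_string::substr is what the C++ standard library raises when std::string::substr is called with a start position past the end of the string. The sketch below only illustrates that standard behaviour. The helper name substrThrows is made up for this example and is not part of vcflib, and nothing here shows where gl-XPEHH actually makes the failing call.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration only (not vcflib code).
// Returns true if s.substr(pos, len) throws std::out_of_range.
// substr throws only when pos > s.size(); pos == s.size() yields an
// empty string, and an oversized len is silently clamped.
bool substrThrows(const std::string& s, std::size_t pos, std::size_t len) {
    try {
        (void)s.substr(pos, len);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

An uncaught exception of this kind ends the program with "terminate called after throwing an instance of ..." followed by "Aborted", which matches the output above.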

zeeev commented 10 years ago

Thanks for letting me know. Is there any way you could send me a test case? A VCF file with a header and a region that causes the error would be ideal. I will fix it today.

zeeev commented 10 years ago

Bug replicated and fixed. Phasing was breaking down in regions of high haplotype diversity. "localPhase" couldn't generate an EHH score greater than zero, so no haplotypes were assigned to the global data structure.

This justified a version change from 1.0.0 to 1.0.1.
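As a general illustration of this failure mode, and not the actual patch, code that reads alleles out of per-individual haplotype strings can check for missing or empty haplotypes before indexing into them. The sketch assumes haplotypes are stored as a vector of strings; that layout and the helper name alleleAt are hypothetical and are not taken from vcflib.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not vcflib code: fetch the allele character for
// haplotype `hap` at position `site`. Returns false instead of throwing
// when the haplotype was never assigned (missing or too short).
bool alleleAt(const std::vector<std::string>& haplotypes,
              std::size_t hap, std::size_t site, char& out) {
    if (hap >= haplotypes.size()) {
        return false;  // no such haplotype in the container
    }
    const std::string& h = haplotypes[hap];
    if (site >= h.size()) {
        return false;  // haplotype is empty or shorter than expected
    }
    out = h[site];
    return true;
}
```

A caller can then skip or report a window in which phasing produced no haplotypes, rather than aborting on an out-of-range access.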

Thanks much for the report: https://github.com/alextjc