HAHap is a method for inferring haplotypes from sequencing data. It attempts to eliminate the influence of noise during the assembly process, while retaining the spirit of minimum error correction under certain conditions. We developed an adjusted multinomial probabilistic metric for evaluating the reliability of a variant pair, and the derived scores guide the assembly process.
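The minimum error correction (MEC) criterion mentioned above can be illustrated with a toy example. The sketch below is not HAHap's implementation and does not include its multinomial metric; the function names `mismatches` and `mec_score`, the read encoding, and the data are assumptions made purely for illustration. For a candidate haplotype and its complement, it counts the fewest allele corrections needed so that every read agrees with one of the two.

```python
def mismatches(read, hap):
    """Count covered variants where the read allele differs from the haplotype."""
    return sum(1 for pos, allele in read.items() if hap[pos] != allele)


def mec_score(reads, hap):
    """MEC score of the pair (hap, complement of hap) over all reads.

    Each read is assigned to whichever haplotype it disagrees with least,
    and the remaining disagreements are summed.
    """
    comp = [1 - a for a in hap]
    return sum(min(mismatches(r, hap), mismatches(r, comp)) for r in reads)


# Hypothetical toy data: each read maps a variant index to the observed
# allele (0 = reference, 1 = alternate).
reads = [
    {0: 0, 1: 1},
    {1: 1, 2: 0},
    {0: 1, 1: 0, 2: 1},
    {1: 1, 2: 1},  # conflicts with the other reads at variant 2
]

print(mec_score(reads, [0, 1, 0]))  # 1: only the last read needs one correction
print(mec_score(reads, [0, 0, 0]))  # 3: a worse candidate haplotype
```

A lower score means the candidate haplotype pair explains the reads with fewer corrections.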
HAHap takes BAM files as input and was validated using short reads from the Illumina HiSeq platform.
HAHap is a pure-Python program. It requires the following packages.
Clone the repository and run bin/HAHap.
git clone https://github.com/ifishlin/HAHap
cd HAHap/bin
python HAHap phase vcf bam out
usage: python HAHap phase [--mms MMS] [--lct LCT] [--minj MINJ] [--pl PL] VCF BAM OUT
positional arguments:
VCF VCF file containing the heterozygous variants to be phased
BAM BAM file of mapped reads
OUT Output VCF file with the predicted haplotypes (HP tags)
optional arguments:
--mms Minimum read mapping quality (default:0)
--lct Threshold of low-coverage pairs (int, default:median)
--minj Minimum number of junctions (default:4)
--pl The likelihood of P1 and P2 (default:0.49)
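When HAHap is called from another script, the command line documented above can be assembled programmatically. The following is a minimal sketch that uses only the flags listed in the usage text; the helper name `build_phase_command` and the file names are hypothetical, and the sketch assumes the command is run from the HAHap/bin directory.

```python
import shlex


def build_phase_command(vcf, bam, out, mms=None, lct=None, minj=None, pl=None):
    """Assemble the argument list for `python HAHap phase`."""
    cmd = ["python", "HAHap", "phase"]
    # Pass only the options that were set explicitly, so that HAHap's
    # own defaults apply to everything else.
    for flag, value in (("--mms", mms), ("--lct", lct), ("--minj", minj), ("--pl", pl)):
        if value is not None:
            cmd += [flag, str(value)]
    # Positional arguments follow the options: VCF BAM OUT.
    return cmd + [vcf, bam, out]


# Hypothetical file names, for illustration only.
cmd = build_phase_command("sample.vcf", "sample.bam", "phased.vcf", mms=20, minj=4)
print(shlex.join(cmd))
# python HAHap phase --mms 20 --minj 4 sample.vcf sample.bam phased.vcf
```

The resulting list can be passed to `subprocess.run(cmd, cwd="HAHap/bin", check=True)` to run the phasing step.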
The answer set used in the real-data experiment was created by taking the intersection between (1) and (2)
Yu-Yu Lin, Pei-Lung Chen, Yen-Jen Oyang and Chien-Yu Chen. National Taiwan University, Taiwan.