calculate_haplotype_statistics.py slightly differs from block headers

Hi @vibansal, I have a question about the calculate_haplotype_statistics.py script. I noticed that the "phased count" and "num snps max blk" values reported by the script differ from those in the BLOCK headers of my .hap file. For instance, if I sum the number of phased SNVs over all blocks and check the number of SNVs in the largest block in the .hap file, I get slightly different counts from the script output.

If I sum the phased field over all blocks, I get 189701. My largest block header is as follows:

BLOCK: offset: 12 len: 189252 phased: 188348 SPAN: 248704444 fragments 663113
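
A minimal sketch of how such a tally from the headers could be computed is shown below. The function name summarize_block_headers and the regular expression are illustrative only and are not part of HapCUT2; the sketch assumes every header follows the layout of the line above and that all other lines in the .hap file can be ignored.

```python
import re

# Assumed header layout, taken from the example line above:
# BLOCK: offset: <n> len: <n> phased: <n> SPAN: <n> fragments <n>
BLOCK_RE = re.compile(
    r"^BLOCK:\s+offset:\s+(\d+)\s+len:\s+(\d+)\s+phased:\s+(\d+)"
    r"\s+SPAN:\s+(\d+)\s+fragments\s+(\d+)"
)


def summarize_block_headers(lines):
    """Return (sum of 'phased' over all blocks, largest 'phased' value)."""
    total_phased = 0
    max_phased = 0
    for line in lines:
        match = BLOCK_RE.match(line)
        if match is None:
            continue  # not a block header (variant row, separator, etc.)
        phased = int(match.group(3))
        total_phased += phased
        max_phased = max(max_phased, phased)
    return total_phased, max_phased
```

Passing an open file handle for the .hap file to summarize_block_headers returns the two numbers that can be compared against the script's "phased count" and "num snps max blk".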

However, calculate_haplotype_statistics.py, run with the -i flag, reports the following numbers:

phased count: 188484 num snps max blk: 188057

Is there some kind of filter implemented in the script that would explain this difference?

Best, Mikhail

vibansal / HapCUT2, issue #124