vibansal / HapCUT2

software tools for haplotype assembly from sequence data
BSD 2-Clause "Simplified" License

Only a small part of SNPs can be phased #113

Closed JFF1594032292 closed 3 years ago

JFF1594032292 commented 3 years ago

Hi, first of all, thanks for developing this tool. It's really helpful!

I used HapCUT2 to phase 2826 heterozygous SNPs using Hi-C data. I called the SNPs from the Hi-C sequencing data with bcftools and removed the low-quality variants, then ran HapCUT2 on a BAM file that contained only the reads aligned to the variant sites (because the whole BAM file was too large). However, only 800 SNPs (~30%) were phased, which is much lower than I expected. Is that normal? I also noticed that the depth and QUAL of the unphased SNPs are not low: most have DP>100 and QUAL>200, or even higher. Did I do something wrong in these steps, and how can I improve the result?

Thanks,

Jiang

vibansal commented 3 years ago

HapCUT2 uses the long-distance linkages present in Hi-C data for phasing. Limiting the phasing to a region can therefore reduce phasing completeness, because read pairs that connect SNPs through sites outside that region are no longer available.
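
To illustrate the point about long-distance linkage, here is a minimal toy sketch. It is not HapCUT2's algorithm. It assumes that each Hi-C read pair can be reduced to a link between two SNPs, and that a SNP counts as "phasable" when it is connected to at least one other SNP. The positions, links, region bounds, and function names are all hypothetical.

```python
from collections import defaultdict


def linked_components(snps, links):
    """Group SNPs into connected components with a simple union-find."""
    parent = {s: s for s in snps}

    def find(x):
        # Follow parent pointers to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        # A link is usable only if both of its SNPs are in the analysed set.
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for s in snps:
        groups[find(s)].add(s)
    return list(groups.values())


def phasable(snps, links):
    """Return SNPs that share a component with at least one other SNP."""
    return {s for comp in linked_components(snps, links) if len(comp) > 1 for s in comp}


# Hypothetical SNP positions (bp) and Hi-C read-pair links between them.
snps = [100, 200, 300, 400, 5_000_000]
links = [(100, 5_000_000), (5_000_000, 300), (200, 400)]

# Using all data: SNPs 100 and 300 are joined through the distant SNP.
all_phasable = phasable(snps, links)

# Restricting to a small region drops the distant SNP and both of its links.
region = (0, 1_000)
region_snps = [s for s in snps if region[0] <= s <= region[1]]
region_phasable = phasable(region_snps, links)

print(sorted(all_phasable))
print(sorted(region_phasable))
```

In this toy case all five SNPs are linked when the full data is used, but only SNPs 200 and 400 remain linked after the restriction, even though 100 and 300 lie inside the region.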