vibansal / HapCUT2

software tools for haplotype assembly from sequence data
BSD 2-Clause "Simplified" License

extractHAIRS for polyploids #82

Open sinamajidian opened 5 years ago

sinamajidian commented 5 years ago

Dear HapCUT team, As you may know, most haplotype assembly algorithms for polyploids use the fragment matrix format. It would be great if there were a decent program for extracting such a file from BAM and VCF files. It seems that extractHAIRS could be generalized to the polyploid case.

In a simple case that is of interest to developers, we can assume bi-allelic variants. Then we can filter out homozygous variants and second alternative alleles ("0/2"). We can also restrict the VCF file to SNVs and remove complex variants.
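The filtering described above can be sketched as a small pre-processing step on the VCF. This is only an illustration, not part of HapCUT2 or extractHAIRS: the function name `keep_biallelic_het_snv` is hypothetical, and the sketch assumes a single-sample, tab-separated VCF in which the genotype of interest is in the first sample column.

```python
import re

BASES = {"A", "C", "G", "T"}

def keep_biallelic_het_snv(vcf_line: str) -> bool:
    """Return True for a bi-allelic, heterozygous SNV record of any ploidy.

    Hypothetical helper: assumes one sample column (field 10 of the record).
    """
    if vcf_line.startswith("#"):
        return False  # header line
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return False  # no FORMAT/sample columns
    ref, alt, fmt, sample = fields[3], fields[4], fields[8], fields[9]
    # SNVs only: a single-base REF and a single-base ALT.
    # This also rejects multi-allelic sites such as ALT="G,T" and indels.
    if ref not in BASES or alt not in BASES:
        return False
    keys = fmt.split(":")
    if "GT" not in keys:
        return False
    gt = sample.split(":")[keys.index("GT")]
    alleles = re.split(r"[/|]", gt)
    # Heterozygous and bi-allelic: both 0 and 1 occur, and nothing else
    # (no missing alleles ".", no second alternative allele "2").
    return set(alleles) == {"0", "1"}
```

For example, a tetraploid record with genotype `0/0/1/1` is kept, while `0/0/0/0`, `1/1/1/1`, and any record with a second alternative allele are dropped.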

To implement that using your code, it seems that the only edit needed is to skip the error-checking part of readvariant.c. I tried that and uploaded it here. The differences are highlighted here. I look forward to hearing your ideas. Regards, Sina.

vibansal commented 4 years ago

Sorry for the delay in responding to this. We would like to include this functionality in HapCUT2; however, for it to be complete, it would be important to consider 2nd or 3rd alternative alleles for polyploids.
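As a rough illustration of what considering 2nd or 3rd alternative alleles could involve at the VCF level, the sketch below maps genotype indices to bases for a multi-allelic SNV site. It is an assumption-laden example, not HapCUT2 code: the function name `distinct_alleles` is hypothetical, and it says nothing about how such alleles would be encoded in the fragment matrix.

```python
import re
from typing import List

def distinct_alleles(ref: str, alt: str, gt: str) -> List[str]:
    """Return the sorted distinct bases carried at a site, or [] if unusable.

    Hypothetical helper: `alt` is the raw VCF ALT field (comma-separated),
    `gt` is the raw GT value, e.g. "0/1/2/2" for a tetraploid.
    """
    bases = [ref] + alt.split(",")
    # SNVs only: every allele must be a single A/C/G/T base.
    if any(len(b) != 1 or b not in "ACGT" for b in bases):
        return []
    indices = re.split(r"[/|]", gt)
    # Reject missing calls (".") and indices with no matching allele.
    if not all(i.isdigit() and int(i) < len(bases) for i in indices):
        return []
    return sorted({bases[int(i)] for i in indices})
```

Under this sketch, a site would be informative for phasing when the returned list has at least two entries, regardless of whether the alleles involved are the reference, the first alternative, or a later alternative.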