vibansal / HapCUT2

software tools for haplotype assembly from sequence data
BSD 2-Clause "Simplified" License
207 stars 36 forks source link

HOMOZYGOUS_PRIOR too small #110

Closed jzhang-dev closed 3 years ago

jzhang-dev commented 3 years ago

Hi,

The default value for HOMOZYGOUS_PRIOR, as defined in optionparser.c, is -80 (log scale). As a result, the minimum coverage depth required for a variant to be called homozygous is nearly 100, which feels to stringent for typical use cases. Would it be possible to adjust HOMOZYGOUS_PRIOR to a higher value, or add a command line parameter for HOMOZYGOUS_PRIOR?

I also found this in the Supplemental Methods of (Edge et al., 2017):

HapCUT2 supports obtaining the prior probabilities of the unordered genotype configurations (00),(01),(11) from the VCF genotype likelihoods.

However, I failed to find relevant parameters in the documentation. It would be very useful if this could be supported in HapCUT2.

Thanks and best wishes,

Jia-Yuan Zhang

vibansal commented 3 years ago

Sorry for the long delay in replying to your request. It is not difficult to add this as a command line parameter. However, adding code for using the prior probabilities of the genotypes will require some more coding and testing. Would you be interested in contributing the change via a pull request? Thanks.

jzhang-dev commented 3 years ago

Thanks. I think I may be able to contribute by adding a command line parameter for HOMOZYGOUS_PRIOR later, while the second funtionality is probably beyond my capacity of C programming at the moment. Meanwhile, I realized that the core HapCUT2 algorithm implicitly relies upon the all-heterozygous assumption. Therefore, I think it is presumably best to remove variants that are most likely to be homozygous before running HapCUT2, if such information is available a priori.