nygenome / Conpair

Concordance and contamination estimator for tumor–normal pairs

bugfix: fixed calculation of downsampling fraction for high-coverage … #24

Closed willhooper closed 1 month ago

willhooper commented 1 month ago

When Conpair genotypes markers, it downsamples sites with coverage above 450X to make the likelihood math work. Conpair is written in Python 2, which always rounds down when dividing two integers, so a fraction between 0 and 1 is truncated to 0. In this case, that meant the downsampling fraction was always computed as 0. This code path will rarely, if ever, be triggered for WGS, and probably won't make a large difference for typical WES. The edge case that brought this to light was an exome sample with >1000X coverage.

When the downsampling fraction is 0, no reads are used, so every possible genotype has the same likelihood and the population allele-frequency (AF) priors alone dictate which genotype is called. When enough sites are above the coverage threshold, this makes two samples look discordant when they are actually concordant.
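To illustrate why zero reads hands the call to the prior, here is a minimal sketch using a simple binomial read model with Hardy-Weinberg priors. This is an assumed toy model, not Conpair's actual likelihood code; the function names, the `error` rate, and the example counts are all hypothetical.

```python
def genotype_posteriors(ref_count, alt_count, alt_af, error=0.01):
    """Toy posterior over genotypes; NOT Conpair's implementation."""
    # Assumed probability of observing an alt read under each genotype.
    alt_prob = {"0/0": error, "0/1": 0.5, "1/1": 1.0 - error}
    # Hardy-Weinberg priors from the population alt allele frequency.
    prior = {
        "0/0": (1 - alt_af) ** 2,
        "0/1": 2 * alt_af * (1 - alt_af),
        "1/1": alt_af ** 2,
    }
    unnormalized = {}
    for genotype, p in alt_prob.items():
        # With ref_count == alt_count == 0 this likelihood is 1 for every genotype.
        likelihood = (p ** alt_count) * ((1 - p) ** ref_count)
        unnormalized[genotype] = likelihood * prior[genotype]
    total = sum(unnormalized.values())
    return {g: v / total for g, v in unnormalized.items()}


def call_genotype(ref_count, alt_count, alt_af):
    """Return the genotype with the highest posterior."""
    posteriors = genotype_posteriors(ref_count, alt_count, alt_af)
    return max(posteriors, key=posteriors.get)


# Same hypothetical het site, with and without usable reads.
print(call_genotype(20, 25, alt_af=0.1))  # 0/1
print(call_genotype(0, 0, alt_af=0.1))    # 0/0
```

In this sketch the site is called heterozygous when reads are present, but falls back to the most probable genotype under the prior when all reads are discarded, which is how a concordant pair could appear discordant.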

The fix is simply to force Python to do float division.
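A minimal sketch of the bug and the fix, assuming the fraction is computed as the threshold divided by the observed coverage. That formula, the function names, and `MAX_COVERAGE` are assumptions for illustration, not taken from the Conpair source. The `//` operator is used so the Python 2 behaviour can be reproduced under Python 3.

```python
MAX_COVERAGE = 450  # threshold from the description; the name is hypothetical


def fraction_py2_style(coverage, max_coverage=MAX_COVERAGE):
    # `//` mimics what `/` does with two ints in Python 2: floor division.
    return max_coverage // coverage


def fraction_fixed(coverage, max_coverage=MAX_COVERAGE):
    # Casting one operand to float forces true division in Python 2 and 3.
    return float(max_coverage) / coverage


print(fraction_py2_style(1000))  # 0
print(fraction_fixed(1000))      # 0.45
```

In Python 2, adding `from __future__ import division` at the top of the module is another way to get true division from `/`.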