Closed szhan closed 3 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 89.63%. Comparing base (d3c59ba) to head (898c480).
Looks like some tests are failing with a new error:
    if not np.all(num_alleles > 1):
        err_msg = "Some sites have less than two distinct alleles."
>       raise ValueError(err_msg)
E       ValueError: Some sites have less than two distinct alleles.
The requirements that the first value of the recombination probability array be zero and that all sites in the input data be variable have been removed. See pre-release 0.0.8.
I suspect that the failing tests are caused by not excluding MISSING when counting the number of alleles to scale mutation rates, because all the tests in test_haplotype_matching.py pass when the queries with MISSING values are left out.
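To illustrate the suspicion above, here is a minimal sketch of counting distinct alleles per site while ignoring missing data. The sentinel value `MISSING = -1` and the helper `count_alleles_per_site` are assumptions made for the example, not names taken from this PR or from lshmm.

```python
import numpy as np

MISSING = -1  # assumed sentinel for missing data; hypothetical for this sketch


def count_alleles_per_site(ref_panel, query=None):
    """Count distinct non-missing alleles at each site.

    ref_panel: (num_sites, num_haplotypes) integer array.
    query: optional (num_sites,) integer array that may contain MISSING.
    """
    ref_panel = np.asarray(ref_panel)
    if query is None:
        columns = ref_panel
    else:
        # Append the query as one extra haplotype column.
        columns = np.column_stack([ref_panel, np.asarray(query)])
    counts = np.zeros(columns.shape[0], dtype=int)
    for j, site in enumerate(columns):
        alleles = np.unique(site)
        # Exclude the MISSING sentinel so it is not counted as an allele.
        counts[j] = np.sum(alleles != MISSING)
    return counts


ref_panel = np.array([[0, 1, 0], [1, 1, 1], [0, 2, 1]])
query = np.array([MISSING, 1, 0])
num_alleles = count_alleles_per_site(ref_panel, query)
```

At the first site the reference panel carries alleles 0 and 1 and the query is missing, so the count stays at 2. A count that treated MISSING as an allele would give 3 there and would scale the mutation rate differently.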
I think we can skip the failing diploid tests. I'm not sure how soon the diploid model will actually be implemented, so let's not worry about it. Just mark the failing tests as skipped.
I'm trying to understand how the emission probability is set in the tree-based implementations. In lshmm, when the query allele is MISSING, the emission probability is set to 1.0 instead of whatever the value is when the query and reference alleles are equal. I think this is causing the failed tests.
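The difference described above can be sketched as follows. This is only an illustration under assumptions: a biallelic site, a per-site mutation probability `mu`, and a hypothetical `MISSING = -1` sentinel. The function names are invented for the example and are not the lshmm or tskit API.

```python
MISSING = -1  # assumed sentinel for missing data; hypothetical for this sketch


def emission_missing_as_one(ref_allele, query_allele, mu):
    """Emission probability where a missing query allele contributes 1.0."""
    if query_allele == MISSING:
        # A missing observation carries no information about the state.
        return 1.0
    if ref_allele == query_allele:
        return 1.0 - mu
    return mu


def emission_missing_as_match(ref_allele, query_allele, mu):
    """Alternative where a missing query allele is treated like a match."""
    if query_allele == MISSING or ref_allele == query_allele:
        return 1.0 - mu
    return mu


p_one = emission_missing_as_one(0, MISSING, 0.01)
p_match = emission_missing_as_match(0, MISSING, 0.01)
```

The two conventions agree on non-missing alleles but differ on MISSING (1.0 versus 1 - mu), so likelihoods from two implementations only line up if both use the same convention.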
I don't know either, to be honest. Let's not get bogged down in it if it's just on the diploid side, since there's no corresponding C code (and no plan to implement it).
I've updated how the emission probability is set when the query allele is MISSING. All the haplotype matching tests are now passing.
I suspect the tests for the diploid case are failing because the computation of the emission probability matrix does not account for the number of alleles varying across sites. It would be fine if all the sites were biallelic, but some sites in the simulations are probably invariant.
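One way to account for a varying number of alleles is to build the emission matrix per site, as in the haploid sketch below. It assumes a model in which the mutation probability `mu` is spread evenly over the other alleles at a site and in which invariant sites never mismatch. That scaling is an assumption for illustration and may not be what lshmm or the diploid code actually uses; `emission_matrix` is a hypothetical helper.

```python
import numpy as np


def emission_matrix(mu, num_alleles):
    """Return a (num_sites, 2) array: column 0 = mismatch, column 1 = match.

    mu: per-site mutation probabilities.
    num_alleles: per-site counts of distinct (non-missing) alleles.
    """
    mu = np.asarray(mu, dtype=float)
    num_alleles = np.asarray(num_alleles)
    e = np.zeros((len(num_alleles), 2))
    variable = num_alleles > 1
    # Variable sites: spread mu evenly across the other alleles (assumed model).
    e[variable, 0] = mu[variable] / (num_alleles[variable] - 1)
    e[variable, 1] = 1.0 - mu[variable]
    # Invariant sites: no alternative allele exists, so avoid dividing by zero.
    e[~variable, 0] = 0.0
    e[~variable, 1] = 1.0
    return e


e = emission_matrix(mu=[0.1, 0.1, 0.1], num_alleles=[2, 1, 3])
```

With a single biallelic formula applied everywhere, the invariant site (one allele) and the triallelic site would both get the wrong mismatch probability, which is the kind of discrepancy suspected above.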
Description
Update tests and requirements to use lshmm 0.0.7. Fixes #2962
PR Checklist: