tskit-dev / tskit


Fixup tests for lshmm 0.0.6 #2962

Closed: jeromekelleher closed this issue 1 week ago

jeromekelleher commented 2 weeks ago

The recent lshmm release breaks some tests in the tskit test suite: https://github.com/astheeggeggs/lshmm/releases/tag/v0.0.6

Would you mind looking at this please, @szhan?

jeromekelleher commented 2 weeks ago

Note that we'll also need to update the pins in requirements.

szhan commented 2 weeks ago

When running pytest -vs test_haplotype_matching.py, a number of tests fail with a ValueError:

ValueError: First value in the recombination probability array must be zero.

I think it is caused by the line that assigns the default recombination probabilities, for example in check_viterbi:

def check_viterbi(ts, h, recombination=None, mutation=None):
    h = np.array(h).astype(np.int8)
    m = ts.num_sites
    assert len(h) == m
    if recombination is None:
        recombination = np.zeros(ts.num_sites) + 1e-9 # First value is positive.

The same applies to check_forward_matrix and check_backward_matrix.
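
A minimal sketch of a default that would pass the new check, assuming the only requirement is that the first entry is exactly zero. The helper name default_recombination and its rate parameter are hypothetical and are not part of tskit or lshmm:

```python
import numpy as np


def default_recombination(num_sites, rate=1e-9):
    """Hypothetical helper: per-site recombination probabilities.

    Every site gets `rate`, except the first site, which is set to zero
    so that the "first value must be zero" check is satisfied.
    """
    recombination = np.full(num_sites, rate, dtype=np.float64)
    if num_sites > 0:
        recombination[0] = 0.0
    return recombination
```

In the test helpers this would stand in for np.zeros(ts.num_sites) + 1e-9, e.g. recombination = default_recombination(ts.num_sites).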

szhan commented 2 weeks ago

I'm not sure if this is intended, but the mutation probability is set to 0 at all sites by default here.

def check_viterbi(ts, h, recombination=None, mutation=None):
    h = np.array(h).astype(np.int8)
    m = ts.num_sites
    assert len(h) == m
    if recombination is None:
        recombination = np.zeros(ts.num_sites) + 1e-9
    if mutation is None:
        mutation = np.zeros(ts.num_sites)

The same applies to check_forward_matrix and check_backward_matrix.

szhan commented 2 weeks ago

I'm also seeing some other failing tests. Should we investigate those in separate issues?

szhan commented 2 weeks ago

I've added stricter requirements on the input data, e.g., requiring all sites to be variable across the reference panel and query combined. I think this is causing some tests to fail.
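
To illustrate the kind of requirement meant here, a rough sketch of a check that every site is variable across the reference panel and query combined. This is not the actual lshmm implementation. The function name all_sites_variable is hypothetical, and the array layout (sites as rows, haplotypes as columns, integer-coded alleles, no missing data) is an assumption:

```python
import numpy as np


def all_sites_variable(ref_panel, query):
    """Hypothetical check, not lshmm's own code.

    Both inputs are assumed to be integer arrays of shape
    (num_sites, num_haplotypes). Returns True if every site carries
    more than one distinct allele across the two arrays combined.
    """
    combined = np.hstack([np.asarray(ref_panel), np.asarray(query)])
    # A site is variable if its smallest and largest allele codes differ.
    return bool(np.all(combined.min(axis=1) != combined.max(axis=1)))
```

A test fixture with a site where the reference panel and query all carry the same allele would fail a check like this, which may explain some of the new failures.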

szhan commented 2 weeks ago

I'm encountering these errors when running the tests locally.

szhan commented 1 week ago

Note that we are now using lshmm 0.0.8.