Closed szhan closed 1 year ago
I'm comparing the parent node ids of the paths obtained under different parameters:
These histograms show the number of differences in parent node ids for each sample. These results show that we don't get the same results despite identical MMR.
The x-axis label is wrong above. It should be "Number of parent node id differences".
The MMR isn't simply equal to rho / mu. So, I've thinking about this wrong. Using the parameter combinations above, one shouldn't expect getting the same paths even though rho / mu is maintained.
h/t Duncan Palmer. rho and mu should be defined as follows.
import math
def compute_rho(mu, n, k):
a = n * mu**k
b = (1 - mu)**k + (n - 1) * mu**k
rho = a / b
return rho
def compute_k(mu, rho, n):
a = math.log((rho/n) / (1 - rho + rho/n))
b = math.log(mu / (1 - mu))
k = a / b
return k
I did another run setting mu to 1e-07 and 1e-08 and letting rho to be defined by the function above, setting k to 1 (which is MMR) and n to 1,400 (which is the size of the reference sample set used in the previous runs). Some paths are different still, but less so.
Also, see #21
Results of the experiments so far are consistent with numerical instability in the algorithm.
Next step is to compare sample paths obtained using tskit.lshmm
and Duncan's lshmm
.
Related to #18