Closed benjaminwilson closed 5 years ago
Hi Benjamin,
Thank you for bringing this to our attention. The issue is due to the fact that, while running the processing, set
objects are used to track the pool of extinct and extant tip nodes. As you know, order is not maintained by these containers. Nodes are popped off these as required, but this means that while the process itself is identical under the same random seed as advertised, the specific nodes that get selected for, e.g., birth events vary due to the container semantics.
The solution to this is to adopt a list. But this is going to have a performance hit in a couple of places.
Ok, maybe give a06f95e2 a spin. I've changed the node pool containers to lists, and, with the order invariance, the RNG seed dependent determinism now holds. Not sure what the performance penalties are at this point, though.
Thanks for the quick fix! I can confirm that this resolves the indeterminancy. A quick benchmark seems to indicate no performance penalty:
%timeit t = treesim.birth_death_tree(birth_rate=1.0, death_rate=0.5, num_total_tips=100, is_retain_extinct_tips=True)
Before:
9.6 ms ± 34.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
At a06f95e
:
9.34 ms ± 51.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Great. I've merged this into the development-master
branch.
Multiple calls to
treesim.birth_death_tree
passing instances ofrandom.Random
with the same seed yield multiple distinct trees. Here is code to reproduce:Which yields:
The output shows that passing the RNG does make a difference ... but not enough of one :)
This is likely related to #88 (perhaps the example there just needs to be run multiple times to produce the issue).