matsengrp / gctree

GCtree: phylogenetic inference of genotype-collapsed trees
https://matsengrp.github.io/gctree
GNU General Public License v3.0

Make gctree deterministic when ranking is degenerate #100

Open willdumm opened 2 years ago

willdumm commented 2 years ago

Adds a --seed argument to the inference CLI and fixes an issue where inference results could be nondeterministic when the ranking criteria are degenerate.
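To illustrate the idea (this is not the code in this PR), here is a minimal sketch of how a seed plus a deterministic tie-break can make the result reproducible when several candidates share the top rank. Only the --seed flag name comes from the description above. The names pick_best and main, the default seed of 0, and the toy candidate list are all hypothetical.

```python
import argparse
import random


def pick_best(candidates, score, rng):
    """Return one top-scoring candidate, breaking ties reproducibly.

    Hypothetical helper for illustration; not part of gctree.
    """
    best = max(score(c) for c in candidates)
    # Sort the tied candidates so the choice does not depend on input order.
    tied = sorted(c for c in candidates if score(c) == best)
    # With a seeded generator, the same seed always selects the same candidate.
    return rng.choice(tied)


def main(argv=None):
    parser = argparse.ArgumentParser(description="Toy seeded selection")
    # The default of 0 is an assumption made for this sketch.
    parser.add_argument("--seed", type=int, default=0)
    args = parser.parse_args(argv)

    rng = random.Random(args.seed)
    # Toy stand-ins for trees that all rank equally (a degenerate ranking).
    trees = ["tree_c", "tree_a", "tree_b"]
    return pick_best(trees, score=lambda t: 1.0, rng=rng)
```

Calling main(["--seed", "7"]) twice returns the same candidate both times, and shuffling the input list does not change the choice because the tied candidates are sorted before sampling.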

willdumm commented 2 years ago

This PR requires historydag v1.0.1, which isn't released yet but will include a fix to make DAG node orderings consistent across runs and after copying: https://github.com/matsengrp/historydag/pull/22
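The historydag fix itself lives in the linked PR and is not reproduced here. As a general illustration of why node ordering matters for names, the sketch below assumes internal node names are assigned by enumeration order, which may not match what gctree or historydag actually do. The functions stable_order and label_internal_nodes are hypothetical. In Python, iterating directly over a set of strings can give a different order on each interpreter run because string hashes are randomized by default, so any naming that follows that order can change between runs.

```python
def stable_order(node_names):
    """Return node names in an order that is the same on every run."""
    # Sorting removes any dependence on set iteration order, which can
    # differ between interpreter runs for strings.
    return sorted(node_names)


def label_internal_nodes(node_names):
    """Assign names by position in the stable order (hypothetical scheme)."""
    return {
        name: f"internal_{i}"
        for i, name in enumerate(stable_order(node_names))
    }
```

With this scheme, label_internal_nodes({"GGA", "ACT", "CTA"}) maps the same sequence to the same name on every run.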

I'm still seeing some strange behavior that I don't think I'll be able to explain. If I install the fixed version of historydag together with stock gctree v4.0.4, the issue where internal node names differ between runs goes away.

If I then install the same gctree code with pip install -e (after doing git checkout v4.0.4), internal node names are still consistent between runs, but they differ from the names produced by stock gctree v4.0.4 from PyPI. I don't know why.

What I am confident of, though, is that the changes in the above-referenced historydag PR make internal node names consistent between runs. The changes in this PR