Closed hyanwong closed 3 years ago
Cool, @hyanwong! I assume we will want a method like this in tskit at some point.
One thing I wanted to flag - adding new nodes may invalidate the mutation time requirements if the tree sequence has them, for example if a mutation is just above one of the polytomies to be broken then it would need to be moved above the oldest node in the new set that replaces the polytomy. Clearly an edge-case but thought I should flag!
That's a good point @benjeffery, thanks. Actually, I break the polytomy below the focal node, so that the polytomy node does not change time: it's mutations on any nodes below the polytomy which will need checking. I think I can see how to do this: at the moment I set the delta time (dt
) value to the smallest difference between the child node times and the parent. I need to also check the mutation times above each child.
What do we do if the time differences are so small that they start to encounter floating point accuracy errors. Or is this so unlikely that we don't care?
What do we do if the time differences are so small that they start to encounter floating point accuracy errors. Or is this so unlikely that we don't care?
I think we detect the situation and error out. Only other option is some horrible recursive shuffling right?
Yes, I guess so. The question is whether we actively look for this and raise a specific error, or just expect it to bomb out (with e.g. time[parent] <= time[child]) when we try to convert to a tree sequence.
Shall I actually work this up into a PR, if you think it's a useful extra method on a tree_sequence? I guess , following other examples, I could make it an in-place method on a TableCollection, and then create a idempotent (is that the right word?) version for a tree sequence.
Hi Yan Wong, This looks great! Personally, I would prefer to raise a specific error rather than time[parent]<=time[child] which is less informative from a user's perspective.
Haven't got a chance to try it out myself but I think this feature would be useful for many others. Wonder what other people in the tskit group think about this feature.
btw, how to access the tskit in which you can supply the include_terminal param to edge_diff? It's not in the latest release, right?
Best, Yilei
On Fri, Aug 28, 2020 at 4:25 AM Yan Wong notifications@github.com wrote:
Yes, I guess so. The question is whether we actively look for this and raise a specific error, or just expect it to bomb out (with e.g. time[parent] <= time[child]) when we try to convert to a tree sequence.
Shall I actually work this up into a PR, if you think it's a useful extra method on a tree_sequence? I guess , following other examples, I could make it an in-place method on a TableCollection, and then create a idempotent (is that the right work) version for a tree sequence.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tskit-dev/tskit/issues/809#issuecomment-682397658, or unsubscribe https://github.com/notifications/unsubscribe-auth/ALNWTMEUUCFZHRW4FGYFWKLSC5SWNANCNFSM4QNL6YRA .
Hi @hyl317 - to try it out, until #787 is merged you'll need to install my branch directly, e.g.
python3 -m pip install git+https://github.com/hyanwong@edge_diff_include_terminal#subdirectory=python
Then try out the code.
ISWYM about the specific error, it's just a bit more work to do properly!
Hi @hyl317 and @awohns - you can now test this using a single install via the PR I just made. The name has changed (for the time being) to randomly_split_polytomies
:
python3 -m pip install git+https://github.com/hyanwong@random-split-polytomy#subdirectory=python
Simply call like
ts_binary = ts.randomly_split_polytomies(random_seed=1)
I'd be happy to help review code for this.
Great, thanks @brianzhang01 - that would be really useful. The PR is at https://github.com/tskit-dev/tskit/pull/815 but I don't know if we want to implement out own PRNG, so that we can have an equivalent C function.
Just to note that the opposite: collapse edges into polytomies, only retaining those supported by mutations, is at https://github.com/tskit-dev/tskit/discussions/2926
I've finally got some working code to randomly resolve polytomies in a tree sequence. It seems to work for the admittedly small sample of inferred trees that I have tried. It tries to be clever by not resolving per tree, but per polytomy (hence if an identical polytomy spans several trees, it will only resolve the polytomy once, creating fewer edges than a per-tree approch, which should mean it scales to larger sample sizes better). I'm posting it here so that:
a. I don't lose it (!) b. We can decide if we want something like this in the base
tskit
library, as a method on a tree sequence. c. @hyl317 can use it if he wants, for ARGweaver compatibility (although his current solution may be faster)Obviously it needs a fair bit of tidying up. It's also slow, but there's also considerable scope for optimization, I think. It requires the PR I made at #787
The code can be tested using something like this: