Closed hyanwong closed 2 weeks ago
Starting with the simplest one, these are for the ARG from KwARG with y_axis_scale="time"
and y_axis_scale="rank"
. For the first, I didn't move the nodes at all. I would say that the ordering of the tips is better in your version. The second one, I've tried to untangle as much as possible. I'm not sure why there are unary nodes (13, 20, 17, 16, etc) in this ARG, but haven't worked with the method before.
Hmm, perhaps we are creating a 2-RE-node ARG when converting from kwARG, but not setting the flags that you expect. I.e. it may be that 35 & 36 represent a single "recombination node".
Edit - yes, that is the problem: https://github.com/tskit-dev/what-is-an-arg-paper/blob/4ef243b086a715ad7c26aeeaec1783f189766f07/argutils/convert.py#L145 - see https://github.com/tskit-dev/what-is-an-arg-paper/issues/325
Wouldn't we then expect nodes 35 and 36 to have the same time? I have node 35 at time=387592 and node 36 at time=144455.
Good point. Yes we would. I'm not sure why this happens then. I will investigate.
Ana points out that kwARG doesn't infer times, so these are somewhat arbitrary. The reason that we have 2 re nodes at different times is that to get semi-reasonable times out of the ARG, we ran the "tsdate" program on the ARG output from kwARG, and tsdate
doesn't know to keep the 2 recombination nodes at the same time.
The simplest way around this would be to convert the kwARG output to a 1-RE-node format and keep the mapping of old -> new node IDs. We could then date the 1-re-node version, and port the times back.
This would be useful for redating "full" ARGs from whatever source. The 2-RE-node version is needed to calculate likelihoods under the CwR.
Apologies - it looks like I only ran the tsdate algorithm on my local copy. The main repo has this kwARG inference without explicitly dated nodes (just in rank order). I have updated the repo placing the correct node flags at https://github.com/tskit-dev/what-is-an-arg-paper/pull/332. A copy of the Kreitman-kwARG inference is below, and it gives a much nicer picture: Kreitman_SNP_kwarg.trees.zip
This is about the best I can come up with (3 crossings):
Here's some starting points for visualizations of the different methods: kreitman_jsons.zip
The first and last trees from the tsinfer trees appear to be empty in the .trees file that I was using; I don't know if that was intentional or not. Also for most of the them, we are stuck using edge_type="line"
until I work out the pathing for polytomies. I've added that to the milestone TODO list for v0.0.2. You can almost get away with "ortho" for argweaver, but as the D3ARG object is expecting 2-RE-format when using D3ARG.from_ts()
, it doesn't quite work.
Let me know if you have trouble running any of the JSONs!
Thanks. We should have a 2 <-> 1 conversion routine, I guess (difficult for diamonds, though).
The tsinfer
post-processing step deliberately cuts off the topology before the first site and after the last site, which is why there are empty regions at the start and end of most tsinferred tree sequences.
I think we are done here
In our ARG paper, we show a number of ARGs inferred from the same classic D. melanogaster dataset (by Kreitman):
It would be great to see how these can be displayed by the ts_arg_visualiser. I've zipped up the respective tree sequences below:
Kreitman_SNP_inferred_ARGs.zip