Open hyanwong opened 10 months ago
Added the changes you mentioned to the tskit_arg_visualizer 0.0.2 milestone; they should be pretty straightforward to implement!
I personally like the node labels when mapping between the trees and the ARG. Without the nodes, it might be a bit difficult for newcomers to grasp how (and why) the trees are woven together. Something like this paragraph
A major benefit of “tree sequence thinking” is the close relationship between the tree sequence and the underlying biological processes that produced the genetic sequences in the first place, such as those pictured in the demography above. For example, each branch point (or “internal node”) in one of our trees can be imagined as a genome which existed at a specific time in the past, and which is a “most recent common ancestor” (MRCA) of the descendant genomes at that position on the chromosome. We can mark these extra “ancestral genomes” on our tree diagrams, distinguishing them from the sampled genomes (a to j) by using circular symbols.
from lower on the page seems critical to understanding why the trees are correlated, including the fact that specific nodes/edges are found across multiple trees. The tree highlighting and variable edge width within the ARG help to show this correlation but don't give the biological reasoning behind it. Maybe we move that paragraph up above this figure?
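To make the "same edges appear in several trees" point concrete, here is a minimal sketch in plain Python (no tskit needed). The edge table is made up for illustration, and the helper names `tree_intervals` and `trees_spanned` are hypothetical, not part of tskit or the visualizer. It only assumes that each edge carries a genomic interval `[left, right)`.

```python
# Toy edge table, invented for illustration: (left, right, parent, child).
# Nodes 0-2 are samples; 3, 4 and 5 are ancestral nodes.
edges = [
    (0, 1000, 3, 0),
    (0, 1000, 3, 1),
    (0, 400, 4, 3),
    (0, 400, 4, 2),
    (400, 1000, 5, 3),
    (400, 1000, 5, 2),
]


def tree_intervals(edges):
    """Return the genomic interval of each local tree.

    Breakpoints are taken to be every position where an edge starts or ends.
    """
    points = sorted({x for left, right, _, _ in edges for x in (left, right)})
    return list(zip(points[:-1], points[1:]))


def trees_spanned(edge, intervals):
    """Count the local trees whose interval overlaps this edge."""
    left, right, _, _ = edge
    return sum(1 for lo, hi in intervals if left < hi and lo < right)


intervals = tree_intervals(edges)

# Keep only the (parent, child) edges that persist across more than one tree.
shared = {
    (parent, child): trees_spanned((left, right, parent, child), intervals)
    for left, right, parent, child in edges
    if trees_spanned((left, right, parent, child), intervals) > 1
}

print(intervals)  # [(0, 400), (400, 1000)]
print(shared)     # {(3, 0): 2, (3, 1): 2}
```

In this toy case node 3 is the ancestor of samples 0 and 1 in both trees, which is the kind of sharing that a node label makes easy to spot when mapping between the trees and the ARG.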
With the latest commit to the visualizer, users can now control the size and symbol of the nodes. Here's your example from above with smaller nodes and square sample nodes.
import msprime
import demes
import tskit_arg_visualizer as viz

def whatis_example():
    demes_yml = """\
        description:
          Asymmetric migration between two extant demes.
        time_units: generations
        defaults:
          epoch:
            start_size: 5000
        demes:
          - name: Ancestral_population
            epochs:
              - end_time: 1000
          - name: A
            ancestors: [Ancestral_population]
          - name: B
            ancestors: [Ancestral_population]
            epochs:
              - start_size: 2000
                end_time: 500
              - start_size: 400
                end_size: 10000
        migrations:
          - source: A
            dest: B
            rate: 1e-4
        """
    graph = demes.loads(demes_yml)
    demography = msprime.Demography.from_demes(graph)
    # Choose seed so num_trees=3, tips are in same order,
    # first 2 trees are topologically different, and all trees have the same root
    seed = 12581
    ts = msprime.sim_ancestry(
        samples={"A": 2, "B": 3},
        demography=demography,
        recombination_rate=1e-8,
        sequence_length=1000,
        random_seed=seed)
    # Mutate
    # Choose seed to give 12 muts, last one above node 14
    seed = 1476
    return msprime.sim_mutations(ts, rate=1e-7, random_seed=seed)

ts = whatis_example()
arg = viz.D3ARG.from_ts(ts=ts)

labels = {}
for node in arg.nodes:
    if node["flag"] == 1:
        labels[node["id"]] = node["label"]
    else:
        labels[node["id"]] = ""
arg.set_node_labels(labels=labels)

arg.draw(
    variable_edge_width=True,
    y_axis_scale="time",
    node_size=50,
    sample_node_symbol="d3.symbolSquare",
    sample_order=[0, 2, 3, 4, 8, 9, 5, 6, 7, 1]
)
To encourage people to think of tree sequences as graph objects, I think it would be helpful to add the graph representation to the "What is a Tree Sequence" tutorial, round about here. This is how you might do it:
Currently this gives a plot like this:
I think a few things would be helpful to make this look simpler. In particular, if we could change the node sizes & shapes such that the internal nodes are (very) small circles and the sample nodes are square, that would match the tree-by-tree plot above it (https://github.com/kitchensjn/tskit_arg_visualizer/issues/30). Allowing the y-axis ticks to be set to user-chosen values would also be helpful, I think.
Perhaps @kitchensjn has some ideas about how to make the plot friendly to a newcomer in this context?
Note that ts has been produced by code in the notebook, like that below: