hyanwong closed 1 week ago
Here's the viz run on a random covid pangolin lineage:
Once we have defined a postorder_minlex_tracked_node_traversal function, this is produced using, e.g.:
pango = "B.1.1.70"
tracked_nodes = ti.pango_lineage_samples[pango]
tree = ts.first(tracked_samples=tracked_nodes)
order = list(postorder_minlex_tracked_node_traversal(tree, collapse_tracked=False))
print(len(order), f"nodes in subtree. Nodes in magenta are {pango}")
tree.draw_svg(
    time_scale="rank",
    order=order,
    size=(1000, 800),
    # Label only the untracked nodes, using their Pango designation from the metadata
    node_labels={
        u: ts.node(u).metadata.get("Viridian_pangolin", "")
        for u in order
        if u not in tracked_nodes
    },
    mutation_labels={},
    symbol_size=4,
    summarise_untracked_polytomies=True,
    style=(
        # Colour the tracked nodes (plus node 39) in magenta
        "".join(f".n{u} > .sym {{fill: magenta}}" for u in tracked_nodes + [39]) +
        ".lab.summary {font-size: 9px}" +
        ".polytomy {font-size: 10px}"
    ),
)
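For readers without the function to hand, here is a minimal sketch of the kind of traversal the snippet above relies on. It is not the implementation from this thread: it works on a plain dictionary of children rather than a tskit tree, the name tracked_postorder_sketch and its parameters are hypothetical, and it assumes that "minlex" means ordering children by the smallest leaf ID beneath them. It yields only nodes with at least one tracked leaf below them and, when collapse_tracked is set, stops descending at nodes whose leaves are all tracked.

```python
from typing import Dict, Iterator, List, Set


def tracked_postorder_sketch(
    children: Dict[int, List[int]],
    root: int,
    tracked: Set[int],
    collapse_tracked: bool = False,
) -> Iterator[int]:
    """Yield, in postorder, the nodes that have at least one tracked leaf below them.

    Hypothetical helper for illustration only; `children` maps each internal
    node to its list of child nodes, and leaves are nodes with no entry.
    """
    num_leaves: Dict[int, int] = {}
    num_tracked: Dict[int, int] = {}
    min_leaf: Dict[int, int] = {}

    # Pass 1 (iterative postorder): per-node leaf counts and smallest leaf ID.
    stack = [(root, False)]
    while stack:
        u, children_done = stack.pop()
        kids = children.get(u, [])
        if not kids:
            num_leaves[u] = 1
            num_tracked[u] = int(u in tracked)
            min_leaf[u] = u
        elif not children_done:
            stack.append((u, True))
            stack.extend((c, False) for c in kids)
        else:
            num_leaves[u] = sum(num_leaves[c] for c in kids)
            num_tracked[u] = sum(num_tracked[c] for c in kids)
            min_leaf[u] = min(min_leaf[c] for c in kids)

    # Pass 2: emit nodes, skipping subtrees that contain no tracked leaves.
    # Recursive for brevity; a very deep tree would need an explicit stack.
    def visit(u: int) -> Iterator[int]:
        fully_tracked = num_tracked[u] == num_leaves[u]
        if not (collapse_tracked and fully_tracked):
            kept = [c for c in children.get(u, []) if num_tracked[c] > 0]
            for c in sorted(kept, key=min_leaf.__getitem__):
                yield from visit(c)
        yield u

    if num_tracked[root] > 0:
        yield from visit(root)


# Toy tree: root 6 has children 4 and 5; 4 -> leaves 0, 1; 5 -> leaves 2, 3.
toy = {6: [4, 5], 4: [0, 1], 5: [2, 3]}
print(list(tracked_postorder_sketch(toy, 6, {2, 3})))  # [2, 3, 5, 6]
print(list(tracked_postorder_sketch(toy, 6, {2, 3}, collapse_tracked=True)))  # [5, 6]
```

In the toy tree the subtree under node 4 contains no tracked leaves, so it is dropped from the order entirely; with collapse_tracked=True node 5 is emitted without its children, which is the case that would be drawn as a single collapsed clade.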
And here's a path to a pango lineage represented by a single sample:
Looks great Yan!
Great, thanks. I'll work it into a PR.
The main issue, to which there is no easy solution, arises when we have a huge polytomy of (say) 1000 lineages, 999 of which contain entirely (or mostly) focal (tracked) samples, and one of which does not. We can't visually collapse parts of such a polytomy in a meaningful way: either we collapse the whole thing, or we have to show all the focal lineages, as we don't know how they relate to each other. For example, here's the top of the B.1.1.7 (alpha) lineage from a covid tree:
I think this is an insoluble issue, so I'm happy to punt it down the line.
Here's a quick demo of the visual scheme I have come up with for condensing trees with polytomies, so we show only the lineages relating to a set of tracked samples (tips in cyan). Such samples might represent (say) a geographical region, or a covid Pango lineage. Here's an example, followed by the suggested scheme:
Condensed:
Two things are going on here:
Optionally (3rd plot), we can also collapse nodes that consist entirely of tracked samples (here node 39) into a triangle/trapezium:
Does this look like a reasonable approach? I'm not sold on the "+n/m" notation but it was the most succinct/consistent that I could come up with.