jeromekelleher / sc2ts

Infer a succinct tree sequence from SARS-COV-2 variation data
MIT License

Limiting node expansion when getting a sample subgraph #145

Closed: szhan closed this issue 1 year ago

szhan commented 1 year ago

When getting the sample subgraph of the causal recombinant node of XA in the wide ARG (id=296560) with expand_down=True, over a thousand nodes are added to the subgraph. @hyanwong and I have been discussing whether to avoid node expansion when it is probably not useful. In the case of XA in the wide ARG, expanding down from the ancestral non-sample nodes along the right path doesn't seem insightful. One suggestion is not to expand down from ancestral nodes of a recombination node, e.g., the ancestral non-sample nodes on the right path in the case of XA.
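A minimal, self-contained sketch of the suggested rule, using a toy graph stored as plain Python dicts rather than a tskit tree sequence. The helpers `ancestors` and `expand_down`, the `no_expand` set and all node labels are hypothetical (they are not sc2ts functions), and a recombination node is assumed to be any node with more than one parent. The right-path ancestor `P` gets extra children `X1` and `X2`, standing in for the many nodes that unrestricted expansion pulls in:

```python
from collections import deque

# Toy ARG as {child: [parents]}; R is a recombinant (two parents: L and P).
# X1 and X2 hang off the right-path ancestor P and stand in for the
# many nodes that unrestricted downward expansion pulls in.
parents = {
    "R": ["L", "P"],
    "L": ["S1"],
    "P": ["S2"],
    "X1": ["P"],
    "X2": ["P"],
    "C": ["R"],
}
samples = {"S1", "S2", "C"}

# Invert the parent lists into {parent: [children]}
children = {}
for child, ps in parents.items():
    for p in ps:
        children.setdefault(p, []).append(child)


def ancestors(node, parents, samples):
    """Walk rootwards from `node`, stopping at (and including) sample nodes."""
    seen = set()
    stack = list(parents.get(node, []))
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        if u not in samples:
            stack.extend(parents.get(u, []))
    return seen


def expand_down(start, children, samples, no_expand=frozenset()):
    """Collect `start` plus descendants, stopping at samples and never
    exploring the children of nodes in `no_expand`."""
    seen = set(start)
    queue = deque(start)
    while queue:
        u = queue.popleft()
        if u in samples or u in no_expand:
            continue  # keep u in the subgraph, but do not go below it
        for c in children.get(u, []):
            if c not in seen:
                seen.add(c)
                queue.append(c)
    return seen


# Hypothetical rule: do not expand down from non-sample ancestors of any
# recombination node (a node with more than one parent).
recombinants = [n for n, ps in parents.items() if len(ps) > 1]
no_expand = set()
for r in recombinants:
    no_expand |= ancestors(r, parents, samples) - samples

up = ancestors("R", parents, samples) | {"R"}
full = expand_down(up, children, samples)
limited = expand_down(up, children, samples, no_expand)
print(sorted(full))     # ['C', 'L', 'P', 'R', 'S1', 'S2', 'X1', 'X2']
print(sorted(limited))  # ['C', 'L', 'P', 'R', 'S1', 'S2']
```

In this toy case the rule drops `X1` and `X2` but still expands below the recombinant itself (keeping `C`). The sketch applies the rule to all non-sample ancestors; restricting it to one path, as with XA's right path, would only change how `no_expand` is built.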


Even when no recombination nodes are encountered while expanding, thousands of nodes can still be added to the subgraph (e.g., when exploring a non-reticulate part of the ARG). It may also be useful to limit expansion to at most N levels, to avoid visualizing very large subgraphs.
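Limiting expansion to N levels amounts to a breadth-first search with a depth cutoff. A sketch on a toy child map stored as a dict; `expand_down_limited` and `max_levels` are hypothetical names, not part of sc2ts:

```python
from collections import deque


def expand_down_limited(start, children, max_levels):
    """Breadth-first walk tipwards from `start`, keeping only nodes at most
    `max_levels` edges below the nearest start node."""
    depth = {u: 0 for u in start}
    queue = deque(start)
    while queue:
        u = queue.popleft()
        if depth[u] >= max_levels:
            continue  # depth limit reached: keep u, skip its children
        for c in children.get(u, []):
            if c not in depth:  # BFS: first visit is the shortest distance
                depth[c] = depth[u] + 1
                queue.append(c)
    return set(depth)


# Toy {parent: [children]} map; node 3 has two parents (1 and 2)
children = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [6]}
for n in (1, 2, 3):
    print(n, sorted(expand_down_limited([0], children, n)))
# 1 [0, 1, 2]
# 2 [0, 1, 2, 3, 4]
# 3 [0, 1, 2, 3, 4, 5, 6]
```

Because the search is breadth-first, a node reachable along several paths is assigned its shortest distance from the start set, so the cutoff is applied consistently even where lineages merge.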

hyanwong commented 1 year ago

You can do this with the new routines node_path_to_samples and plot_subgraph:

```python
import numpy as np
import sc2ts

# `ts` (the wide ARG) and `treeinfo` are assumed to be loaded already.
# Walk rootwards from the XA causal recombinant, stopping at recombination nodes
up_nodes = sc2ts.node_path_to_samples([296560], ts, stop_at_recombination=True)
# Walk back down (rootwards=False), again stopping at recombination nodes
nodes = sc2ts.node_path_to_samples(up_nodes, ts, rootwards=False, stop_at_recombination=True)
# Add parents of recombinants, but ignore any that have already reached samples
nodes = np.concatenate((nodes, sc2ts.node_path_to_samples(nodes, ts, ignore_initial=False, stop_at_recombination=True)))
# Add the nodes above the recombination nodes
nodes = np.concatenate((nodes, sc2ts.node_path_to_samples(nodes, ts, stop_at_recombination=False)))
# Drop duplicates collected by the separate walks
nodes = np.unique(nodes)
print(len(nodes), "nodes to plot")
sc2ts.plot_subgraph(nodes, ts, treeinfo, exterior_edge_len=0)  # See https://github.com/jeromekelleher/sc2ts/issues/149
```


szhan commented 1 year ago

Thanks, Yan! We can separately address the second point about subsetting the nodes to visualise if the subgraph gets way too big.