Open szhan opened 1 year ago
Maybe something like this? Am I thinking about the descendants across trees correctly?
def get_mutation_spectrum(
self,
min_inheritors=1,
focal_node=None,
exclude_nodes=None,
):
"""
Returns a counter of mutations in the ARG, representing a mutation spectrum.
:param min_inheritors: Minimum number of inheritors of a mutation.
:param focal_node: If not None, only mutations in the subtree of this node are considered.
:param exclude_nodes: If not None, mutations in the subtree of these nodes are excluded.
"""
if focal_node is not None:
# Get the descendants of this node across trees.
pass
if exclude_nodes is not None:
# Get the descendants of these nodes across trees.
pass
keep = self.mutations_num_inheritors >= min_inheritors
# Keep mutations ocurring on the descendants of the focal node, but not on the focal node itself.
# Exclude mutations ocurring on the descendants of the exclude nodes, but not on the exclude nodes themselves.
inherited = self.mutations_inherited_state[keep]
derived = self.mutations_derived_state[keep]
sep = inherited.copy()
sep[:] = ">"
x = np.char.add(inherited, sep)
x = np.char.add(x, derived)
return collections.Counter(x)
Good idea, that should be doable. I guess the straightforward way to do this for a single node is to iterate over the trees, and then look at all mutations at sites on that tree, and keep them if they have the node in question as an ancestor.
This will work fine for the application here I think, since we only care about a few important nodes. keeping the mutation spectrum for all nodes is harder, but doable I think
Bloom et al. (2023) MBE examined clade-specific mutational spectra. In Figure 1, they compared VoC-specific mutational spectra and noticed changes in mutational spectra (e.g., Omicron versus early clades).
It should be useful to extend the
get_mutation_spectrum
function insc2ts
to get the mutation spectrum of a clade under a specified ancestral node. We may also want to allow for including and/or excluding nested clades under nodes with certain metadata (e.g., GISAID clade or Pango lineage).