jeromekelleher / sc2ts

Infer a succinct tree sequence from SARS-COV-2 variation data
MIT License
4 stars 3 forks source link

Get the mutational spectrum of a specified node and its descendants #148

Open szhan opened 1 year ago

szhan commented 1 year ago

Bloom et al. (2023) MBE examined clade-specific mutational spectra. In Figure 1, they compared VoC-specific mutational spectra and noticed changes in mutational spectra (e.g., Omicron versus early clades).

It should be useful to extend the get_mutation_spectrum function in sc2ts to get the mutation spectrum of a clade under a specified ancestral node. We may also want to allow for including and/or excluding nested clades under nodes with certain metadata (e.g., GISAID clade or Pango lineage).

szhan commented 1 year ago

Maybe something like this? Am I thinking about the descendants across trees correctly?

def get_mutation_spectrum(
    self,
    min_inheritors=1,
    focal_node=None,
    exclude_nodes=None,
):
    """
    Returns a counter of mutations in the ARG, representing a mutation spectrum.

    :param min_inheritors: Minimum number of inheritors of a mutation.
    :param focal_node: If not None, only mutations in the subtree of this node are considered.
    :param exclude_nodes: If not None, mutations in the subtree of these nodes are excluded.
    """
    if focal_node is not None:
        # Get the descendants of this node across trees.
        pass
    if exclude_nodes is not None:
        # Get the descendants of these nodes across trees.
        pass
    keep = self.mutations_num_inheritors >= min_inheritors
    # Keep mutations ocurring on the descendants of the focal node, but not on the focal node itself.
    # Exclude mutations ocurring on the descendants of the exclude nodes, but not on the exclude nodes themselves.
    inherited = self.mutations_inherited_state[keep]
    derived = self.mutations_derived_state[keep]
    sep = inherited.copy()
    sep[:] = ">"
    x = np.char.add(inherited, sep)
    x = np.char.add(x, derived)
    return collections.Counter(x)
jeromekelleher commented 1 year ago

Good idea, that should be doable. I guess the straightforward way to do this for a single node is to iterate over the trees, and then look at all mutations at sites on that tree, and keep them if they have the node in question as an ancestor.

This will work fine for the application here I think, since we only care about a few important nodes. keeping the mutation spectrum for all nodes is harder, but doable I think