jeromekelleher / sc2ts

Infer a succinct tree sequence from SARS-COV-2 variation data
MIT License

XB parent copying unparsimonious #154

Open hyanwong opened 1 year ago

hyanwong commented 1 year ago

It is probably worth collecting instances where we think the inferred recombinations are wrong, for whatever reason. Here's one: an apparent recombination within the XB lineage, between a grandchild and a grandparent:

import tszip
import sc2ts
import matplotlib.pyplot as plt

ts = tszip.decompress("../data/upgma-full-md-30-mm-3-2021-06-30-recinfo-gisaid-il.ts.tsz")  # wide ARG

fig, ax = plt.subplots(1, 1, figsize=(6, 8))
sc2ts.plot_subgraph(
    [964312, 460829, 1050694, 1088449, 400905, 803973, 1088538, 1050758],
    ts,
    mutations_json_filepath=None,
    exterior_edge_len=0.1,
    ax=ax,
    ts_id_labels=True,
    node_metadata_labels="Imputed_Nextclade_pango",  # can easily change to "Imputed_GISAID_lineage"
    sample_metadata_labels=None,
    node_size=800,
    label_replace={"Unknown": "", "Unknown ": ""},
    node_colours={
        'XB': '#fc31fb',
        "Unknown (R)": "lightgrey",
        "Unknown": "w",
    },
    colour_metadata_key="Imputed_Nextclade_pango"
)
plt.show()

[image: subgraph plot produced by the code above]
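To check a pattern like this without the plot, one option is to list, for each child node, which parents it copies from and over which intervals. The sketch below is only an illustration: the helper names `parent_segments` and `recombinant_children` are hypothetical (not part of sc2ts or tskit), and the toy edges use made-up node ids and coordinates rather than anything from the ARG above. It assumes edges are given as `(left, right, parent, child)` tuples, which for a tskit tree sequence could be built with `[(e.left, e.right, e.parent, e.child) for e in ts.edges()]`.

```python
from collections import defaultdict


def parent_segments(edges, node):
    """Return sorted (left, right, parent) segments over which `node` is a child."""
    return sorted(
        (left, right, parent)
        for left, right, parent, child in edges
        if child == node
    )


def recombinant_children(edges):
    """Return {child: sorted parent ids} for children with more than one distinct parent."""
    parents = defaultdict(set)
    for _left, _right, parent, child in edges:
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}


# Toy edges (made-up ids and coordinates): node 0 -> 1 -> 2 is a simple chain,
# and node 3 copies its left half from node 2 (the grandchild of node 0)
# and its right half from node 0 (the grandparent).
toy_edges = [
    (0, 100, 0, 1),
    (0, 100, 1, 2),
    (0, 50, 2, 3),
    (50, 100, 0, 3),
]

print(recombinant_children(toy_edges))
print(parent_segments(toy_edges, 3))
```

In the toy data, node 3 is the only child with two parents, and its segments show the breakpoint at coordinate 50.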

jeromekelleher commented 1 year ago

I don't think it's a matching logic error, more likely a result of the reversion handling leading to something unparsimonious?

jeromekelleher commented 1 year ago

t1471c probably isn't real

hyanwong commented 1 year ago

> I don't think it's a matching logic error, more likely a result of the reversion handling leading to something unparsimonious?

Yes, sorry, that was a poor choice of terminology on my part. I meant that this is probably not a real recombination, so something odd is happening when matching is coupled with daily tree resolution.

Either way, it's a useful example to dig into to look for algorithmic improvements.
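One rough way to ask whether a candidate recombination is parsimonious is to compare the mismatches needed to copy the child from a single parent against the mismatches needed for the best two-parent mosaic with one breakpoint. The sketch below is an illustration under assumptions, not how sc2ts does its matching: the function names are hypothetical, sequences are plain equal-length strings, and the toy haplotypes are made up.

```python
def hamming(a, b):
    """Number of sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def best_two_parent_mosaic(child, left_parent, right_parent):
    """Return (cost, breakpoint) minimising mismatches when child[:bp] is
    copied from left_parent and child[bp:] from right_parent."""
    # Breakpoint 0: every site is copied from right_parent.
    cost = hamming(child, right_parent)
    best = (cost, 0)
    for bp in range(1, len(child) + 1):
        i = bp - 1
        # Site i switches from being copied from right_parent to left_parent.
        cost += (child[i] != left_parent[i]) - (child[i] != right_parent[i])
        if cost < best[0]:
            best = (cost, bp)
    return best


def recombination_saving(child, p1, p2):
    """Mismatches saved by the best single-breakpoint mosaic (either
    orientation) relative to the best single parent."""
    single = min(hamming(child, p1), hamming(child, p2))
    mosaic, _bp = min(
        best_two_parent_mosaic(child, p1, p2),
        best_two_parent_mosaic(child, p2, p1),
    )
    return single - mosaic


# Made-up toy haplotypes: the child matches p1 on the left and p2 on the right.
child = "AACCGGTT"
p1 = "AACCGGAA"
p2 = "TTCCGGTT"

print(best_two_parent_mosaic(child, p1, p2))
print(recombination_saving(child, p1, p2))
```

A saving of zero or close to zero for an inferred recombinant would suggest the recombination is not buying much over a single-parent copy, which is the kind of case worth collecting here.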