jeromekelleher / sc2ts

Infer a succinct tree sequence from SARS-COV-2 variation data
MIT License

Lineage imputation and sample subgraph plots #66

Closed: a-ignatieva closed this pull request 1 year ago

a-ignatieva commented 1 year ago

Code for imputing the Pango lineage of nodes in an inferred tree sequence. This uses a two-step procedure: (1) copying the lineage of a parent or child wherever possible (i.e., when the connecting edge carries no lineage-defining mutations), and (2) imputing a node's lineage from its parent's lineage plus the lineage-defining mutations on the connecting edge. These steps are run iteratively over the tree sequence until every node has a lineage assigned. This is still WIP, and I need to add some tests here.
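The two-step procedure above can be sketched in plain Python. This is a minimal illustration, not the code in the PR: it assumes a hypothetical edge list of `(parent, child, mutations)` tuples with mutations listed oldest first, and a hypothetical `defining` table mapping `(parent_lineage, mutation)` to the child lineage that mutation defines. Recombinant nodes with conflicting parents simply keep whichever lineage is assigned first.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical edge representation: (parent_id, child_id, mutations on the edge)
Edge = Tuple[int, int, Tuple[str, ...]]


def impute_lineages(
    num_nodes: int,
    edges: List[Edge],
    known: Dict[int, str],
    defining: Dict[Tuple[str, str], str],
) -> Dict[int, Optional[str]]:
    """Iteratively fill in lineages for unlabelled nodes (sketch only)."""
    # A mutation counts as lineage-defining if it appears in any table key.
    defining_muts = {mut for (_, mut) in defining}
    lineage: Dict[int, Optional[str]] = {u: known.get(u) for u in range(num_nodes)}

    changed = True
    while changed:  # repeat until a full pass assigns nothing new
        changed = False
        # Step 1: copy across edges carrying no lineage-defining mutations,
        # in either direction (parent -> child or child -> parent).
        for parent, child, muts in edges:
            if any(m in defining_muts for m in muts):
                continue
            if lineage[child] is None and lineage[parent] is not None:
                lineage[child] = lineage[parent]
                changed = True
            elif lineage[parent] is None and lineage[child] is not None:
                lineage[parent] = lineage[child]
                changed = True
        # Step 2: derive a child's lineage from its parent's lineage plus
        # the lineage-defining mutations on the connecting edge.
        for parent, child, muts in edges:
            if lineage[child] is None and lineage[parent] is not None:
                current = lineage[parent]
                for m in muts:
                    current = defining.get((current, m), current)
                lineage[child] = current
                changed = True
    return lineage


# Toy example with made-up labels: node 3 is a sample known to be "A",
# and mutation "m1" is assumed to define "A.1" from "A".
result = impute_lineages(
    num_nodes=5,
    edges=[(0, 1, ()), (1, 3, ()), (1, 2, ("m1",)), (2, 4, ())],
    known={3: "A"},
    defining={("A", "m1"): "A.1"},
)
```

Known (sample) lineages are never overwritten, since nodes are only assigned while still unlabelled; nodes that no pass can reach stay `None`.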

Also adds a function to utils for visualising the history of a given sample node: it traces up through the ARG until other sample nodes are reached, showing any recombination events along the way.
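The traversal behind such a plot might look like the following sketch. It covers only the graph walk, not the plotting, and assumes a hypothetical `parents` mapping from each node to its parent nodes (in tskit this could be built from the `parent` and `child` columns of the edge table). A node with more than one parent is treated as a recombinant; the walk stops at any other sample node.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def sample_subgraph(
    start: int,
    parents: Dict[int, List[int]],
    samples: Set[int],
) -> Tuple[List[Tuple[int, int]], Set[int]]:
    """Walk up from `start` until other sample nodes (or roots) are reached.

    Returns the (child, parent) edges visited and the recombinant nodes
    (nodes with more than one parent) met along the way.
    """
    edges: List[Tuple[int, int]] = []
    recombinants: Set[int] = set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        # Other sample nodes bound the subgraph from above: don't go past them.
        if node != start and node in samples:
            continue
        node_parents = parents.get(node, [])
        if len(node_parents) > 1:
            recombinants.add(node)
        for p in node_parents:
            edges.append((node, p))
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return edges, recombinants


# Toy ARG: sample 5 is a recombinant of 3 and 4; nodes 1 and 4 are samples.
edges, recombinants = sample_subgraph(
    start=5,
    parents={5: [3, 4], 3: [1], 4: [2], 1: [0], 2: [0]},
    samples={1, 4, 5},
)
```

Here the walk stops at samples 1 and 4, so the ancestors 0 and 2 above them are not included in the subgraph.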

jeromekelleher commented 1 year ago

Looks great, thanks @a-ignatieva!

A few minor things:

Other than that, I think we can merge and iterate, although I guess we'll have to rebase and squash the commits in order to remove the big JSON file from the history completely.

a-ignatieva commented 1 year ago

Doing these now! I can remove the JSON file from the git history using git-filter-repo, but I think I'd have to create a new duplicate branch for this (so I'll need to open a different PR). It'll leave the rest of the commits intact, though.

Update: this doesn't work! I guess rebasing and squashing is the way to go.

jeromekelleher commented 1 year ago

This guide for squashing might come in handy

a-ignatieva commented 1 year ago

Sorted, I think.

jeromekelleher commented 1 year ago

Merged, thanks @a-ignatieva!