jeromekelleher / sc2ts

Infer a succinct tree sequence from SARS-COV-2 variation data
MIT License

add export_recombination_node_breakpoints #118

Closed hyanwong closed 1 year ago

hyanwong commented 1 year ago

Replaces #116, as discussed. Returns one row for each breakpoint. We can easily subset down to nodes with only a single breakpoint (i.e. 2 parents) by:

df.drop_duplicates('node', keep=False)

or average the metrics across all the breakpoints in a node by:

df.groupby('node').mean()

So I think this is a nice and flexible output.
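
For anyone reading along, here is a small sketch of how those two calls behave on a one-row-per-breakpoint table. Only the `node` column name comes from this PR; the `breakpoint` and `metric` columns and all values are made up for illustration. I've also passed `numeric_only=True` to `mean()`, which is my addition: newer pandas versions raise an error on non-numeric columns without it.

```python
import pandas as pd

# Toy stand-in for the exported table: one row per breakpoint.
# Only "node" is a column name taken from the PR; the rest is hypothetical.
df = pd.DataFrame(
    {
        "node": [10, 11, 11, 12],
        "breakpoint": [100, 200, 300, 400],
        "metric": [1.0, 2.0, 4.0, 3.0],
    }
)

# keep=False drops every row whose node appears more than once,
# leaving only nodes with a single breakpoint (i.e. 2 parents).
single = df.drop_duplicates("node", keep=False)

# One row per node, averaging the numeric columns over its breakpoints.
per_node = df.groupby("node").mean(numeric_only=True)

print(single)
print(per_node)
```

Note that the groupby version averages every numeric column, so something like a breakpoint position gets averaged too; selecting the metric columns first avoids that.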

hyanwong commented 1 year ago

I did the reuse-the-tree thing, which is much faster; thanks for the hint, @jeromekelleher. We should have a "tips" tutorial for things like this. It turned out to be just as easy to make two independent tree objects at the start and seek each one to adjacent positions, rather than use one tree and keep all the parent node IDs in a list for comparison. I think this is fine.

hyanwong commented 1 year ago

Here's an example of the output

[Screenshot of the example output, taken 2023-03-07 at 11:48:51]