tskit-dev / tsconvert

Utilities for converting tree sequences to and from other formats
MIT License

networkx.DiGraph import #12

Open winni2k opened 4 years ago

winni2k commented 4 years ago

I would like to store a sequence of trees derived from a Metropolis-Hastings MCMC sampler. This sampler stores its current state as a networkx.DiGraph object, and records each change it makes to the graph as a memento.

I propose the following features:

  1. a function that converts a sequence of DiGraphs into a tree sequence
  2. a class that can create a tree sequence from an initial DiGraph object and a sequence of mementos that describe successive changes to the DiGraph object, where each memento represents a sample from the MCMC chain.
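To make feature 1 concrete, here is a minimal sketch of the core step, under some loud assumptions: each tree is reduced to a set of (parent, child) pairs (with networkx this could be `set(g.edges)`), and the i-th tree is assumed to span the interval [i, i + 1). The function name `edges_to_intervals` is hypothetical and not part of tsconvert or tskit. It only produces rows with the same four fields as a tskit edge table (left, right, parent, child); building the actual tables is left out.

```python
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (parent, child), the order networkx uses for DiGraph edges
Row = Tuple[float, float, int, int]  # (left, right, parent, child)


def edges_to_intervals(trees: Iterable[Set[Edge]]) -> List[Row]:
    """Merge edges shared by consecutive trees into half-open intervals.

    Assumption: tree number i covers the interval [i, i + 1).
    """
    open_edges: Dict[Edge, int] = {}  # edge -> index of the tree where it appeared
    rows: List[Row] = []
    n_trees = 0
    for index, edges in enumerate(trees):
        n_trees = index + 1
        # Close every edge that is absent from the current tree.
        for edge in [e for e in open_edges if e not in edges]:
            left = open_edges.pop(edge)
            rows.append((float(left), float(index), *edge))
        # Open every edge that is new in the current tree.
        for edge in edges:
            open_edges.setdefault(edge, index)
    # Edges still open after the last tree end at the sequence length.
    for edge, left in open_edges.items():
        rows.append((float(left), float(n_trees), *edge))
    return sorted(rows)


# Three trees over samples 0 and 1; sample 1 changes parent after the first tree.
trees = [{(2, 0), (2, 1)}, {(2, 0), (3, 1)}, {(2, 0), (3, 1)}]
rows = edges_to_intervals(trees)
```

In this example the edge (2, 0) is stored once for all three trees, which is the redundancy a tree sequence is meant to remove.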

In principle, one could then import arbitrary sequences of trees simply by mapping each tree in the import data to a DiGraph and feeding those DiGraph objects into feature 1. I think that would reduce the amount of code needed for tree import (see #11, for example) because it separates tree parsing from tree sequence creation.

However, my real interest is in feature 2, which should be more efficient than feature 1: because each memento lists exactly what changed, the class can easily tell which information is not redundant and is therefore worth storing for each successive tree in the sequence. This would allow the construction of large tree sequences with minimal memory use and without the need for intermittent sort or simplify calls.
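A rough sketch of how feature 2 might look, purely as an illustration: the `Memento` structure here (sets of removed and added edges) is a guess, not the sampler's actual memento format, and `IncrementalEdgeRecorder` is a hypothetical name. As before, tree number i is assumed to span [i, i + 1), and the output rows use the (left, right, parent, child) fields of a tskit edge table.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Tuple

Edge = Tuple[int, int]  # (parent, child)
Row = Tuple[float, float, int, int]  # (left, right, parent, child)


@dataclass(frozen=True)
class Memento:
    """Hypothetical record of one MCMC step: edges removed and edges added."""

    removed: FrozenSet[Edge] = frozenset()
    added: FrozenSet[Edge] = frozenset()


class IncrementalEdgeRecorder:
    """Records edge intervals by touching only the edges each memento changes."""

    def __init__(self, initial_edges: Iterable[Edge]) -> None:
        self._open: Dict[Edge, int] = {edge: 0 for edge in initial_edges}
        self._rows: List[Row] = []
        self._position = 0  # index of the current tree

    def apply(self, memento: Memento) -> None:
        """Advance to the next sample; unchanged edges cost nothing."""
        self._position += 1
        for edge in memento.removed:
            left = self._open.pop(edge)
            self._rows.append((float(left), float(self._position), *edge))
        for edge in memento.added:
            self._open[edge] = self._position

    def finish(self) -> List[Row]:
        """Close all remaining edges at the end of the sequence."""
        end = float(self._position + 1)
        closing = [(float(left), end, *edge) for edge, left in self._open.items()]
        return sorted(self._rows + closing)


recorder = IncrementalEdgeRecorder({(2, 0), (2, 1)})
recorder.apply(Memento(removed=frozenset({(2, 1)}), added=frozenset({(3, 1)})))
recorder.apply(Memento())  # e.g. a rejected proposal: the tree is unchanged
rows = recorder.finish()
```

Each call to `apply` does work proportional to the size of the memento rather than the size of the tree, and rows are only emitted when an edge actually ends.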

I welcome feedback on this proposal.

jeromekelleher commented 4 years ago

Sounds good to me @winni2k! I'm happy to look over any prototypes if you want to open a draft PR. There are no worries about dependencies etc. in this repo, so just do whatever is most convenient. I'm interested to see how your mementos work.

winni2k commented 4 years ago

Right-o.

It might be a bit of time until I get around to this, as I'm still working on another aspect of my research.