petrelharp / ftprime_ms


Key results missing from abstract #47

Closed by jeromekelleher 6 years ago

jeromekelleher commented 6 years ago

Points I think we should add:

  1. We prove a lower bound on space complexity of O(N log N) under some reasonable assumptions about recombination rate, etc. This also implies that we can store the results of any forward-time simulation in O(N log N + M) space, which is far better than the O(nM) we need if we write out all the genotypes for n samples. So we no longer have to store a small sample from our large simulation; we can store the whole population and analyse it afterwards. This is a powerful result which we should emphasise!

  2. The fact that we can easily seed a forwards-time simulation with a more efficient simulation of the deep history (using e.g. msprime). This is important because we then don't have to simulate 10N generations, but can just simulate the last (say) hundred generations if that's the period over which we think selection on specific loci is important, or the time-scale over which we think coalescent assumptions are violated.

  3. We introduce the msprime Tables API, which allows for the efficient interchange of genealogical data between program components. This enables modularity of the simulations (as discussed in point 2).
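
To give a feel for the gap in point 1, here is a rough back-of-the-envelope sketch. It drops all constant factors, uses the natural log, and the values of N, M and n are made up for illustration rather than taken from the paper. The function names are hypothetical helpers, not part of msprime.

```python
import math


def tree_sequence_cost(N, M):
    """Order-of-magnitude storage for the whole genealogy plus M mutations.

    Assumption: cost ~ N log N + M with constant factors ignored.
    """
    return N * math.log(N) + M


def genotype_matrix_cost(n, M):
    """Order-of-magnitude storage for writing out genotypes: n samples x M sites."""
    return n * M


# Illustrative values only (assumptions, not results from the paper).
N = 10_000        # population size
M = 1_000_000     # number of mutations / variant sites
n = N             # store every individual rather than a small sample

ts_cost = tree_sequence_cost(N, M)
gm_cost = genotype_matrix_cost(n, M)

print(f"tree-based storage ~ {ts_cost:.3g} units")
print(f"genotype matrix    ~ {gm_cost:.3g} units")
print(f"ratio              ~ {gm_cost / ts_cost:.0f}x")
```

With these numbers the genotype matrix is several thousand times larger, which is why storing the whole population becomes feasible. The real ratio depends on the constants and on the recombination rate assumptions mentioned above.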

petrelharp commented 6 years ago

Very good points. I've added something along these lines.

jeromekelleher commented 6 years ago

Perfect!

petrelharp commented 6 years ago

good call; although the point about other uses of storing correlated trees should probably go in there too

jeromekelleher commented 6 years ago

True enough. If we get any real results we should also add this in.