petrelharp / ftprime_ms


add gist of RAM stuff #44

Closed ashander closed 6 years ago

ashander commented 6 years ago

TODO after #43: add something like:

Unsimplified tree sequences grow quickly, and so storing history can use an arbitrarily large amount of memory. However, the amount of memory required is mostly determined by the interval between simplification steps, although, as shown below, there is a speed--memory tradeoff: less frequent simplification reduces overall computation time. In fact, our method would in some situations reduce the amount of memory required, if memory usage in the forwards simulation was dominated by the cost of maintaining neutral genetic variants -- \emph{peak} memory usage (shown in the figures) might well be the same, since adding neutral variants to an existing tree sequence takes a similar amount of memory to maintaining neutral genotypes, but this additional usage happens after the fact (as a separate step), and for a much shorter time period.
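The claim that memory is governed mostly by the simplification interval, with a speed-memory tradeoff, can be illustrated with a toy model. This is a hypothetical sketch, not ftprime or tskit code: a haploid Wright-Fisher population without recombination, where each node stores only its parent, and a stand-in for simplification that keeps ancestors of the current generation and then drops unary (non-coalescing) nodes. The function `simulate` and its cost model (one unit of work per stored node per simplification) are assumptions for illustration.

```python
import random


def simulate(pop_size, generations, simplify_interval, seed=1):
    """Toy haploid Wright-Fisher model (hypothetical; no recombination).

    Every node stores only its parent. Every `simplify_interval`
    generations, a stand-in for simplification keeps only ancestors of
    the current generation and then drops unary (non-coalescing) nodes.

    Returns (peak number of stored nodes, total simplification work),
    where work is an assumed cost model: one unit per stored node per call.
    """
    rng = random.Random(seed)
    parent = {i: None for i in range(pop_size)}  # founders have no parent
    current = list(range(pop_size))
    next_id = pop_size
    peak, work = len(parent), 0

    for gen in range(1, generations + 1):
        # Each new individual picks a parent uniformly from the previous generation.
        children = []
        for _ in range(pop_size):
            parent[next_id] = rng.choice(current)
            children.append(next_id)
            next_id += 1
        current = children
        peak = max(peak, len(parent))

        if gen % simplify_interval == 0:
            work += len(parent)
            # 1. Mark every ancestor of the current generation.
            keep = set()
            for node in current:
                while node is not None and node not in keep:
                    keep.add(node)
                    node = parent[node]
            # 2. Retain current nodes and nodes where lineages coalesce.
            n_children = {}
            for node in keep:
                p = parent[node]
                if p is not None:
                    n_children[p] = n_children.get(p, 0) + 1
            current_set = set(current)
            retained = {n for n in keep
                        if n in current_set or n_children.get(n, 0) >= 2}
            # 3. Re-point each retained node to its nearest retained ancestor.
            new_parent = {}
            for node in retained:
                p = parent[node]
                while p is not None and p not in retained:
                    p = parent[p]
                new_parent[node] = p
            parent = new_parent
    return peak, work


for k in (1, 10):
    print(k, simulate(pop_size=50, generations=100, simplify_interval=k))
```

Between simplifications the stored history grows by `pop_size` nodes per generation, so peak memory scales with the interval. Each call, however, must also pass over the retained ancestry, so calling it less often lowers total work in this cost model.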

petrelharp commented 6 years ago

@jeromekelleher wrote:

As Kevin has pointed out, arguing that forward simulators will need O(NM) memory for M variants is a bit of a straw man, since the good simulators are much more sophisticated than this. So, I'm not sure that we can argue that our approach can reduce the memory footprint. However, given that we're only a small constant larger than fwdpy11's neutral handling (which uses very little RAM), and we can fit very large simulations into < 50 GB of RAM, I think we're in a very good place memory-usage wise. This is the narrative I would emphasise. The 100X figure seems like an unnecessary distraction from this message.

would in some situations reduce the amount of memory required, if memory usage in the forwards simulation was dominated by the cost of maintaining neutral genetic variants -- \emph{peak} memory usage (shown in the figures) might well be the same, since adding neutral variants to an existing tree sequence takes a similar amount of memory to maintaining neutral genotypes, but this additional usage happens after the fact (as a separate step), and for a much shorter time period.

I don't think this is true. Adding infinite-sites mutations to a tree sequence requires O(1) memory, as we only have to consider one edge at a time. Or am I misunderstanding your point here?
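The point that adding infinite-sites mutations needs only O(1) working memory can be seen in a streaming sketch. This is a hypothetical illustration, not the msprime or tskit API: the `Edge` record (which carries node times directly), `stream_mutations`, and the per-unit rate `mu` are assumptions. Each edge is visited once, and its mutations are generated as a Poisson process along the edge's genomic interval, with intensity proportional to branch length.

```python
import random
from typing import Iterable, Iterator, NamedTuple, Tuple


class Edge(NamedTuple):
    """Hypothetical edge record: genome interval plus the branch's node times."""
    left: float
    right: float
    parent_time: float
    child_time: float
    child: int


def stream_mutations(edges: Iterable[Edge], mu: float,
                     seed: int = 1) -> Iterator[Tuple[float, int]]:
    """Yield (position, child) for infinite-sites mutations, one edge at a time.

    On each edge, mutations form a Poisson process along [left, right) with
    rate mu * (parent_time - child_time) per unit of sequence length. Only the
    current edge and the RNG state are held, so working memory is O(1)
    regardless of how many edges or mutations there are.
    """
    rng = random.Random(seed)
    for e in edges:
        rate = mu * (e.parent_time - e.child_time)
        if rate <= 0:
            continue
        # Exponential gaps between events give a homogeneous Poisson process.
        pos = e.left + rng.expovariate(rate)
        while pos < e.right:
            yield pos, e.child  # continuous positions: every mutation is a new site
            pos += rng.expovariate(rate)


# Edges can themselves come from a lazy source, e.g. a generator.
edges = (Edge(0.0, 1.0, 2.0, 0.0, child=i) for i in range(1000))
n = sum(1 for _ in stream_mutations(edges, mu=0.5))
print(n)  # Poisson-distributed, mean 1000 * 0.5 * 2.0 * 1.0 = 1000
```

Whether a real implementation streams like this or collects every mutation into an in-memory table at once is a separate design choice, which is the distinction petrelharp raises in reply.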

Oh, good point. I meant that the tree sequence with neutral mutations added would take a similar amount of memory to the representation stored by (a sophisticated) forward simulator. And in practice, when we add mutations, it all gets put in memory at once, even though that doesn't need to happen, right? Or is this not what you meant?

jeromekelleher commented 6 years ago


Ah, OK. Just need to clarify this point a bit then.

petrelharp commented 6 years ago

Edited. I've opted to leave out the details to avoid getting into the weeds, but I wanted to make the point that the method does not inherently use more memory.

ashander commented 6 years ago

:+1: