petrelharp / ftprime_ms


Simu pop round2 #38

Closed molpopgen closed 6 years ago

molpopgen commented 6 years ago

Includes all the raw data for sim benchmarks + the results in the manuscript.

I added an SI section for the sims w/o selection. There's no real story there, but we may as well show it.

ashander commented 6 years ago

Oops, I see Peter has gone ahead and merged this. I guess I'll add any further comments as an issue or PR.

petrelharp commented 6 years ago

I think you can still comment here, no?

ashander commented 6 years ago

> I think you can still comment here, no?

er, good point. will do

molpopgen commented 6 years ago

Our weak selection regime does wacko things to allele ages, which affect the total run times in funny ways, esp. when you have the neutral variants in there. That probably explains it.

On Fri, Dec 22, 2017 at 11:33 AM Peter Ralph notifications@github.com wrote:

@petrelharp commented on this pull request.

In forwards_paper.tex https://github.com/petrelharp/ftprime_ms/pull/38#discussion_r158546730:

+dramatically improves run times for both \fwdpy{} and \simupop{} (right column of Figure~\ref{fig:runtimes_selection}). For parameters that ran to completion when tracking neutral mutations and when tracking
+the ARG, the relative speedup due to ARG tracking is up to $\approx 50$ fold (Figure~\ref{fig:relative_speedup_selection}). For \fwdpy{}, ARG tracking reduces runtimes more for larger population sizes
+(Figure~\ref{fig:relative_speedup_selection}). We could not make such an observation with \simupop{}, which tended to
+run out of memory on our system. Note the log scale of the x axis in Figure~\ref{fig:runtimes_selection}, which
+partially obscures an important observation that the run times appear approximately linear in region size when ARG
+tracking. The reason for this behavior is that the simulations are generating a constant number of new edges on average
+each generation, and the number of edges is a function of $\rho$ in expectation.
+
+We also performed a more limited set of simulations without natural selection. The total run times are shown in
+Figure~\ref{sfig:rawspeed_nosel} and show the same qualitative behavior as simulations with selection
+(Figure~\ref{fig:runtimes_selection}). The relative improvement due to ARG tracking is again substantial in these
+simulations (Figure~\ref{sfig:speedup_nosel}). In fact, we see more of a benefit to ARG tracking here with \fwdpy11{}
+than we did with selection (Figure~\ref{fig:relative_speedup_selection}). The reason is that the fitness function exits
+close to instantly in simulations without selection that are based on \fwdpp{}, meaning that \fwdpy{} is doing little
+more than generating random numbers and book-keeping in Figures~\ref{sfig:rawspeed_nosel}
+and~\ref{sfig:speedup_nosel}.

I get that the absolute run times should be very different with and without selection, but I thought that the speedup, i.e., the ratio of no-ARG to ARG run time, should be much larger without selection?
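For reference, the speedup under discussion is the run time without ARG tracking divided by the run time with it, computed per parameter combination. A minimal Python sketch of that ratio follows; the helper name `relative_speedup` and all timings are made up for illustration and are not taken from the benchmark scripts or results in this PR.

```python
def relative_speedup(time_without_arg, time_with_arg):
    """Return the ratio (no-ARG run time) / (ARG run time)."""
    if time_with_arg <= 0:
        raise ValueError("time_with_arg must be positive")
    return time_without_arg / time_with_arg


# Hypothetical run times in seconds, keyed by region size (rho).
# These numbers are placeholders, NOT measurements from the manuscript.
hypothetical_times = {
    1e3: {"no_arg": 120.0, "arg": 30.0},
    1e4: {"no_arg": 2500.0, "arg": 100.0},
}

# One speedup value per region size; only defined where both runs finished.
speedups = {
    rho: relative_speedup(t["no_arg"], t["arg"])
    for rho, t in hypothetical_times.items()
}

for rho, s in sorted(speedups.items()):
    print(f"rho={rho:g}: speedup {s:g}x")
```

A ratio like this is only defined for parameter sets where both the no-ARG and the ARG run completed, which matches the restriction stated in the quoted text.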


--

Kevin Thornton

Associate Professor

Ecology and Evolutionary Biology

UC Irvine

http://www.molpopgen.org

http://github.com/ThorntonLab

http://github.com/molpopgen

molpopgen commented 6 years ago

That's doable.

On Fri, Dec 22, 2017 at 12:40 PM Peter Ralph notifications@github.com wrote:

@petrelharp commented on this pull request.

In forwards_paper.tex https://github.com/petrelharp/ftprime_ms/pull/38#discussion_r158555050:

@@ -240,10 +241,50 @@ \subsection*{Simulation benchmarks}

\plr{Insert figures/tables of performance here and describe them.}

+The total run times for simulations with selection are shown in Figure~\ref{fig:runtimes_selection}. When tracking
+neutral mutations (instead of the pedigree), run times increase dramatically with increasing region size ($4Nr = 4Nu$ for
+these simulations). With \fwdpy{}, simulations with $N=5 \times 10^4$ timed out for region sizes larger than $10^3$.
+With \simupop{}, we were only able to run simulations with $N=10^3$ and region sizes up to $10^4$. Pedigree tracking
+dramatically improves run times for both \fwdpy{} and \simupop{} (right column of Figure~\ref{fig:runtimes_selection}). For parameters that ran to completion when tracking neutral mutations and when tracking
+the pedigree, the relative speedup due to pedigree tracking is up to $\approx 50$ fold (Figure~\ref{fig:relative_speedup_selection}). For \fwdpy{}, pedigree tracking reduces runtimes more for larger population sizes
+(Figure~\ref{fig:relative_speedup_selection}). We could not make such an observation with \simupop{}, which tended to
+run out of memory on our system. Note the log scale of the x axis in Figure~\ref{fig:runtimes_selection}, which

Perhaps we should remove the common scale for speedup...

