petrelharp / ftprime_ms

4 stars 2 forks source link

Moving simupop to an appendix, mostly #57

Closed ashander closed 6 years ago

ashander commented 6 years ago

Ready for a look, if anyone's keen. I will take a final pass on Thursday morning to tie up loose ends

jeromekelleher commented 6 years ago

Looks fine to me... I wonder if we're overreacting a little here though. An alternative solution would be to swap figure D1 (simulations with no natural selection) into the main text. It's pretty simple and it tells a straightforward story: ancestry tracking is a lot faster. Having two implementations looks a lot better than just one.

We can then say that we also did some experiments including natural selection, and show the fwdpp results in the appendix. We don't have to show with-selection results for simupop --- if reviewers ask for them, I'm perfectly happy to say 'no', as this isn't really the point of the paper.

I know you probably won't like this idea @molpopgen as benchmarking forward simulators without selection isn't particularly meaningful. However, we're not really comparing simulators here, we're just demonstrating that ancestry tracking makes things faster, all other things being equal.

And this is what the with-selection plots show anyway, right?

I don't have a strong opinion on this though; I'm happy to go with the consensus.

petrelharp commented 6 years ago

This also makes sense to me, now that we know that the speedup is the similarly large with and without selection. They differ by a factor of 2, roughly: the speedup in the neutral sims looks to be like 2x that with selection. But we still get a peak speedup of like 50x, just with smaller simulations.

I guess a downside is that we'd lose the biggest numbers in the plot (rho = 1e5 and N=5e4). But we could highlight those results in the text: say "This allows us to run simulations of N=5e4 individuals with rho = 1e5 and (selection) in 48 hours, which we conservatively estimate would take years without ancestry tracking."

Uh, dammit, I think all these are OK options. @molpopgen ?

ashander commented 6 years ago

Hmm. I agree with this and it would be straightforward

wonder if we're overreacting a little here though. An alternative solution would be to swap figure D1 (simulations with no natural selection) into the main text. It's pretty simple and it tells a straightforward story: ancestry tracking is a lot faster. Having two implementations looks a lot better than just one.

That said, in the non-selected case the simulators aren't really doing anything. And the way we build up the rationale for why our approach should be good -- by saying "let's only simulate the selected parts" in the intro -- makes less sense. Not to mention the issue peter highlighted D1 looks kind of thin as a result versus the current Fig 1 with larger rho/N values.

I guess this comes down to weighing:

Having two implementations

vs

Having a benchmark that's well-motivated by the intro and looks substantial.

edited to add: I suppose the intro could be rephrased a bit. At the end of the day what @jeromekelleher said about the results of D1/D2 is true --- they make the straightforward point that ancestry tracking is way faster

molpopgen commented 6 years ago

If we're just going to do neutral stuff, then there was no need to use any external simulation engine at all. It would have sufficed to have an ~20 line C program generating random numbers.

The interesting exercise, to me at least, is how to integrate it, which means all the bells and whistles. That's not fully a solved problem yet, but we've come a fair way.

On Thu, Jan 11, 2018 at 9:58 AM Jaime Ashander notifications@github.com wrote:

Hmm. I agree with this and it would be straightforward

wonder if we're overreacting a little here though. An alternative solution would be to swap figure D1 (simulations with no natural selection) into the main text. It's pretty simple and it tells a straightforward story: ancestry tracking is a lot faster. Having two implementations looks a lot better than just one.

That said, in the non-selected case the simulators aren't really doing anything. And the way we build up the rationale for why our approach should be good -- by saying "let's only simulate the selected parts" in the intro -- makes less sense. Not to mention the issue peter highlighted D1 looks kind of thin as a result versus the current Fig 1 with larger rho/N values.

I guess this comes down to weighing:

Having two implementations

vs

Having a benchmark that's well-motivated by the intro and looks substantial.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/petrelharp/ftprime_ms/pull/57#issuecomment-357008875, or mute the thread https://github.com/notifications/unsubscribe-auth/AGHnHxOykWWq0TCmUbljKQsOR9yGt5kNks5tJktZgaJpZM4RakSc .

--

Kevin Thornton

Associate Professor

Ecology and Evolutionary Biology

UC Irvine

http://www.molpopgen.org

http://github.com/ThorntonLab

http://github.com/molpopgen

jeromekelleher commented 6 years ago

If we're just going to do neutral stuff, then there was no need to use any external simulation engine at all. It would have sufficed to have an ~20 line C program generating random numbers.

Fair enough. It's not a strong opinion, I thought I'd just throw it out there!

ashander commented 6 years ago

Fair enough. It's not a strong opinion, I thought I'd just throw it out there!

Thanks for doing so. It's good to make sure. It'd obviously be ideal to have two implementations up top and well-motivated and interesting benchmarks for both.

ashander commented 6 years ago

OK. I think this is good to merge and accomplishes the goal. The last two commits re-position the section on the Tables API implementation to just after the 'working with tables' and move the 'next steps' to discussion.

I'll let @petrelharp merge and/or change this PR --- but I think the final consensus is to move simupop stuff to an appendix

petrelharp commented 6 years ago

looks good. will merge then edit.