kr-colab / discoal

discoal is a coalescent simulation program capable of simulating models with recombination, selective sweeps, and demographic changes including population splits, admixture events, and ancient samples
GNU General Public License v3.0
48 stars 10 forks source link

Problematic newicks when both -c and -en in use #12

Closed standard-aaron closed 6 years ago

standard-aaron commented 6 years ago

I'm working on simulating selection conditioned on present-day freq (-c mode) and changing demography (-en)

I get the following extremely odd errors in tree mode:

./discoal 2 100 1 -c 0.99 -wn 0 -i 1 -T | grep '\['
[1](1:1.821753,0:1.821753);
[1](1:0.110785,0:0.110785);
[1](1:0.177698,0:0.177698);
[1](0:0.335563,1:0.335563);
[1](0:0.148914,1:0.148914);
[1](0:0.632295,1:0.632295);

(i.e., these trees look fine) However, the following ought to be equivalent, but I get completely different trees:

$ ./discoal 2 100 1 -c 0.99 -wn 0 -en 0.1 0 1 -i 1 -T | grep '\['
[1](0:0.100000,1:0.100000);
[1](0:0.100000,1:0.100000);
[1](0:0.076275,1:0.076275);
[1](1:0.100000,0:0.100000);
[1](0:0.100000,1:0.100000);

I find that making a call to -en at a particular time, it causes the vast majority of trees to coalescent at that exact time. In practice, this happens when i do e.g. "-en 0.1 0 0.55" but i'm just trying to make a poitn that "-en 0.1 0 1" ought to have the same behavior as no -en call whatsoever. Am I missing something?

andrewkern commented 6 years ago

looks like a bug to me... i'll start digging.

standard-aaron commented 6 years ago

Some more sleuthing:

when I make calls of this form:

./discoal 2 <N> 1 -c <f> -wn 0 -en <t> 0 1 -i 1 -T

I notice that approximately N * (f^2 + (1-f)^2) of the trees have a TMRCA of t. So, I think there's something going on with how it treats the case where the two samples are derived/derived or ancestral/ancestral.

Additionally, I find that when I simulate trees with a larger number of samples (i.e. n > 2), this reduces the probability that there is at least 1 coalescence at exactly this time somewhere in the tree

andrewkern commented 6 years ago

okay I found a bug that I think was responsible for this. Using the -wn flag with two alleles forced some simulations into a place where they were incorrectly coalescing at the population size change. This should now be fixed. Can you pull the latest version and let me know if the results look ok? thanks as always for your careful use of discoal!!

standard-aaron commented 6 years ago

My trajjies and trees are looking nicer now. And you're the one I should thank -- discoal is awesome

standard-aaron commented 6 years ago

Motion to make 'trajjy' a thing

standard-aaron commented 6 years ago

@andrewkern : couple of followup q's:

would this bug have affected -wn simulations where there are no -en calls (i.e. constant popsize?)

is there any difference between -ws <t> -a 0 and -wn <t>?

andrewkern commented 6 years ago

unlikely that the bug should have affected simulations without -en, but the patch may have led to new issues.... codewise there is a difference between -ws <t> -a 0 and -wn. in practice they may yield the same results, although I haven't benchmarked that specifically. If you want drift trajectories I'd stick with -wn

andrewkern commented 6 years ago

@35ajstern everything working okay now? if so i'll close out this issue

standard-aaron commented 6 years ago

Yes, seems to be working okay now. Thanks Andy