oushujun closed this issue 5 years ago
Rather than starting by training a classifier, you should start by examining your simulations: do the scenario and input parameters yield a significant effect on patterns of diversity that you can observe? For instance, what is the average loss in heterozygosity associated with sweeps as you have specified them?
That is a good point! I checked the hard sweep simulations and found that pi dropped to 0.1 in all 11 windows, which in the mosquito example happens only in the selected window. To resemble selfing, I used the concept of an "effective mutation/recombination rate" = (u or rho)/(1+F), where F is the inbreeding coefficient, so presumably the simulated population has almost zero heterozygosity. I will try a more recent time range and see whether the simulated signals become distinguishable. Thanks for the insight!
Shujun
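For anyone wanting to repeat the diversity check discussed above, here is a minimal sketch. It assumes the simulation output is in the standard ms text format (a `//` line, then `segsites:`, `positions:`, and one 0/1 string per sampled chromosome), and it splits each replicate into 11 equal subwindows. The function names (`parse_ms`, `pi_per_window`, `mean_pi`) are hypothetical helpers, not part of discoal or diploS/HIC.

```python
def parse_ms(text):
    """Parse ms-format text into a list of (positions, haplotypes) replicates."""
    reps, lines, i = [], text.splitlines(), 0
    while i < len(lines):
        if not lines[i].startswith("//"):
            i += 1
            continue
        i += 1  # move to the "segsites:" line
        segsites = int(lines[i].split()[1])
        i += 1
        positions, haps = [], []
        if segsites > 0:
            positions = [float(x) for x in lines[i].split()[1:]]
            i += 1
            # Haplotype lines consist only of 0/1 characters.
            while i < len(lines) and lines[i].strip() and set(lines[i].strip()) <= {"0", "1"}:
                haps.append(lines[i].strip())
                i += 1
        reps.append((positions, haps))
    return reps


def pi_per_window(positions, haps, n_windows=11):
    """Mean pairwise differences summed over the sites in each subwindow."""
    n = len(haps)
    pi = [0.0] * n_windows
    if n < 2:
        return pi
    pairs = n * (n - 1) / 2
    for j, pos in enumerate(positions):
        k = sum(h[j] == "1" for h in haps)  # derived allele count at site j
        w = min(int(pos * n_windows), n_windows - 1)  # positions lie in [0, 1]
        pi[w] += k * (n - k) / pairs
    return pi


def mean_pi(text, n_windows=11):
    """Average per-window pi over all replicates in one ms-format output."""
    reps = parse_ms(text)
    totals = [0.0] * n_windows
    for positions, haps in reps:
        for w, value in enumerate(pi_per_window(positions, haps, n_windows)):
            totals[w] += value
    return [t / len(reps) for t in totals] if reps else totals
```

Running `mean_pi(open("hard_2_msOut").read())` and comparing the result with the same call on the neutral output should show whether the sweep leaves a localized dip in diversity or a uniform loss across all windows. The values are per window; divide by the window length in bp for a per-site estimate.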
I followed the mosquito guide and ran the simulations using the population parameters of my data.
Below is an example of my hard sweep simulation (the soft sweep and neutral simulations were similar):
discoal 200 2000 55000 -Pt 89.43 894.3 -Pre 556.6 1669.8 -Pa 2000 200000 -Pu 0.0000 0.025 -ws 0 -en 0.0001 0 0.5 -en 0.0016 0 0.2 -en 0.0032 0 0.01 -en 0.006 0 0.03 -en 0.04 0 10 -x 0.22727272727272727 > hard_2_msOut
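The `-x` values used in the commands in this post are consistent with placing the selected site at the center of one of 11 equal subwindows. A small sketch of that arithmetic follows; `sweep_position` is a hypothetical helper name, and the zero-based window index is an assumption inferred from the output file names (`hard_2`, `hard_5`).

```python
def sweep_position(window_index, n_windows=11):
    """Relative position (0 to 1) of the center of a zero-based subwindow."""
    return (window_index + 0.5) / n_windows


# Print the -x value for each of the 11 subwindows.
for i in range(11):
    print(i, sweep_position(i))
```

Window 2 gives the value 0.227... used in the command above, and window 5 (the central window) gives 0.5.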
However, the training accuracy was very low:
Then I thought the severe bottleneck might be obscuring the selection signature in such a short sequence (5 kb), so I scaled the length up to 30 kb (which uses up to 1200 GB of memory, by the way). Sample command for the hard sweep:
discoal 200 2000 330000 -Pt 536.58 5365.8 -Pre 3339.6 10018.8 -Pa 2000 200000 -Pu 0.0000 0.025 -ws 0 -en 0.0001 0 0.5 -en 0.0016 0 0.2 -en 0.0032 0 0.01 -en 0.006 0 0.03 -en 0.04 0 10 -x 0.5 > hard_5_msOut2
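As a sanity check on the rescaling, the values passed to `-Pt` and `-Pre` in the two commands differ by the same factor as the number of sites (330000 / 55000 = 6). The sketch below reproduces that arithmetic; `scale_values` is a hypothetical helper, and it assumes these parameters are specified for the whole locus so that they scale linearly with length.

```python
def scale_values(values, old_len, new_len):
    """Rescale whole-locus parameter values by the ratio of locus lengths."""
    factor = new_len / old_len
    return tuple(v * factor for v in values)


# Values from the 55000-site command, rescaled to 330000 sites.
theta_values = scale_values((89.43, 894.3), 55000, 330000)
rho_values = scale_values((556.6, 1669.8), 55000, 330000)
print(theta_values, rho_values)
```

Up to floating-point rounding, this matches the `-Pt 536.58 5365.8` and `-Pre 3339.6 10018.8` values in the longer command, so the two runs differ in length but not in per-site rates.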
But the training accuracy did not improve much:
Since the time range I am simulating is very large (-Pu 0.0000 0.025, up to 10K generations), could it be that 2000 simulation replicates are not enough to cover this range sufficiently during training? I am now simulating 20000 and 50000 replicates in two independent trials and have not gotten the results yet. Could you provide some suggestions or insights?
Thanks, Shujun