stschiff / msmc2

GNU General Public License v3.0
53 stars 9 forks source link

Odd relative cross-coalescence rate for simulated data #24

Closed marqueda closed 4 years ago

marqueda commented 4 years ago

Dear Stephan,

I have run msmc2 v2.1.1 on simulated data generated by msprime v0.7.4 and ran into weird results for the cross-coalescence rate: the cross-coalescence of two populations of equal and constant population size (Ne=50,000) that split 1 million generations ago reaches only a maximum value of 0.125 at the split time instead of 1 as it should. I don't understand what the issue is and wonder whether this is due to a bug [edit: or may I be hitting the inference limits under the parameters I am simulating?]. HYOsim Ne crosscoal The demographic model I simulated is a bit more complicated, as I also simulate a third, hybrid population (Ne=100,000) between the two populations above that is formed 75,000 generations ago. But technically, this should not influence the Ne history or cross-coalescence for populations 1 and 2. Also, I am using a realistic recombination map in the simulations to simulate 22 chromosomes with variable recombination each. Please let me know if I should send you the msprime code I used to simulate the data.

Here are the input I used and output files from msmc2: https://www.dropbox.com/s/knxhqvclafgetyo/msmcinout.tgz?dl=0, and below is the msmc2 commands I used:

i=HYOsim.r30
comp=0-1; input=${i}; output1=${i}.i100.r10.N
msmc2 -t 22 -i 100 -r 10 -I ${comp} -o ${output1} $(ls ${input}.*.msmcin)
comp=2-3; input=${i}; output2=${i}.i100.r10.C
msmc2 -t 22 -i 100 -r 10 -I ${comp} -o ${output2} $(ls ${input}.*.msmcin)
comp=0-2,0-3,1-2,1-3; input=${i}; output4=${i}.i100.r10.NvC
msmc2 -t 22 -i 100 -r 10 -I ${comp} -o ${output4} $(ls ${input}.*.msmcin)
output7=${i}.i100.r10.NvC.crosscoal
combineCrossCoal.py ${output4}.final.txt ${output1}.final.txt ${output2}.final.txt > ${output7}.final.txt

Thank you for sharing your thoughts on this issue!

Best wishes, David

stschiff commented 4 years ago

I'd have to look more deeply into this, but as a first comment, under human-like parameters a split 1 million years ago is likely too deep to properly estimated by MSMC2. @wangke16, do you have an opinion on this?

wangke16 commented 4 years ago

Yeah, 1 mya might be too deep. The maximum split time i have simulated for human is 300kya. Also, rCCR doesn't reach 1 at split time. t(rCCR=0.5) is taken as the estimate for split time usually.

marqueda commented 4 years ago

Dear Stephan and @wangke16, thank you for your super quick replies, makes total sense. I will rerun these analyses with some other parameter combinations which will then probably lead to complete rCCRs.