stschiff / msmc2

GNU General Public License v3.0

Abnormal RCCR values when estimating population splits with MSMC2 #22

Closed. Yun-HongWu closed this issue 5 years ago

Yun-HongWu commented 5 years ago

Hi Stephan, following your advice, I have used MSMC2 to estimate the population splits for my data sets. Each run used four haplotypes (only one sample from each population). My commands are:

msmc2 -o MSMC2_Within_1_Pop1VSPop2 -t 4 -I 0,1 --skipAmbiguous chr01.input chr02.input...
msmc2 -o MSMC2_Within_2_Pop1VSPop2 -t 4 -I 2,3 --skipAmbiguous chr01.input chr02.input...
msmc2 -o MSMC2_AcrossPop_Pop1VSPop2 -t 4 -I 0-2,0-3,1-2,1-3 --skipAmbiguous chr01.input chr02.input...

After the three runs were done, I used the "combineCrossCoal.py" tool to combine the three final.txt output files, and then used the formula "2 * lambda_01 / (lambda_00 + lambda_11)" to calculate the RCCR, as described in the MSMC manual: "Now the three columns titled lambda?? denote the coalescence rates within and across the subpopulations. To get relative gene flow, you can compute the relative cross-coalescence rate: 2 lambda01 / (lambda00 + lambda11)." But I got an abnormal curve like this:

image

So how can I explain the regions in which the RCCR exceeds 1? Did I make a mistake in my analysis? Looking forward to your reply. Thanks in advance!
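For reference, the RCCR calculation quoted above can be sketched in a few lines of Python. This is only an illustration, not part of MSMC2 or msmc-tools: it assumes the combined file is tab-separated with columns named left_time_boundary, lambda_00, lambda_01 and lambda_11, and the numbers in EXAMPLE are made up. It also lists the time segments in which the ratio exceeds 1, which are the regions asked about here.

```python
import csv
import io


def relative_cross_coalescence(l00, l01, l11):
    """RCCR = 2 * lambda_01 / (lambda_00 + lambda_11)."""
    return 2.0 * l01 / (l00 + l11)


def rccr_table(text):
    """Parse tab-separated combined output; return (left_time_boundary, RCCR) pairs.

    The column names used here are an assumption about the combined file's header.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for rec in reader:
        ratio = relative_cross_coalescence(
            float(rec["lambda_00"]),
            float(rec["lambda_01"]),
            float(rec["lambda_11"]),
        )
        rows.append((float(rec["left_time_boundary"]), ratio))
    return rows


# Made-up values for illustration only (not taken from the plot in this issue).
EXAMPLE = (
    "time_index\tleft_time_boundary\tright_time_boundary\t"
    "lambda_00\tlambda_01\tlambda_11\n"
    "0\t0\t1e-05\t2000\t10\t2000\n"
    "1\t1e-05\t2e-05\t1000\t900\t1000\n"
    "2\t2e-05\t3e-05\t1000\t1300\t1000\n"
)

rows = rccr_table(EXAMPLE)

# Segments where the cross-population rate exceeds the mean within-population rate.
above_one = [(t, r) for t, r in rows if r > 1.0]

for t, r in rows:
    print(f"{t:.2e}\t{r:.3f}")
```

To run this on a real file, read its contents and pass the text to rccr_table. Note that the times and rates in MSMC2 output are scaled by the mutation rate; the ratio itself is unaffected by that scaling.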

stschiff commented 5 years ago

Hi, I don’t know what’s going on here. There are many reasons this might fail: bad phasing, genotype errors, or overfitting (too many time segments). You can always simulate data first, for example using msprime (https://msprime.readthedocs.io/en/stable/), and check whether the results match your expectation, to make sure that the method basically works for you in a controlled setting.

Stephan

On 10 Apr 2019, at 05:03, Yun-HongWu wrote: [quoted copy of the original post; the plot is at https://user-images.githubusercontent.com/48516488/55848555-e73cb380-5b7f-11e9-981f-5001f55fca34.png]