Duplicating 2011 G-PhoCS paper results #83

kmanoharan01 commented 3 years ago

Hi

I have used the following control/model parameters as suggested in the paper's Supplementary Information, except for the number of iterations. However, I am not getting the results reported in the Supplementary Information. Would the change in the number of iterations affect the results substantially, or am I missing something in the parameters? Could you please help?

GENERAL-INFO-START

seq-file            neutralLoci-7genomes.txt
trace-file          mcmc_all_neutralLoci.log
locus-mut-rate      VAR 1.0

mcmc-iterations     30000
iterations-per-log  10000
logs-per-line       10

find-finetunes      TRUE
finetune-coal-time  0.3     
finetune-mig-time   0.3     
finetune-theta      0.04
finetune-mig-rate   0.02
finetune-tau        0.0000008
finetune-mixing     0.003

tau-theta-print     10000.0 
tau-theta-alpha     1.0         # for STD/mean ratio of 100%
tau-theta-beta      10000.0     # for mean of 1e-4

mig-rate-alpha      0.002
mig-rate-beta       0.00001

GENERAL-INFO-END

CURRENT-POPS-START

POP-START
    name        A
    samples     hanChinese d
POP-END

POP-START
    name        B
    samples     venter d
POP-END

POP-START
    name        C
    samples     na18507 d
POP-END

POP-START
    name        D
    samples     chimp d
POP-END

CURRENT-POPS-END

ANCESTRAL-POPS-START

POP-START
    name            AB
    children        A       B
    tau-alpha       1
    tau-beta        30000.0 
    finetune-tau            0.0000008
POP-END

POP-START
    name            ABC
    children        AB      C
    tau-alpha       1
    tau-beta        25000.0 
    finetune-tau            0.0000008
POP-END

POP-START
    name            root
    children        ABC     D
    tau-alpha       1
    tau-beta        1000.0  
    finetune-tau            0.0000008
POP-END

ANCESTRAL-POPS-END
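The inline comments on tau-theta-alpha and tau-theta-beta encode arithmetic that can be checked for every prior in this file. A minimal Python sketch, assuming G-PhoCS uses a Gamma prior with shape alpha and rate beta, so the mean is alpha/beta and the std/mean ratio is 1/sqrt(alpha) (this is what the comment "for mean of 1e-4" implies for alpha = 1.0, beta = 10000.0). The (alpha, beta) pairs are copied from the control file above; the function name and dictionary labels are illustrative only.

```python
import math


def gamma_prior_summary(alpha: float, beta: float) -> tuple[float, float]:
    """Return (mean, std/mean) for a Gamma prior with shape alpha and rate beta."""
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    mean = alpha / beta            # mean of Gamma(shape=alpha, rate=beta)
    cv = 1.0 / math.sqrt(alpha)    # std = sqrt(alpha)/beta, so std/mean = 1/sqrt(alpha)
    return mean, cv


# (alpha, beta) pairs taken from the control file above; labels are illustrative.
PRIORS = {
    "tau-theta": (1.0, 10000.0),
    "mig-rate": (0.002, 0.00001),
    "tau AB": (1.0, 30000.0),
    "tau ABC": (1.0, 25000.0),
    "tau root": (1.0, 1000.0),
}

if __name__ == "__main__":
    for name, (alpha, beta) in PRIORS.items():
        mean, cv = gamma_prior_summary(alpha, beta)
        print(f"{name:<10} prior mean = {mean:.3g}, std/mean = {cv:.3g}")
```

Under this parameterization, for example, the prior mean for the root's tau (1e-3) is 30 times the prior mean for AB's tau (about 3.3e-5).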

igronau commented 3 years ago

This actually seems quite close to what we inferred in our analysis, if I recall correctly (I didn't directly compare to our supplementary figures). The differences you're seeing are likely due to three main factors:
(1) Your model does not include any migration bands. This will slightly reduce the estimates of tau.
(2) You're not analyzing all the samples and populations we did. If I recall correctly, most analyses we report had 5-6 human populations. Theoretically, this shouldn't affect the estimates, but in practice it can have a small effect.
(3) You are using a very small number of iterations, so the MCMC may not have converged yet. You can examine the traces of some of the parameters to see whether they converge around some value or are trending upward/downward.
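Point (3) can be checked from the trace file named in the control file (mcmc_all_neutralLoci.log). A minimal sketch of a crude trend check, not a formal convergence diagnostic such as Geweke's: it drops a burn-in fraction and compares the means of the early and late halves of one parameter's samples. It assumes the trace file is tab-separated with a header row of parameter names; the column name in the example (tau_AB) is hypothetical, so check the actual header of your file. If iterations-per-log is the sampling interval of the trace, the settings above (30000 iterations, one sample every 10000) leave only about three samples, which is too few for this or any other check.

```python
import csv
from statistics import mean


def read_trace_column(path: str, column: str) -> list[float]:
    """Read one numeric column from a tab-separated trace file with a header row.

    Assumption: the first line holds column names and each later line is one sample.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return [float(row[column]) for row in reader]


def drift_check(samples: list[float], burn_in: float = 0.1) -> tuple[float, float]:
    """Return the means of the early and late halves of a chain after burn-in.

    A large gap between the two means suggests the chain is still trending
    upward or downward rather than fluctuating around a stable value.
    """
    kept = samples[int(len(samples) * burn_in):]
    if len(kept) < 4:
        raise ValueError("too few samples; run longer or log more often")
    half = len(kept) // 2
    return mean(kept[:half]), mean(kept[half:])


# Example usage (hypothetical column name):
# early, late = drift_check(read_trace_column("mcmc_all_neutralLoci.log", "tau_AB"))
# print(f"early mean {early:.4g}, late mean {late:.4g}")
```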

kmanoharan01 commented 3 years ago

Thanks very much for your reply.

I will try with the above options.

Could you please let me know whether my understanding of how to convert the mean tau estimate into real time (years) is correct?

In the Supplementary Information you suggest the following formula: (mean tau estimate / mutation rate) × 10^4.

Here I should use a mutation rate of 7.1 × 10^-10, correct?
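A minimal sketch of the conversion under one explicit assumption: the values written to the trace file are tau (expected substitutions per site) multiplied by tau-theta-print, which is 10000.0 in the control file above. Under that assumption the 10^4 factor is removed from the trace value by dividing, and the result is then divided by the per-year, per-site mutation rate. The direction of the factor matters: with a trace value of a few tens, multiplying by 10^4 instead would give times on the order of 10^14 years, so it is worth confirming it against the Supplementary Information. The function name and the example input are hypothetical.

```python
def tau_trace_to_years(trace_tau: float, mu_per_year: float,
                       tau_theta_print: float = 10000.0) -> float:
    """Convert a tau value read from the trace file into years.

    Assumes the trace stores tau * tau_theta_print, and that mu_per_year is the
    mutation rate per site per year (not per generation).
    """
    if mu_per_year <= 0 or tau_theta_print <= 0:
        raise ValueError("mutation rate and scaling factor must be positive")
    raw_tau = trace_tau / tau_theta_print   # expected substitutions per site
    return raw_tau / mu_per_year            # divergence time in years


# Hypothetical trace value of 40 with a rate of 7.1e-10 per site per year:
years = tau_trace_to_years(40.0, 7.1e-10)
print(f"{years / 1e6:.2f} million years")
```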

igronau commented 3 years ago

I don't remember the specific mutation rate we used, but the formula is correct. Make sure you use the per-year mutation rate, not the per-generation one.
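If only a per-generation rate is available, it can be turned into the per-year rate the formula needs by dividing by the generation time. A minimal sketch; the function name is hypothetical, and the example values (2.5 × 10^-8 per site per generation, 25 years per generation) are illustrative assumptions, not necessarily the values used in the paper.

```python
def per_year_mutation_rate(mu_per_generation: float, generation_time_years: float) -> float:
    """Convert a per-site, per-generation mutation rate to a per-site, per-year rate."""
    if generation_time_years <= 0:
        raise ValueError("generation time must be positive")
    return mu_per_generation / generation_time_years


# Illustrative values only: 2.5e-8 per generation, 25 years per generation.
mu_year = per_year_mutation_rate(2.5e-8, 25.0)
print(f"{mu_year:.2e} per site per year")
```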

kmanoharan01 commented 3 years ago

Thank you very much.
