popgenmethods / smcpp

SMC++ infers population history from whole-genome sequence data.
GNU General Public License v3.0
153 stars 35 forks source link

RuntimeError: erroneous average coalescence time #165

Open lczech opened 4 years ago

lczech commented 4 years ago

In the cv command, I get the error:

RuntimeError: erroneous average coalescence time

which according to this issue seems to be solvable by setting the -c option. However, I am struggling to find a proper value to use for -c. The log file contains some warnings like

120357 smcpp.estimation_tools WARNING Long runs of homozygosity 
in contig out/smc/asia.1.10015.smc.gz: 
[76500 60552] (base pairs)

(this is for Arabidopsis thalina data)

Does that mean that the respective run of homozygosity was 120357bp, 76500bp, or 60552bp long?

Neither he code line that produces the runtime error, nor the one that produces the warning about runs of homozygosity help me in understanding what is going on here, unfortunately.

Thanks in advance for the help Lucas

lczech commented 4 years ago

Update: I have re-run vcf2smc with -c 50000, assuming that this is a suitable number given the error message above. However, while this time the warnings about long runs of homozygosity are gone, the runtime error in the cv command remains: cv.log, .debug.txt

This is for a dataset with 79 Arabidopsis thaliana samples, using 5 chromosomes/contigs, running with the latest smcpp version.

This error hence seems to be coming from some other problem. Any hint appreciated!

lczech commented 4 years ago

Another update: @jeffspence suggested to use estimate instead of cv to get at least some results (pers. comm.). However, the same error occurs there as well:

RuntimeError: erroneous average coalescence time

For details, see the attached log files of my previous message above. Any ideas what is going on or how to fix this?

terhorst commented 4 years ago

Hi Lucas, sorry for the delay. Could you try setting the --timepoints option manually and see if that makes the error go away? For humans I usually use something like --timepoints 1e2 1e5, but you may have to experiment a bit.

The error you are experiencing is because the program is computing an expected coalescent time conditioned on coalescence inside some interval [t[i], t[i+1]) and getting back a number that is not inside that interval. This is a numerical issue this usually occurs because the model time points are selected poorly. SMC++ tries to select them automatically, but it seems to be doing a bad job here.

lczech commented 4 years ago

Hi Jonathan,

thanks for the insight! So, I tried with --timepoints 1e2 1e5 as you suggested, but this still fails with the same error message for the Arabidopsis data. Then, I tried with --timepoints 1e1 1e10, just to see what happens, and it worked, but gave a really odd result:

plot

Not sure what to make of this. 10^-1 generations? And only some changes before ~10^0 gens, but constant after that?

I'm trying some other settings now as well, to see if that makes more sense. Also, I have no idea what the effect of changing the timepoints is. If this is just used as an initial guess, and then changed/optimized during the runtime anyway, does it change the result a lot when picking different initial timepoints?

Update: Tried --timepoints 5e0 1e5 and --timepoints 0 1e6, but both failed as well, with the same error message

More update: Now, I also tried with --timepoints 0 1e8, which worked and gave the following result:

plot

This is confusing me. The constant part after ~10^0 generations is roughly the same as in the plot above (which used different --timepoints, but all other settings stayed the same), but the initial part before that looks vastly different. What does that mean? I am beginning to think that smcpp does not work for Arabidopsis...

Axolotl233 commented 3 years ago

Hi Lucas

Is there any progress of these question? i am in same plight because my species is highly inbred

lczech commented 3 years ago

Hi @Axolotl233,

sorry, no news from my side. I have not heard back from the developers on this, and so eventually gave up trying to use smcpp... :-( If you have any news or find a solution, please let me know! I'd be interested in picking this up again at some point.

Lucas

Axolotl233 commented 3 years ago

Hi @lczech,

Thank you for your quickly reply. I'm a newble of popluation genetics, so I’m afraid I can’t solve this problem by myself... :-(

Do you know what can be inferred from the group history of the selfing/inbred popluations?

anyway,thank you!

lczech commented 3 years ago

Hm nope, not an expert on this either, sorry! I was also just playing around with it at the time...

PotapenkoEugene commented 2 years ago

Greetings colleagues! I faced the same problem, is anyone find some way to make it with highly inbred species? (I work with Hordeum Spontaneum). Anyway thanks for this thread!

wilson1990D commented 2 years ago

Greetings colleagues! I faced the same problem, is anyone find some way to make it with highly inbred species? (I work with Hordeum Spontaneum). Anyway thanks for this thread!

Have you resolve it? Dr , I also have same problem

Axolotl233 commented 2 years ago

@PotapenkoEugene @wilson1990D maybe dadi inbred model could infer populaiton demography for inbred species, i 'm not sure if it can help you. please see https://dadi.readthedocs.io/en/latest/user-guide/inbreeding/