Open lczech opened 4 years ago
Update: I have re-run vcf2smc
with -c 50000
, assuming that this is a suitable number given the error message above. However, while this time the warnings about long runs of homozygosity are gone, the runtime error in the cv
command remains: cv.log, .debug.txt
This is for a dataset with 79 Arabidopsis thaliana samples, using 5 chromosomes/contigs, running with the latest smcpp version.
This error hence seems to be coming from some other problem. Any hint appreciated!
Another update: @jeffspence suggested to use estimate
instead of cv
to get at least some results (pers. comm.). However, the same error occurs there as well:
RuntimeError: erroneous average coalescence time
For details, see the attached log files of my previous message above. Any ideas what is going on or how to fix this?
Hi Lucas, sorry for the delay. Could you try setting the --timepoints
option manually and see if that makes the error go away? For humans I usually use something like --timepoints 1e2 1e5
, but you may have to experiment a bit.
The error you are experiencing is because the program is computing an expected coalescent time conditioned on coalescence inside some interval [t[i], t[i+1])
and getting back a number that is not inside that interval. This is a numerical issue this usually occurs because the model time points are selected poorly. SMC++ tries to select them automatically, but it seems to be doing a bad job here.
Hi Jonathan,
thanks for the insight! So, I tried with --timepoints 1e2 1e5
as you suggested, but this still fails with the same error message for the Arabidopsis data. Then, I tried with --timepoints 1e1 1e10
, just to see what happens, and it worked, but gave a really odd result:
Not sure what to make of this. 10^-1 generations? And only some changes before ~10^0 gens, but constant after that?
I'm trying some other settings now as well, to see if that makes more sense. Also, I have no idea what the effect of changing the timepoints is. If this is just used as an initial guess, and then changed/optimized during the runtime anyway, does it change the result a lot when picking different initial timepoints?
Update: Tried --timepoints 5e0 1e5
and --timepoints 0 1e6
, but both failed as well, with the same error message
More update: Now, I also tried with --timepoints 0 1e8
, which worked and gave the following result:
This is confusing me. The constant part after ~10^0 generations is roughly the same as in the plot above (which used different --timepoints
, but all other settings stayed the same), but the initial part before that looks vastly different. What does that mean? I am beginning to think that smcpp does not work for Arabidopsis...
Hi Lucas
Is there any progress of these question? i am in same plight because my species is highly inbred
Hi @Axolotl233,
sorry, no news from my side. I have not heard back from the developers on this, and so eventually gave up trying to use smcpp... :-( If you have any news or find a solution, please let me know! I'd be interested in picking this up again at some point.
Lucas
Hi @lczech,
Thank you for your quickly reply. I'm a newble of popluation genetics, so I’m afraid I can’t solve this problem by myself... :-(
Do you know what can be inferred from the group history of the selfing/inbred popluations?
anyway,thank you!
Hm nope, not an expert on this either, sorry! I was also just playing around with it at the time...
Greetings colleagues! I faced the same problem, is anyone find some way to make it with highly inbred species? (I work with Hordeum Spontaneum). Anyway thanks for this thread!
Greetings colleagues! I faced the same problem, is anyone find some way to make it with highly inbred species? (I work with Hordeum Spontaneum). Anyway thanks for this thread!
Have you resolve it? Dr , I also have same problem
@PotapenkoEugene @wilson1990D maybe dadi inbred model could infer populaiton demography for inbred species, i 'm not sure if it can help you. please see https://dadi.readthedocs.io/en/latest/user-guide/inbreeding/
In the
cv
command, I get the error:RuntimeError: erroneous average coalescence time
which according to this issue seems to be solvable by setting the
-c
option. However, I am struggling to find a proper value to use for-c
. The log file contains some warnings like(this is for Arabidopsis thalina data)
Does that mean that the respective run of homozygosity was 120357bp, 76500bp, or 60552bp long?
Neither he code line that produces the runtime error, nor the one that produces the warning about runs of homozygosity help me in understanding what is going on here, unfortunately.
Thanks in advance for the help Lucas