evo-eco-gen closed this 3 years ago
I think this can happen due to overfitting, and there isn't a really good way to control for that other than, as you already did, reducing the number of parameters by gluing together time segments.
Thanks! Could this be related to imperfections of statistical phasing? The dataset is 200 individuals only, we used phase-informative reads and genetic maps, but still not nearly as good as current human data.
Also, is there a clear reference on how to choose the SMC parameters? I really struggle to understand why the numbers are what they are, both in the defaults and in various papers using SMC (people tend to change them without much explanation).
Sure, phasing can also play a role. Hard to test, though. You could run simulations and artificially introduce phasing errors to see how bad the situation can get. What do you mean by "SMC parameters"?
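For what it's worth, one simple way to introduce phasing errors artificially is to inject switch errors into simulated haplotypes before building the input files. Below is a minimal sketch, assuming the two haplotypes of an individual are available as equal-length lists of 0/1 alleles; the function name `add_switch_errors` and the `switch_rate` parameter are made up for illustration and are not part of MSMC2 or any phasing tool.

```python
import random


def add_switch_errors(hap_a, hap_b, switch_rate, seed=0):
    """Return copies of two haplotypes with random switch errors.

    Hypothetical helper, not part of MSMC2. At each heterozygous site the
    phase flips with probability `switch_rate`, and the flip persists for
    all downstream sites until the next flip (a classic switch error).
    """
    rng = random.Random(seed)
    out_a, out_b = [], []
    swapped = False
    for a, b in zip(hap_a, hap_b):
        # Only heterozygous sites carry phase information, so only they
        # can trigger a switch.
        if a != b and rng.random() < switch_rate:
            swapped = not swapped
        if swapped:
            a, b = b, a
        out_a.append(a)
        out_b.append(b)
    return out_a, out_b


if __name__ == "__main__":
    h1 = [0, 1, 0, 0, 1, 1, 0, 1]
    h2 = [1, 1, 0, 1, 0, 1, 1, 0]
    e1, e2 = add_switch_errors(h1, h2, switch_rate=0.3, seed=42)
    print(e1)
    print(e2)
```

Genotypes are preserved at every site; only the assignment of alleles to haplotypes changes. Running the same simulated data through MSMC2 at a few different values of `switch_rate` should give a rough idea of how sensitive the estimates are.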
Yes, a simulation will be useful. I meant -p, so how to choose the number of time intervals and how they are bound together.
It's mostly experience-based. I have run simulations and tried different combinations to get some idea about this. It's hard to come up with a fully objective way to optimise this. I think my approach is similar to Li and Durbin's approach in their PSMC paper.
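To make the bookkeeping behind -p concrete, here is a small sketch that counts time intervals and free parameters for a given pattern string. It assumes the usual reading of the syntax, where a term `a*b` means `a` consecutive groups of `b` intervals, each group sharing one parameter, and a bare number `b` (as in PSMC-style patterns) means a single group of `b` intervals. The helper names are hypothetical, and the first pattern below is what I believe the MSMC2 default to be; please check `msmc2 --help` for your version. The second pattern is an arbitrary example of gluing segments at both ends, not a recommendation.

```python
def parse_time_pattern(pattern):
    """Expand a pattern string into a list of group widths.

    Hypothetical helper. Each list entry is the number of time intervals
    that share one free parameter.
    """
    groups = []
    for term in pattern.split("+"):
        if "*" in term:
            repeats, width = (int(x) for x in term.split("*"))
        else:
            # A bare number is one group spanning that many intervals.
            repeats, width = 1, int(term)
        groups.extend([width] * repeats)
    return groups


def summarize(pattern):
    """Return (number of time intervals, number of free parameters)."""
    groups = parse_time_pattern(pattern)
    return sum(groups), len(groups)


if __name__ == "__main__":
    for p in ["1*2+25*1+1*2+1*3", "1*4+20*1+1*8"]:
        n_intervals, n_params = summarize(p)
        print(f"{p}: {n_intervals} intervals, {n_params} free parameters")
```

Comparing the counts for a few candidate patterns at least shows how many parameters each choice frees up, even if picking between them still comes down to simulations and experience.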
Hi, I had a related question. Do you think that overfitting driving extremely high Ne only at the most ancient time segments invalidates the trajectory of Ne at more recent periods, or is the whole plot likely to be unreliable? Thank you.
Impossible to say in general, I am afraid. Try reducing the number of parameters by gluing together time segments at both ends and see whether the increase is mitigated; if it is, that would suggest overfitting.
Hello,
I have a weird problem with MSMC2. I use it to study demography of a non-model species, with populations diverged over the last 100,000-1M years (the times are deeper than for the usual human work).
What could cause these behaviours? I don't care about the very recent or the very deep past, but why is this happening?... I would like to think the input data, masks etc are of good quality...
Cheers, evoecogen