wsdewitt closed 4 years ago
In other words, you're testing our inference on a forward model that's more sophisticated? This will tell us whether the framework we are using is robust to model mis-specification.
Yep, those sims generate coalescent trees, then mutations on them, according to realistic demographies with complications like linkage and population structure.
A first place to look may be the starting kit for ∂a∂I (Gutenkunst et al.), which includes some simulated eta(t) and SFS.
When we start trying to invert mutation spectrum evolution (see #12) we'll want to look at:
Before jumping into this, I'd be sure to test the inference method more thoroughly in the case where the model is correctly specified, to be sure everything is working.
Adding a few items on this issue from today's mushi chat:
When we simulate with msprime (linkage disequilibrium misspecification), we see funky outlier points in the SFS. These are due to deep branches in the coalescent that don't get rearranged, so we effectively sample only one tree, giving the same frequency for all mutations on that branch. If we sampled more trees, the outliers would smooth out into the neighboring frequency categories. One way to deal with this is to coarse-grain the SFS into bins at higher frequencies (as done in fastneutrino).
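To make the coarse-graining idea concrete, here is a minimal sketch of one possible binning scheme: keep the low-frequency entries as singleton bins and let the bin widths grow geometrically at higher frequencies. The function names (`make_bin_edges`, `bin_sfs`) and the parameters `n_fine` and `growth` are hypothetical, chosen for illustration; this is not necessarily how fastneutrino or PR #22 chooses bins.

```python
import numpy as np

def make_bin_edges(n_freqs, n_fine, growth=1.5):
    """Bin edges over SFS indices 0..n_freqs (hypothetical scheme).

    The first n_fine entries each get their own bin; after that,
    bin widths grow geometrically by the factor `growth`.
    """
    edges = list(range(min(n_fine, n_freqs) + 1))
    width = 1.0
    while edges[-1] < n_freqs:
        width *= growth
        edges.append(min(n_freqs, edges[-1] + int(np.ceil(width))))
    return np.array(edges)

def bin_sfs(sfs, edges):
    """Sum SFS entries within each bin [edges[i], edges[i+1])."""
    return np.add.reduceat(np.asarray(sfs), edges[:-1])

# Toy SFS with 10 frequency classes (made-up counts, not real data)
sfs = np.arange(1, 11)
edges = make_bin_edges(len(sfs), n_fine=4, growth=2.0)
binned = bin_sfs(sfs, edges)
```

With these toy settings the first four classes stay unbinned and the remaining six are merged into two wider bins, so an outlier in one high-frequency class gets pooled with its neighbors.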
I think binning is pretty straightforward in terms of the PRF log-likelihood, since the expectation of a bin is the sum of the expectations of the elements in the bin.
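A rough sketch of what that could look like, assuming independent Poisson counts per frequency class (so each bin total is also Poisson, with mean equal to the summed expectations). The function name `binned_prf_loglik` and the `edges` convention (bin i covers indices `edges[i]` up to but not including `edges[i+1]`) are assumptions for illustration, not the actual mushi API, and the expected counts are assumed strictly positive.

```python
import math
import numpy as np

def binned_prf_loglik(observed, expected, edges):
    """Poisson random field log-likelihood on a binned SFS (sketch).

    observed, expected: per-frequency-class counts and expectations.
    edges: integer bin edges; the last edge equals len(observed).
    """
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    starts = np.asarray(edges)[:-1]
    # Bin totals: sums of observed counts and of expectations
    obs_b = np.add.reduceat(observed, starts)
    exp_b = np.add.reduceat(expected, starts)
    # log(k!) term of the Poisson pmf, via lgamma(k + 1)
    log_fact = np.array([math.lgamma(k + 1.0) for k in obs_b])
    return float(np.sum(obs_b * np.log(exp_b) - exp_b - log_fact))

# Toy check: three classes, the last two merged into one bin
ll = binned_prf_loglik([1, 2, 3], [1.0, 2.0, 3.0], [0, 1, 3])
```

With singleton bins (`edges = [0, 1, ..., n]`) this reduces to the usual unbinned Poisson log-likelihood, so binning can be switched on only for the high-frequency tail.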
PR #22 includes an implementation of frequency binning as sketched above. Here's an example output plot showing the bins as vertical lines.
Implemented using msprime and stdpopsim in test-msprime.ipynb
Popsim