harrispopgen / mushi

[mu]tation [s]pectrum [h]istory [i]nference
https://harrispopgen.github.io/mushi/
MIT License

Use "real" coalescent simulation, instead of just the forward model #11

Closed wsdewitt closed 4 years ago

wsdewitt commented 5 years ago

Popsim

kamdh commented 5 years ago

In other words, you're testing our inference on a forward model that's more sophisticated? This will tell us whether the framework we are using is robust to model mis-specification.

wsdewitt commented 5 years ago

Yep, those sims generate coalescent trees, then mutations on them, according to realistic demographies with complications like linkage and population structure.
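For intuition, here is a toy sketch of the "trees first, then mutations" idea. It is not msprime or stdpopsim: it assumes a single non-recombining locus, constant population size, and no population structure, and the names `simulate_sfs` and `poisson` are made up for illustration.

```python
import random


def poisson(mean, rng):
    """Draw a Poisson(mean) count by counting unit-rate arrivals in [0, mean)."""
    count, total = 0, rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def simulate_sfs(n, theta, rng):
    """Toy Kingman coalescent for n samples; returns an SFS of length n - 1.

    Assumptions (not from the thread): constant population size, one tree,
    time in coalescent units, mutation rate theta / 2 per unit branch length.
    """
    # Each lineage is tracked by the number of sampled leaves beneath it.
    lineages = [1] * n
    sfs = [0] * (n - 1)
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence among k lineages.
        t = rng.expovariate(k * (k - 1) / 2)
        # Mutations on a branch subtending `size` leaves appear at frequency `size`.
        for size in lineages:
            sfs[size - 1] += poisson(theta / 2 * t, rng)
        # Merge two lineages chosen uniformly at random.
        i, j = rng.sample(range(k), 2)
        merged = lineages[i] + lineages[j]
        lineages = [s for idx, s in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return sfs


print(simulate_sfs(10, 5.0, random.Random(0)))
```

Because this draws a single tree, every mutation on a given branch lands in the same frequency class. The realistic simulators add recombination, demography, and structure on top of this basic recipe.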

wsdewitt commented 5 years ago

A first place to look may be the starting kit for ∂a∂I (Gutenkunst et al.), which includes some simulated eta(t) and SFS.

When we start trying to invert mutation spectrum evolution (see #12) we'll want to look at:

kamdh commented 5 years ago

Before jumping into this, I'd be sure to test the inference method where the model is correct more thoroughly, to be sure everything is working.

wsdewitt commented 5 years ago

Adding a few items on this issue from today's mushi chat:

wsdewitt commented 5 years ago

When we simulate with msprime (linkage disequilibrium misspecification) we see funky outlier points in the SFS. These are due to deep branches in the coalescent that don’t get rearranged, so we effectively sample only one tree, giving the same frequency for all mutations on that branch. If we sampled more trees, the outliers would smooth out into the neighboring frequency categories.

[image]

One way to deal with this is to coarse-grain bin the SFS at higher frequencies (as done in fastNeutrino).
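A minimal sketch of what coarse-graining could look like, assuming the SFS is a plain list whose entry `i` holds the count at sample frequency `i + 1`. The scheme here (keep the first `keep` entries as their own bins, then group the rest in blocks of `width`) is a hypothetical choice for illustration, not necessarily what fastNeutrino does, and the example counts are invented.

```python
def make_bins(n_entries, keep, width):
    """Return half-open index ranges (start, stop) covering 0..n_entries.

    The first `keep` entries each get their own bin; the remaining
    high-frequency entries are grouped in blocks of `width`
    (the last block may be shorter).
    """
    start = min(keep, n_entries)
    bins = [(i, i + 1) for i in range(start)]
    while start < n_entries:
        stop = min(start + width, n_entries)
        bins.append((start, stop))
        start = stop
    return bins


def bin_counts(values, bins):
    """Sum the entries of `values` that fall in each bin."""
    return [sum(values[a:b]) for a, b in bins]


# Invented SFS with an outlier at frequency 6 (index 5).
sfs = [50, 24, 17, 12, 0, 31, 0, 8, 7]
bins = make_bins(len(sfs), keep=3, width=3)
print(bins)                   # [(0, 1), (1, 2), (2, 3), (3, 6), (6, 9)]
print(bin_counts(sfs, bins))  # [50, 24, 17, 43, 15]
```

The outlier count is absorbed into its bin together with the neighboring empty classes, which is the smoothing effect we want at high frequencies.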

wsdewitt commented 5 years ago

I think binning is pretty straightforward in terms of the PRF log likelihood, since the expectation of a bin will be the sum of the expectations of the elements in the bin.

[image: Scanned Document 2]
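In code, the argument goes roughly like this: a sum of independent Poisson counts is Poisson with the summed mean, so the binned likelihood is just the Poisson likelihood evaluated on summed counts and summed expectations. This is a standalone sketch with hypothetical function names and invented numbers, not the mushi implementation.

```python
import math


def poisson_loglik(counts, expected):
    """Sum of independent Poisson log-probabilities (expected values must be > 0)."""
    return sum(
        x * math.log(m) - m - math.lgamma(x + 1)
        for x, m in zip(counts, expected)
    )


def binned_loglik(sfs, expected_sfs, bins):
    """Poisson log-likelihood after summing data and expectations within bins.

    `bins` is a list of half-open index ranges (start, stop) into the SFS.
    """
    binned_counts = [sum(sfs[a:b]) for a, b in bins]
    binned_expected = [sum(expected_sfs[a:b]) for a, b in bins]
    return poisson_loglik(binned_counts, binned_expected)


# Invented example: 4 frequency classes, with the last two merged into one bin.
sfs = [10, 4, 0, 5]
expected_sfs = [9.0, 4.5, 3.0, 2.25]
print(binned_loglik(sfs, expected_sfs, [(0, 1), (1, 2), (2, 4)]))
```

With one bin per frequency class this reduces to the usual unbinned PRF log likelihood, so binning only changes which classes get pooled.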

wsdewitt commented 5 years ago

PR #22 includes an implementation of frequency binning as sketched above. Here's an example output plot showing the bins as vertical lines.

[image]

wsdewitt commented 4 years ago

Implemented using msprime and stdpopsim in test-msprime.ipynb