Cryptic description of p_missing.

Hjorvik commented 4 years ago

Hey John.

I've been reading the updated documentation, but I have been struggling to understand the description of the p_missing function. I understand what it does, but I still don't know how to interpret the values.

Thanks.

jackkamm commented 4 years ago

It is an unbiased estimate of the probability that an allele is missing when we sample a random individual at a random locus.

It is mainly used internally to rescale the expected nucleotide diversity. That rescaled quantity can be used in the composite likelihood (when use_pairwise_diffs=True in https://momi2.readthedocs.io/en/latest/api.html#momi.DemographicModel.set_data), or for mutation rate estimation (see section A.4.3 of the paper).

Note that the naive estimator for missingness is biased, because some fraction of the sites that should be SNPs won't show up in the SFS due to the missingness (usually these would be singletons or other rare mutations). So this method does something a little more complicated, which is unbiased.
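
To make the bias concrete, here is a small toy simulation (a sketch under stated assumptions, not momi2's code). It assumes a single population, every true SNP is a singleton, and each allele is missing independently with probability p_true. All function names are hypothetical. The naive estimator is the fraction of missing alleles among sites that still look polymorphic. The leave-one-out estimator is just one possible correction, shown only to illustrate that the bias can be removed; it is not necessarily what p_missing does internally.

```python
import numpy as np

def simulate(n_alleles=4, p_true=0.3, n_sites=200_000, seed=0):
    """Toy data: each site is a singleton carried by one random sample."""
    rng = np.random.default_rng(seed)
    carrier = rng.integers(n_alleles, size=n_sites)
    derived = np.zeros((n_sites, n_alleles), dtype=bool)
    derived[np.arange(n_sites), carrier] = True
    # Each allele is missing independently with probability p_true.
    missing = rng.random((n_sites, n_alleles)) < p_true
    return derived, missing

def is_polymorphic(derived, missing):
    """A site looks like a SNP only if both alleles are seen among called samples."""
    seen_derived = (derived & ~missing).any(axis=1)
    seen_ancestral = (~derived & ~missing).any(axis=1)
    return seen_derived & seen_ancestral

def naive_estimate(derived, missing):
    """Fraction of missing alleles among sites that show up as SNPs."""
    observed = is_polymorphic(derived, missing)
    return missing[observed].mean()

def leave_one_out_estimate(derived, missing, seed=1):
    """Hold out one random allele per site; keep the site only if the
    remaining alleles are still polymorphic, then ask whether the
    held-out allele was missing."""
    rng = np.random.default_rng(seed)
    n_sites, n_alleles = derived.shape
    rows = np.arange(n_sites)
    held_out = rng.integers(n_alleles, size=n_sites)
    rest_missing = missing.copy()
    rest_missing[rows, held_out] = True  # ignore the held-out allele
    usable = is_polymorphic(derived, rest_missing)
    return missing[rows, held_out][usable].mean()

derived, missing = simulate()
naive = naive_estimate(derived, missing)
loo = leave_one_out_estimate(derived, missing)
print(f"true p = 0.3, naive = {naive:.3f}, leave-one-out = {loo:.3f}")
```

In this toy setup the naive estimate comes out well below the true missingness, because a singleton site is only observed when its carrier is not missing. The leave-one-out version stays close to the true value, since the held-out allele's missingness is independent of whether the rest of the site looks polymorphic.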

Hjorvik commented 4 years ago

But is it the probability of missingness within a population or between populations?

jackkamm commented 4 years ago

It returns a vector whose length is the number of populations. The i-th entry is the probability of missingness for an allele sampled from the i-th population.
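
As a rough guide to reading that vector, here is a minimal sketch with made-up numbers. The population names, sample sizes and missingness values are all hypothetical and do not come from momi2 or any real dataset. It only illustrates that entry i applies to population i, so that n_i * (1 - p_i) is the expected number of called (non-missing) alleles from population i at a random locus.

```python
import numpy as np

# Hypothetical example values, one entry per population.
populations = ["pop_A", "pop_B", "pop_C"]
p_missing = np.array([0.02, 0.10, 0.25])   # made-up per-population missingness
sample_sizes = np.array([10, 8, 12])       # made-up number of sampled alleles

def expected_called_alleles(sample_sizes, p_missing):
    """Expected number of non-missing alleles per population at a random locus."""
    return sample_sizes * (1.0 - p_missing)

expected = expected_called_alleles(sample_sizes, p_missing)
for name, p, e in zip(populations, p_missing, expected):
    print(f"{name}: P(missing) = {p:.2f}, expected called alleles = {e:.1f}")
```

A value near 0 means almost every sampled allele in that population is called at a typical locus; a value of 0.25 means roughly one in four alleles from that population is missing.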

popgenmethods / momi2

Cryptic description of p_missing. #35