Mugration p-value

Description

I'm tackling the issue of sampling bias in mugration and was curious whether a p-value might be useful here. If I knew the probability of a migration event occurring by chance (given the data), it might guide interpretation.

Disclaimer: I am not a statistician, so if I'm way off or this has already been described, please let me know!

Theory

Given n states s₁, s₂, …, sₙ with frequencies f₁, f₂, …, fₙ, what is the probability of observing a transition from sⱼ to sₖ by chance?
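One way to formalize "by chance", and the model the working example enumerates, is to draw two different samples uniformly at random from the N = f₁ + … + fₙ observations, with order mattering. This is an assumption, and note that it ignores the tree entirely. Under it, the probabilities follow from counting ordered pairs:

$$
N = \sum_{i=1}^{n} f_i, \qquad
P(s_j \to s_k) =
\begin{cases}
\dfrac{f_j \, f_k}{N(N-1)} & j \neq k \\[6pt]
\dfrac{f_j \, (f_j - 1)}{N(N-1)} & j = k
\end{cases}
$$

For Russia → Germany, N = 8, f_Russia = 4 and f_Germany = 2, giving 4 · 2 / (8 · 7) = 8/56 = 1/7 ≈ 0.143, consistent with the 0.14 in the working example.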

Working Example

What is the probability of observing a mugration event from Russia to Germany by chance? In this example, that probability (the p-value) is about 0.14, and it's up to the user to decide whether that is too high.

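import itertools

# Sampled locations and the number of sequences observed in each
states = ["Russia", "Lithuania", "Estonia", "Germany"]
frequencies = [4, 1, 1, 2]

# Expand into one entry per sampled sequence
observations = []
for s, f in zip(states, frequencies):
    observations += [s] * f
# ['Russia', 'Russia', 'Russia', 'Russia', 'Lithuania', 'Estonia', 'Germany', 'Germany']

# Every ordered pair of two different samples: 8 * 7 = 56 pairs
transitions = list(itertools.permutations(observations, 2))
transitions_uniq = set(transitions)  # distinct (from, to) state combinations
# I'm uncertain whether "staying in place" should be considered a transition.
# As written it is: permutations() pairs positions, not unique values, so
# same-state pairs such as ('Russia', 'Russia') are included.

target = ("Russia", "Germany")
pvalue = transitions.count(target) / len(transitions)

# 4 Russia samples * 2 Germany samples = 8 matching pairs; 8 / 56 ≈ 0.14
print(round(pvalue, 2))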
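The enumeration above grows quadratically with the number of samples. Under the same chance model (two different samples drawn uniformly at random, order mattering), the same number can be computed directly from the frequencies. The sketch below is a hypothetical helper, `chance_transition_probability`, not part of TreeTime. Its `include_self` flag is likewise an assumption, added to show how the answer changes if same-state pairs ("staying in place") are excluded. In that case the denominator becomes N² − Σfᵢ², the number of ordered pairs whose two samples come from different states.

```python
def chance_transition_probability(frequencies, source, target, include_self=True):
    """Probability that a random ordered pair of two different samples is (source, target).

    Hypothetical helper. Assumes 'by chance' means drawing two different
    samples uniformly at random without replacement, as the enumeration does.
    If include_self is False, pairs whose two samples share a state are
    discarded first, i.e. 'staying in place' is not counted as a transition.
    """
    n_total = sum(frequencies.values())
    f_src = frequencies.get(source, 0)
    f_tgt = frequencies.get(target, 0)

    # Number of ordered sample pairs that match (source, target)
    if source == target:
        if not include_self:
            return 0.0
        hits = f_src * (f_src - 1)
    else:
        hits = f_src * f_tgt

    # Number of ordered sample pairs under the chosen model
    if include_self:
        pairs = n_total * (n_total - 1)
    else:
        pairs = n_total ** 2 - sum(f * f for f in frequencies.values())

    # Guard against too few samples (or all samples in one state)
    if pairs == 0:
        return 0.0
    return hits / pairs


freqs = {"Russia": 4, "Lithuania": 1, "Estonia": 1, "Germany": 2}
print(chance_transition_probability(freqs, "Russia", "Germany"))  # 8 / 56 ≈ 0.143
print(chance_transition_probability(freqs, "Russia", "Germany", include_self=False))  # 8 / 42 ≈ 0.190
```

Excluding same-state pairs shrinks the denominator from 56 to 42, so the chance probability of Russia → Germany rises from about 0.14 to about 0.19. Which convention is appropriate depends on how the mugration events being compared are counted.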

neherlab / treetime

Mugration p-value #153
