neherlab / treetime

Maximum likelihood inference of time stamped phylogenies and ancestral reconstruction
MIT License

Mugration p-value #153

Open ktmeaton opened 3 years ago

ktmeaton commented 3 years ago

Description

I'm tackling the issue of sampling bias in mugration and was wondering whether a p-value might be of use here. If I knew the probability of an event happening by chance (given the data), it might guide interpretation.

Disclaimer: I am not a statistician, so if I'm way off, or this is already described, please let me know!

Theory

Given n states s1, s2, ..., sn with frequencies f1, f2, ..., fn, what is the probability of observing a transition from sj to sk by chance?

Working Example

What is the probability of observing a mugration event from Russia to Germany by chance? In this example, this probability/p-value is about 0.14 (8 of 56 ordered pairs), and it's up to the user to decide whether that is too high.

import itertools

states = ["Russia", "Lithuania", "Estonia", "Germany"]
frequencies = [4,1,1,2]

observations = []
for s,f in zip(states, frequencies):
    observations += [s] * f
# ['Russia', 'Russia', 'Russia', 'Russia', 'Lithuania', 'Estonia', 'Germany', 'Germany']

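# All ordered pairs of distinct observations (8 * 7 = 56 pairs).
# Pairs such as ("Russia", "Russia") are included when they come from two different samples.
transitions = list(itertools.permutations(observations, 2))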
transitions_uniq = set(transitions)
# I'm uncertain if "staying in place" should be considered a transition?

target = ("Russia", "Germany")
# 4 Russia samples * 2 Germany samples = 8 matching ordered pairs out of 56
pvalue = transitions.count(target) / len(transitions)

# Results in a p-value of 0.14
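
The enumeration above can be cross-checked with a closed form: for two different states, the number of matching ordered pairs is fj * fk, out of N * (N - 1) ordered pairs in total, where N is the sum of the frequencies. The following is a minimal sketch, not part of treetime; the function name `chance_transition_probability` and the flag `exclude_self_pairs` are introduced here purely for illustration. The flag shows how the answer changes if "staying in place" pairs are not counted as transitions.

```python
from fractions import Fraction

def chance_transition_probability(freqs, source, target, exclude_self_pairs=False):
    """Probability that a random ordered pair of distinct samples is (source, target).

    Illustrative helper only; assumes source and target are different states.
    """
    if source == target:
        raise ValueError("source and target must be different states")
    n = sum(freqs.values())
    total_pairs = n * (n - 1)  # all ordered pairs of distinct samples
    if exclude_self_pairs:
        # Drop pairs whose two samples share a state ("staying in place").
        total_pairs -= sum(f * (f - 1) for f in freqs.values())
    return Fraction(freqs[source] * freqs[target], total_pairs)

freqs = {"Russia": 4, "Lithuania": 1, "Estonia": 1, "Germany": 2}

with_self = chance_transition_probability(freqs, "Russia", "Germany")
without_self = chance_transition_probability(
    freqs, "Russia", "Germany", exclude_self_pairs=True
)
print(with_self, float(with_self))        # 1/7, about 0.14 (matches the enumeration)
print(without_self, float(without_self))  # 4/21, about 0.19
```

Whether same-state pairs belong in the denominator is a modelling choice, and as the two numbers show, it changes the result noticeably.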

rneher commented 3 years ago

I guess one thing that one could test is whether particular transitions happen more frequently than expected under a flat transition matrix. But the probabilistic interpretation of mugration models is subtle and depends first and foremost on sampling and on the assumption of reversibility.
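
One possible reading of the flat-matrix comparison is a one-sided binomial tail test. The sketch below rests on strong assumptions that the thread does not confirm: every one of the n * (n - 1) off-diagonal transitions is taken to be equally likely, and inferred transition events are treated as independent draws. The function name `flat_matrix_tail_probability` and the counts used in the example are hypothetical, not treetime output, and the caveats about sampling and reversibility still apply.

```python
from math import comb

def flat_matrix_tail_probability(n_states, total_events, observed_events):
    """P(X >= observed_events) when each of total_events transitions lands on
    one particular ordered state pair with probability 1 / (n * (n - 1)).

    Illustrative only: assumes a flat transition matrix and independent events.
    """
    p = 1 / (n_states * (n_states - 1))
    return sum(
        comb(total_events, k) * p**k * (1 - p) ** (total_events - k)
        for k in range(observed_events, total_events + 1)
    )

# Hypothetical counts: 4 states, 10 inferred transitions in the tree,
# 3 of which go from Russia to Germany.
p_tail = flat_matrix_tail_probability(n_states=4, total_events=10, observed_events=3)
print(f"tail probability under a flat matrix: {p_tail:.4f}")
```

A small tail probability would suggest that the transition occurs more often than a flat matrix predicts, but it says nothing about whether uneven sampling produced that excess.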