uqrmaie1 / admixtools

https://uqrmaie1.github.io/admixtools

How to determine whether the score is reasonable in qpgraph #40

Open jackzhong1995 opened 1 year ago

jackzhong1995 commented 1 year ago

I would like to know how I can tell whether the score is reasonable in qpgraph. My best score is 694.18 after running 10,000 generations. Is that reliable? [screenshot attached]

uqrmaie1 commented 1 year ago

The scores are approximate log-likelihoods (with the minus sign omitted). They are difficult to interpret on their own and are more useful when comparing models. In theory, an absolute log-likelihood score difference of one suggests that the probability of the data under one model is about 2.7 times as high as under the other model. However, the likelihood (probability of the data given the model) is very different from the posterior probability (probability of the model given the data).
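For example, here is a minimal R sketch of that likelihood-ratio interpretation; the two score values are made-up numbers, not taken from your fit:

```r
# Scores of two competing fits (hypothetical values). Lower is better,
# since the score is a log-likelihood with the minus sign omitted.
score_model_a <- 694.18
score_model_b <- 693.18

# A score difference of 1 corresponds to a likelihood ratio of e ~ 2.72:
# the data are about 2.7 times as probable under the lower-scoring model.
exp(score_model_a - score_model_b)
#> [1] 2.718282
```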

Another way of looking at the score is that it is calculated as a weighted sum of f-statistic residual z-scores (the differences between fitted and observed f-statistics, expressed in units of their standard errors). A score of 694 suggests that one or more of these residual z-scores are much greater than 4, so there is strong evidence that at least one observed f-statistic is not compatible with this model. But that assumes that the goal is to find a model which fits the data perfectly. The problem with that is that in the limit of infinite data, even minor deviations between the real history and the model will lead to large residual z-scores and large absolute scores. At the same time, noisy data and large models with too many degrees of freedom will result in "better" scores, closer to zero, even if these models have nothing to do with the actual history.
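As a caricature of that calculation (the real score uses the full covariance matrix of the f-statistics, so this is only illustrative, and the numbers are invented): summing squared residual z-scores shows how a single poorly fitting f-statistic can dominate the score.

```r
# Hypothetical observed and fitted f-statistics with their standard errors.
observed <- c(0.0210, 0.0030, 0.0150)
fitted   <- c(0.0205, 0.0031, 0.0080)
se       <- c(0.0004, 0.0003, 0.0005)

# Residual z-scores: how many standard errors each fitted value misses by.
z <- (fitted - observed) / se
round(z, 2)
#> [1]  -1.25   0.33 -14.00

# A rough diagonal-covariance analogue of the score: the sum of squared
# residual z-scores. The single badly fitting statistic (|z| = 14) dominates it.
sum(z^2)
#> [1] 197.6736
```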

So I would focus more on the difference in scores of competing models, not on the absolute score of a single model.
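In practice, that just means fitting the candidate graphs to the same f2 statistics and looking at how their scores differ. A minimal sketch, assuming the f2 statistics have already been computed (e.g. with extract_f2()) and that graph_a and graph_b are your two candidate graphs:

```r
library(admixtools)

# f2_blocks: precomputed f2 statistics; graph_a, graph_b: candidate graphs.
fit_a <- qpgraph(f2_blocks, graph_a)
fit_b <- qpgraph(f2_blocks, graph_b)

# The difference in scores is more informative than either absolute value.
fit_a$score - fit_b$score
```

If I remember correctly, the package also has bootstrap-based helpers for testing whether such a score difference is larger than expected by chance (qpgraph_resample_multi() and compare_fits()); please check the documentation of your version for the exact interface.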