matthewwolak / nadiv

R package that constructs (non)additive genetic relationship matrices, and their inverses, from a pedigree to be used in linear mixed effect models (A.K.A. the 'animal model').
17 stars 7 forks source link

LRTest & ASReml-R positive log-likelihoods #10

Closed matthewwolak closed 8 years ago

matthewwolak commented 8 years ago

Sometimes the ASReml-R package (asreml) produces positive log-likelihoods. How should these be handled particularly when doing anything based on the log-likelihood ratio tests (e.g., LRTest(), proLik(), and constrainFun())?

The assumption normally is that the log-likelihood values are negative. Then the model with the greater log-likelihood (closer to zero) has a better fit to the data.

@matthewwolak needs to construct a reproducible example and then fix the functions

matthewwolak commented 8 years ago

Reproducible example for issue #10

My colleague Vincent Careau mentioned that one way to produce an asreml model that yields positive log-likelihoods is to transform the response variable (e.g., by taking the square root or log).

Below simulates data with two random effects (additive genetic effects and generation effects) for a pedigree produced by the nadiv::simPedDFC() function. I then use asreml to analyze the data with the full model, and two sub-models where only one random effect is fitted.

Simulate data

Produce a data.frame object of an example dataset with a pedigree. Then construct a few variables necessary to simulate and analyze this dataset.

library(nadiv)
df <- simPedDFC(30)
df$gen <- as.factor(genAssign(df[, 1:3]))
ndf <- nrow(df)
A <- makeA(df[, 1:3])
listAinv <- makeAinv(df[, 1:3])$listAinv

Simulate a phenotype and transformed phenotype:

set.seed(101)
df$phen <- 100 + grfx(ndf, G = matrix(40), incidence = A) + drfx(G = matrix(20), fac = "gen", data = df)$fx + rnorm(ndf, 0, sqrt(50))
df$sqrtphen <- sqrt(df$phen)

Now analyze the data with asreml

library(asreml)
full.model <- asreml(sqrtphen ~ 1,
   random = ~ ped(id) + gen,
   ginverse = list(id = listAinv),
   data = df)
stopifnot(full.model$loglik > 0)   #<-- ensure getting the positive log-likelihood for the example

# Drop the additive genetic variance term
model.noA <- asreml(sqrtphen ~ 1,
   random = ~ gen,
   data = df)

# Drop the generation variance term
model.noG <- asreml(sqrtphen ~ 1,
   random = ~ ped(id),
   ginverse = list(id = listAinv),
   data = df)

Conclusion

Using this one realization of simulated data (and seeding the random number generator using set.seed(101)) yields a model of the square root transformed phenotype with log-likelihood full.model: 698.6. Removing the additive genetic variance component reduces the log-likelihood model.noA: 580.8 and removing the generation variance component reduces the log-likelihood model.noG: 599.3.

So it appears from this one example, that models without variance components that were simulated to definitely be >>0, have lower positive log-likelihood values.

THE ASSUMPTION

Therefore, I will assume this relationship in LRTest().

matthewwolak commented 8 years ago

Note that this was not fixed when issue #4 was closed. The commit that fixed this previous issue included a version of LRTest() that would stop if the log-likelihoods were positive. Subsequent to closing issue #4, I made a quick change to LRTest() (wrongly) assuming that I could enable the function to work with positive log-likelihoods. Now the above reproducible example shows that, at least in the above case, this fix was incorrect.