kaskr / TMB_contrib_R

TMBAIC argument k and adding AICc #1

Closed by mebrooks 7 years ago

mebrooks commented 7 years ago

These questions are kinda for @James-Thorson, but open to discussion. I'm working on adding an option for AICc to the TMBAIC function because I've been using a small sample size corrected version for my own models.

  1. I'm wondering if it might make sense to change the name of the argument k because in the Wikipedia article and in my version, k is the number of parameters. If we're going to change it, now is the time, before we mess up too much downstream code.
  2. Can you give a reference for using a value other than 2 as "the penalty on additional fixed effects"? I'm wondering if the current code might have a mistake in it because, in it, k is a weight on the objective function rather than on the number of parameters: 2*length(opt[["par"]]) + k*opt[["objective"]].
  3. While we're on the subject... should we keep the default as AIC, or make it AICc, since AICc converges to AIC for large enough samples? The only problem with making AICc the default is that the user must specify the sample size (n), and that's kind of a pain. I'm not sure we can get n in a general way from opt. I would only make AICc the default if we could generally and reliably get n.

James-Thorson commented 7 years ago

Thanks @mebrooks! That's embarrassing. I've fixed k, which was used incorrectly in TMBAIC as you noted, and pushed the change. Re: purpose, you could implement BIC by setting k equal to the natural log of your sample size.
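
To make the BIC remark concrete, here is a minimal sketch. It assumes the penalty multiplies the number of parameters and that the optimizer's objective is a negative log-likelihood. The helper name `ic_sketch` and all numbers are made up for illustration and are not part of TMB_contrib_R.

```r
# Hypothetical helper, not part of TMB_contrib_R: a generic information
# criterion of the form penalty * (number of parameters) + 2 * (negative log-likelihood).
ic_sketch <- function(nll, npar, penalty = 2) {
  penalty * npar + 2 * nll
}

nll  <- 123.4  # made-up negative log-likelihood at the optimum
npar <- 5      # made-up number of fixed-effect parameters
n    <- 200    # made-up sample size

aic <- ic_sketch(nll, npar)                    # penalty = 2 gives AIC
bic <- ic_sketch(nll, npar, penalty = log(n))  # penalty = log(n) gives BIC
```

With these numbers log(n) is larger than 2, so BIC penalizes each extra parameter more heavily than AIC does.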

Personally, I only ever use Optimize, which I think is particularly convenient for new TMB users because it repeats the optimization starting from the previous MLE; in my experience this sometimes tightens the convergence (decreases the final gradient). I suppose Optimize could be linked to use TMBAIC if we put enough effort into the latter for it to matter.

Re: AICc and BIC, I'm not sure how to think about them, personally, because I've never been clear in a multivariate model whether n is the number of sampled individuals, or the product of individuals and measures-per-individual. I also don't see how to extract n generically from the TMB object, although I think it'd be fine to change to:

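TMBAIC = function(opt, k=2, n=Inf){
  # opt: optimizer output; nlminb() stores the minimized objective in "objective", optim() in "value"
  npar = length(opt[["par"]])
  # AICc correction term; equals 0 when n=Inf, so the default reduces to AIC (for k=2)
  correction = 2*npar*(npar+1)/(n-npar-1)
  if( all(c("par","objective") %in% names(opt)) ) Return = k*npar + 2*opt[["objective"]] + correction
  if( all(c("par","value") %in% names(opt)) ) Return = k*npar + 2*opt[["value"]] + correction
  return( Return )
}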

where n=Inf by default, so the function returns AIC unless the user specifies a finite n, in which case it returns AICc.
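
A quick way to see why n=Inf recovers AIC is to look at the correction term on its own. This is a small sketch; the helper name `aicc_correction` is made up here, and the term is only meaningful when n is larger than npar + 1.

```r
# AICc correction term taken from the formula above; npar is the number of parameters.
# Hypothetical helper for illustration, not part of TMB_contrib_R.
aicc_correction <- function(npar, n) {
  2 * npar * (npar + 1) / (n - npar - 1)
}

npar <- 5
aicc_correction(npar, n = Inf)   # 0: dividing by Inf removes the correction, leaving AIC
aicc_correction(npar, n = 1000)  # 60/994, negligible for a large sample
aicc_correction(npar, n = 20)    # 60/14, a sizeable extra penalty for a small sample
```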

mebrooks commented 7 years ago

k is used as the number of parameters in the Wikipedia article for BIC as well as the article for AIC and AICc. Burnham and Anderson use upper-case K as the number of parameters in their 2002 book. This could confuse users. Is there another key reference that uses k as the weight on the number of parameters? Maybe there's a better name for that argument?

Otherwise, I like the idea of using n=Inf and AICc as the default.

James-Thorson commented 7 years ago

OK, I just pushed changes:

  1. added n input as we discussed, with default n=Inf
  2. changed notation to use k for the number of parameters, and a new argument p for the penalty on fixed effects (to scale between AIC and BIC)

feel free to make any further changes!
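
The pushed code is not shown in this thread, so the following is only a guess at what a function with this notation might look like: k for the number of parameters, p for the per-parameter penalty, and n for the sample size. The name `TMBAIC_sketch` and the example `opt` list are hypothetical.

```r
# Hypothetical sketch only; the real TMBAIC in TMB_contrib_R may differ.
# p: penalty per parameter (2 for AIC, log(n) for BIC); n: sample size for the AICc term.
TMBAIC_sketch <- function(opt, p = 2, n = Inf) {
  k <- length(opt[["par"]])  # k now denotes the number of parameters
  # nlminb() reports the minimum as "objective", optim() as "value"
  nll <- if ("objective" %in% names(opt)) opt[["objective"]] else opt[["value"]]
  if (is.null(nll)) stop("opt needs an 'objective' or 'value' element")
  p * k + 2 * nll + 2 * k * (k + 1) / (n - k - 1)
}

# Made-up optimizer output in the shape returned by nlminb()
opt <- list(par = c(a = 0.1, b = 1.2, c = -0.5), objective = 50)
TMBAIC_sketch(opt)          # AIC: 2*3 + 2*50 = 106
TMBAIC_sketch(opt, n = 30)  # AICc: 106 + 24/26
```

Unlike the version quoted earlier, this sketch stops with an explicit error when `opt` has neither an `objective` nor a `value` element, which is a design choice of the sketch rather than something agreed in the thread.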

mebrooks commented 7 years ago

Looks good! Sorry I didn't mean to demand that you do all this. I would have done it, but wanted to check with you first. Thanks!