lme4 / lme4

Mixed-effects models in R using S4 classes and methods with RcppEigen

"downdated VtV is not positive definite" triggered by badly scaled parameters #173

Open bbolker opened 10 years ago

bbolker commented 10 years ago

Reported by Derek Jones:

# Data from:
# Accurate Characterization of the Variability in Power Consumption in Modern Mobile Processors
# Bharathan Balaji and John McCullough and Rajesh K. Gupta and Yuvraj Agarwal
library("lme4")
datfile <- system.file("testdata","hotpower.csv",package="lme4")
power_bench=read.csv(datfile, as.is=TRUE)
power_bench=subset(power_bench, !grepl("cpuload", benchmark))
power_bench$processor=as.factor(power_bench$processor)
try(lmer(meanpower ~ frequency + (1 | benchmark) + (frequency | processor),
        data=power_bench))
## Downdated VtV is not positive definite
## scaling predictor by 1e6 works around the problem ...
p_mod=lmer(meanpower ~ frequency + (1 | benchmark) + (frequency | processor),
        data=transform(power_bench,frequency=frequency/1e6))

While I hesitate to echo some previous discussions about R optimization tools ("this is a poorly scaled problem, it's not our fault that our optimization tool can't handle it"), I consider this fairly low priority -- there is an easy workaround, and I can't think of a general fix other than implementing some kind of parameter auto-scaling within lme4, which could be a large and complicated job to get right. (A simple, hackish approach would be to autoscale all but the first column of X, or all of the continuous-predictor columns of X, by their standard deviations internally ... we could center them at the same time ... but I'm hesitant to jump into these kinds of automatic fixes ...)

It may not be obvious that the problem is badly scaled because the only explicit predictor is the frequency, but in this case the variance parameters are on the order of 1, while the frequency coefficient is on the order of 1e-6.
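A minimal illustration of that mismatch, using the rescaled fit p_mod from the example above (fixef() and VarCorr() are standard lme4 accessors):

## On the rescaled fit, the frequency slope is on the frequency/1e6 scale;
## dividing it by 1e6 recovers the original-scale coefficient, which is on
## the order of 1e-6, while the random-effect standard deviations reported
## by VarCorr() are on the order of 1.
fixef(p_mod)
VarCorr(p_mod)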

I am much more interested in finding a way to diagnose such problems on the fly and report an informative error message ("consider rescaling parameters"), and more generally to provide information and tools for troubleshooting. We could certainly indicate somewhere that bad scaling is one possible cause of a "Downdated VtV is not positive definite" error ... I don't know whether there are other tests we could run before we hit the VtV error to check parameter scaling. We can't just naively take the finite-difference gradient with respect to the parameters, because calculating the deviance requires the downdating process. I guess we could look at the scales of the continuous predictors, but this is far from foolproof ...

bbolker commented 10 years ago

I suppose we could implement a system like our current deficient-X-rank strategy: check the range of the standard deviations of the continuous columns of X and either warn or automatically fix depending on [g]lmerControl ... (we would have to adjust any corresponding parts of Z as well ...). A rough sketch of such a check is below.
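A minimal sketch of that kind of check, done outside lme4; the function name checkXscale, the default ratio threshold, and the action argument are all invented for illustration and are not part of lme4:

## Hypothetical scale check on a fixed-effects model matrix.  The function
## name, the 'ratio' threshold, and the 'action' argument are made up for
## illustration -- this is not lme4 code.
checkXscale <- function(X, ratio = 1e3,
                        action = c("warning", "stop", "ignore")) {
    action <- match.arg(action)
    sds <- apply(X, 2, sd)
    sds <- sds[sds > 0]          # drop constant columns (e.g. the intercept)
    if (length(sds) > 1 && max(sds) / min(sds) > ratio) {
        msg <- "predictors are on very different scales: consider rescaling"
        switch(action, warning = warning(msg), stop = stop(msg),
               ignore = invisible(NULL))
    }
    invisible(sds)
}

## e.g. checkXscale(model.matrix(~ frequency, data = power_bench))

An auto-fix variant could divide the flagged columns by their standard deviations (adjusting the matching columns of Z accordingly), which is the warn-or-fix switch described above.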

Derek-Jones commented 10 years ago

Thanks to Ben Bolker for posting about the problem I experienced here.

Reading the pages returned by a Google search on "Downdated VtV is ..." sent me on a wild goose chase, looking for the problem in the wrong place. Had I known about scaling issues, the problem would have been solved immediately.

I have written about this kind of user-friendliness issue in R: http://shape-of-code.coding-guidelines.com/2013/07/31/i-made-a-mistake-please-dont-shoot-me/

bbolker commented 10 years ago

Do you have any specific ideas about how we could address this issue? I've thought about adding a generic "troubleshooting" manual page, along the lines of the current ?pvalues page ...

Derek-Jones commented 10 years ago

A troubleshooting manual page, separated out and marked as such, is an excellent idea. 'Tucking away' material in a manual is fine; those desperate enough will always read it eventually.

Lots of people simply type an error message into Google. Improving the Google rank of this page would also be good, but probably much more difficult to achieve.

simon-SMU commented 10 years ago

Hi, thanks a lot for this valuable discussion. I have the same problem when running an (overfitted) glmer.nb regression with time fixed effects. The suggestions to change the error message are useful to lay users like me, although I follow Ben in being hesitant about automatic fixes. De-meaning and centering in particular are not ideal solutions: as far as I understand, they would require calculating the mean across time (e.g. when running panel data with a forward-lagged y-variable), so that information not available at the time of the "prediction" (i.e. a response in year 2001 versus predictors in year 2000) seeps into the regression (by demeaning, if the mean of every predictor over time is taken, which seems to be the default option).

Anyway, after rescaling my own variables I got the following error:

glmer.nb(formula, data=p.data, nAGQ = 1L, start = 0.21, control=glmerControl(optimizer="Nelder_Mead"))

Error in eval(expr, envir, enclos) : the ... list does not contain 5 elements

Why do I (suddenly) require 5 elements in my list?

PS: the problem is not there when using nAGQ = 0, but it persists regardless of the optimization method chosen.

bbolker commented 10 years ago

This sounds like it might (?) be a separate issue, but ... can you give a reproducible example? (If you need to anonymize, you could e.g. use simulate() on the results of the nAGQ=0 fit to randomize the results, and anonymize factor levels/variable names etc. ...)

simon-SMU commented 10 years ago

Hi Ben,

Thanks for the quick response. I have dumbed the problem down to as small a dataset as possible (the original is 37 by 16, with about 30 predictors in the regression - that's what management scholars do). Regression:

g.test<-glmer.nb(y ~ x1 + x2 + x1:x2 + (1|Year), data=repex, nAGQ = 1L, start = 0.21, control=glmerControl(optimizer="bobyqa", boundary.tol=1e-2 ,check.conv.singular =.makeCC(action ="ignore",tol=1e-2),tolPwrss=1e-2))
## dput() output of the example dataset used above:
repex <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("1", "2", "4", "6", 
"7", "8", "9", "10", "11", "13", "15", "16", "17", "18", "20", 
"22", "24", "26", "28", "29", "32", "34", "35", "36", "37", "39", 
"41", "42", "44", "47"), class = "factor"), Year = structure(c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 
16L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 
14L, 15L, 16L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 
12L, 13L, 14L, 15L, 16L), .Label = c("1991", "1992", "1993", 
"1994", "1995", "1996", "1997", "1998", "1999", "2000", "2001", 
"2002", "2003", "2004", "2005", "2006"), class = "factor"), y = c(5, 
10, 6, 9, 9, 4, 7, 3, 4, 3, 6, 0, 3, 1, 0, 1, 2, 2, 3, 7, 12, 
13, 9, 4, 9, 5, 7, 4, 1, 0, 0, 1, 0, 5, 5, 1, 1, 3, 3, 5, 1, 
1, 0, 0, 0, 0, 0, 0), x1 = c(0L, 0L, 3L, 3L, 3L, 5L, 3L, 3L, 
7L, 6L, 3L, 4L, 4L, 8L, 1L, 8L, 0L, 0L, 0L, 0L, 0L, 3L, 2L, 1L, 
1L, 4L, 2L, 1L, 1L, 2L, 3L, 3L, 0L, 2L, 1L, 0L, 0L, 0L, 0L, 0L, 
4L, 0L, 4L, 2L, 3L, 0L, 1L, 1L), x2 = structure(c(17.8639165094919, 
50.1020935887824, 31.1538050589932, 37.2033807072519, 24.8991492903416, 
174.175350109063, 31.4251082710696, 41.1084018081777, 10.1998547908863, 
12.067302257998, 17.0903625146349, 8.85325909190098, 6.47558100093125, 
7.01269692974855, 1.65337279464797, 0.191518074572238, 0.018595041322314, 
0.0271994246139054, 0.96361787450096, 2.5959514058137, 18.3142899841373, 
20.4702770445124, 5.42130949779285, 38.2154457985312, 16.0989967955186, 
17.8598514126896, 15.665921501896, 21.340517631197, 14.2341841365699, 
2.78195283698102, 0.298904312089158, 0.0705563395778743, 5.8217279902149, 
7.71441353992869, 49.03135409192, 12.6620210527394, 28.2588539794557, 
6.86073281363667, 5.33888953019871, 6.6234384209806, 3.40017246160656, 
8.56303764026254, 13.06239723398, 3.24123669578228, 2.2556228963796, 
1.23481355658365, 0.995714220297008, 0.149001034360244), .Dim = 48L, .Dimnames = list(
    c("1990_A", "1991_A", "1992_A", "1993_A", "1994_A", "1995_A", 
    "1996_A", "1997_A", "1998_A", "1999_A", "2000_A", "2001_A", 
    "2002_A", "2003_A", "2004_A", "2005_A", "1990_B", "1991_B", 
    "1992_B", "1993_B", "1994_B", "1995_B", "1996_B", "1997_B", 
    "1998_B", "1999_B", "2000_B", "2001_B", "2002_B", "2003_B", 
    "2004_B", "2005_B", "1990_C", "1991_C", "1992_C", "1993_C", 
    "1994_C", "1995_C", "1996_C", "1997_C", "1998_C", "1999_C", 
    "2000_C", "2001_C", "2002_C", "2003_C", "2004_C", "2005_C"
    )))), .Names = c("ID", "Year", "y", "x1", "x2"), row.names = c(1L, 
2L, 3L, 4L, 160L, 5L, 162L, 163L, 164L, 6L, 166L, 7L, 8L, 9L, 
170L, 171L, 172L, 173L, 174L, 175L, 176L, 177L, 178L, 179L, 180L, 
181L, 182L, 183L, 184L, 10L, 186L, 11L, 188L, 12L, 190L, 191L, 
192L, 13L, 194L, 14L, 196L, 197L, 15L, 199L, 200L, 16L, 202L, 
17L), class = c("plm.dim", "data.frame"))

Thanks for your interest!

stevencarlislewalker commented 10 years ago

The error occurs when glmer.nb is estimating the negative binomial theta parameter. This requires repeated refitting of the initial glmer Poisson model; the control parameters are not being passed correctly through the various function calls associated with that task.

By changing:

ctrl = eval(object@call$control)$checkConv,

to

ctrl = control$checkConv,

in refit.merMod, the model fits without error (though it obviously still produces all the convergence warnings):

> glmer.nb(y ~ x1 + x2 + x1:x2 + (1|Year),
+                  data=repex, nAGQ = 1L, start = 0.21,
+                  control=glmerControl(optimizer="bobyqa",
+                        boundary.tol=1e-2,
+                        check.conv.singular =.makeCC(action="ignore",tol=1e-2),
+                        tolPwrss=1e-2))
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: Negative Binomial(1.8701)  ( log )
Formula: y ~ x1 + x2 + x1:x2 + (1 | Year)
   Data: ..2
      AIC       BIC    logLik  deviance  df.resid 
 236.0542  247.2814 -112.0271  224.0542        42 
Random effects:
 Groups   Name        Std.Dev.
 Year     (Intercept) 0.3471  
 Residual             0.9470  
Number of obs: 48, groups:  Year, 16
Fixed Effects:
(Intercept)           x1           x2        x1:x2  
   0.703567     0.039827     0.037991    -0.006878  
Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 3.94069 (tol = 0.001)
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?
3: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model failed to converge with max|grad| = 0.124988 (tol = 0.001)
4: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  :
  Model is nearly unidentifiable: very large eigenvalue
 - Rescale variables?

However, I am not sure whether this is the correct approach to passing the control parameters. Will they always be passed correctly, in all cases?
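For readers following along: a rough, hedged sketch of what glmer.nb is doing here conceptually (this is not the actual lme4 code path) is to fit a Poisson model once and then repeatedly refit with a fixed-theta negative binomial family while optimizing over theta, which is why the control settings have to survive every refit:

## Conceptual sketch only -- not the real glmer.nb internals.  Fit a Poisson
## glmer once, then refit with a fixed-theta negative binomial family while
## maximizing the log-likelihood over log(theta).
library(lme4)
library(MASS)   # for negative.binomial()

g0 <- glmer(y ~ x1 + x2 + x1:x2 + (1 | Year), data = repex, family = poisson)

nb_negll <- function(log_theta) {
    fit <- glmer(y ~ x1 + x2 + x1:x2 + (1 | Year), data = repex,
                 family = MASS::negative.binomial(exp(log_theta)))
    -as.numeric(logLik(fit))    # negative log-likelihood for this theta
}

opt <- optimize(nb_negll, interval = log(c(0.1, 10)))
exp(opt$minimum)   # rough estimate of theta

Each evaluation of nb_negll refits the model, so any control settings that are not passed through to those refits are silently dropped -- which is the problem being fixed above.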

stevencarlislewalker commented 10 years ago

Additional question: why were the control parameters being passed through the call slot in the first place? There may be a good reason, in which case the solution above may not make sense.

simon-SMU commented 10 years ago

Thanks for your contribution Steven,

I also found that the problem disappears when I leave out the start value, but in my complete model this becomes incredibly inefficient (especially in the full version, where I include not only fixed time effects but also fixed firm effects ...).

bbolker commented 10 years ago

It's been a while since we dusted off this bit of the code. I can't think of a reason that @stevencarlislewalker's solution isn't appropriate. Steve, do you still get the convergence warnings if you pull my recent changes to scaled gradient checks for convergence?

stevencarlislewalker commented 10 years ago

@bbolker, the maximum absolute gradient check now passes but there is still a large eigenvalue warning. I'll push my small change to refit.merMod with a comment reminding that there may have been a reason to use the call slot.

stevencarlislewalker commented 10 years ago

Working more on this now ... there are still problems

simon-SMU commented 10 years ago

Hi Steve,

Thanks again for your solution to this problem. Unfortunately, I am not entirely sure where I have to make the change you suggested:

ctrl = eval(object@call$control)$checkConv,

to ctrl = control$checkConv

I cannot simply add this to the glmer.nb call as an argument, so I assume I have to dig into the underlying functions? I have never done this before, so any pointer would be more than welcome.

Thanks,

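For anyone with the same question, a minimal sketch of two ways to try such a change, assuming the fix has been (or will be) pushed to the lme4 GitHub repository and that refit.merMod is visible in the lme4 namespace (neither path is an official recommendation):

## Option 1: install the development version of lme4 from GitHub
## (assumes a working build toolchain and the devtools package):
## devtools::install_github("lme4/lme4")

## Option 2: patch the installed copy for the current session only.
## fixInNamespace() opens the function in an editor so the
## 'ctrl = eval(object@call$control)$checkConv' line can be edited by hand.
library(lme4)
lme4:::refit.merMod                                  # inspect the current definition
utils::fixInNamespace("refit.merMod", ns = "lme4")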
