lme4 / lme4pureR

A pure R implementation of the numerical steps in lmer and glmer

Updating the vector of random effects in the PIRLS iterations #2

Closed: dmbates closed this issue 11 years ago

dmbates commented 11 years ago

I have isolated the problem in the function that determines the conditional mode of the random effects using PIRLS but my attempts to solve it have not been successful. If you check out the db/radicalrewrite branch and look at the pirls.R file, the function pirls generates a function, which I usually call devf, to evaluate negative twice the log-likelihood for values of theta and beta in a GLMM using the Laplace approximation.

Several vectors in the environment of the function (eta, mu and u) are updated during the evaluation of the conditional mode of u given y. This works fine for eta and mu, but it appears that u is not being updated. Using the example from tests/Contraception.R under the debugger, I print the value of u before and after the assignment

u[] <- u + delu

and nothing has changed.

debug at /home/bates/gitR/lme4pureR/R/pirls.R#110: u[] <<- u + delu
Browse[2]> str(delu)
 num [1:60] -0.8594 -0.0621 0.6795 0.19 0.1203 ...
Browse[2]> str(u)
 num [1:60] 0 0 0 0 0 0 0 0 0 0 ...
Browse[2]> 
debug at /home/bates/gitR/lme4pureR/R/pirls.R#77: i
Browse[2]> str(u)
 num [1:60] 0 0 0 0 0 0 0 0 0 0 ...

Suggestions would be welcomed.

dmbates commented 11 years ago

Well, I was making the usual dumb mistake of not using the evil <<- assignment operator when I wanted to change a value in the enclosing environment of the objective function. I just pushed some changes to pirls.R on the db/radicalrewrite branch that result in convergence of the PIRLS algorithm. I also found a term in the Laplace approximation that was not present in the compiled code in the lme4 repository and commented it out.
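For anyone following along, here is a minimal, self-contained illustration of the scoping issue (not pirls.R itself, just the same pattern): a generator returns a closure whose state vector u lives in the generator's environment, and only <<- updates that copy; a plain <- inside the inner function modifies a local copy that is discarded.

make_updater <- function(q) {
    u <- numeric(q)                    # state shared with the closures below
    list(step    = function(delu) u[] <<- u + delu,  # persists across calls
         current = function() u)
}
upd <- make_updater(3)
upd$step(c(1, 2, 3))
upd$current()   # 1 2 3; with a plain <- in step() this would still be 0 0 0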

If I run the code in tests/Contraception.R with this branch (remember that you must have the flexLambda branch of lme4 installed so that thfun is defined) I get

> beta1 <- c(-1.00637368953914, 0.00625611018864605, -0.00463527887823825, 0.860382098849009, 0.692921941313297)
> theta1 <- 0.473985228740739
> dev1 <- devf(c(theta1,beta1))
inc:        -1.48     -0.04505       0.4535        0.354       0.2555
1.0000:   2322.448
inc:      -0.0865   -5.295e-05   -8.386e-05    -0.001325    -0.002765
1.0000:   2322.326
inc:   -0.0005413   -7.478e-11    5.567e-12   -7.891e-09   -2.971e-07
1.0000:   2322.326
  2373.181:        0.474       -1.006     0.006256    -0.004635       0.8604       0.6929
> dput(dev1)
2373.18110777587

which is a bit lower than the compiled code gives

# deviance(gm1)  # 2373.18581907321

The values of u and mu are similar to those from the compiled code

> dput(head(environment(devf)$u))
c(-1.5660076464618, -0.0451065791782602, 0.453409838770107, 0.352655219404308, 
0.252729808923678, -0.504468072686005)
> dput(head(unname(environment(devf)$mu)))
c(0.160248723309806, 0.225474384185711, 0.451108316626996, 0.383911927029981, 
0.119942040834776, 0.148334692215887)

(see the tests/Contraception.R file for the comparison values).

However, I can't optimize this objective with bobyqa, nlminb, or optim using Nelder-Mead. It may be a scaling problem in this particular example: a relatively small step in some of the coefficients (age and I(age^2)) puts the algorithm in a region where the fit is so bad that PIRLS fails to converge.

> optim(c(1,beta0), devf, method="Nelder-Mead")
inc:      -0.8594     -0.06208       0.6795         0.19       0.1203
1.0000:   2286.898
inc:     -0.06905   -0.0003522     0.006812   -0.0007996    -0.001719
1.0000:   2285.883
inc:   -0.0009174   -1.209e-08     1.86e-06   -8.208e-10   -3.195e-07
1.0000:   2285.879
inc:   -1.656e-07   -1.299e-16    1.395e-13    1.878e-16   -1.102e-14
1.0000:   2285.879
  2397.168:            1      -0.9429     0.004967    -0.004328       0.8062       0.7664
inc:      -0.7862     -0.05833       0.7008       0.1767       0.1115
1.0000:   2283.741
inc:      -0.0644   -0.0003542      0.01025   -0.0007693    -0.001653
1.0000:   2282.513
inc:   -0.0008899   -1.395e-08    5.855e-06   -3.794e-10   -3.302e-07
1.0000:   2282.505
inc:   -1.738e-07   -2.434e-16    1.926e-12     5.99e-18   -1.321e-14
1.0000:   2282.505
  2402.922:          1.1      -0.9429     0.004967    -0.004328       0.8062       0.7664
inc:      -0.9524      -0.1419       0.6483       0.1021      0.02945
1.0000:   2288.460
inc:     -0.07154    -0.001609     0.008522   -0.0001095   -9.474e-05
1.0000:   2287.352
inc:   -0.0009799    -2.44e-07    3.151e-06    8.989e-12   -9.558e-10
1.0000:   2287.346
inc:   -1.883e-07   -5.496e-15    4.333e-13    1.577e-16     1.38e-16
1.0000:   2287.346
  2398.845:            1      -0.8429     0.004967    -0.004328       0.8062       0.7664
inc:       -1.043     -0.04989        0.889      0.07865      0.02628
1.0000:   2554.707
inc:     -0.05612   -0.0001482     -0.01003    6.557e-05   -3.817e-05
1.0000:   2553.947
inc:   -0.0005195   -1.406e-09    1.247e-06    7.723e-11   -7.738e-11
1.0000:   2553.945
  2659.747:            1      -0.9429        0.105    -0.004328       0.8062       0.7664
inc:       -6.857        -2.91       0.1249       -2.321       -5.131
1.0000:  16902.474
inc:      -0.3297      -0.1771    0.0006047       0.1048      -0.8304
1.0000:  16875.907
inc:     -0.01061     -0.00125    1.365e-08    -0.001268     -0.02504
1.0000:  16876.044
0.5000:  16875.940
0.2500:  16875.916
0.1250:  16875.909
0.0625:  16875.910
0.0312:  16875.907
0.0156:  16875.906
inc:      -0.0104     -0.00123    1.344e-08    -0.001248     -0.02464
1.0000:  16876.043
0.5000:  16875.938
0.2500:  16875.914
0.1250:  16875.910
0.0625:  16875.909
0.0312:  16875.905
inc:     -0.01006    -0.001192    1.302e-08    -0.001208     -0.02387
1.0000:  16876.043
0.5000:  16875.942
0.2500:  16875.916
0.1250:  16875.908
0.0625:  16875.907
0.0312:  16875.908
0.0156:  16875.910
0.0078:  16875.908
0.0039:  16875.908
0.0020:  16875.907
0.0010:  16875.907
Error in fn(par, ...) (from pirls.R#107) : Step-halving failed
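The 0.5000, 0.2500, ... lines in that trace are step-halving attempts inside PIRLS. As a rough sketch of the logic (assumed names, not the actual pirls.R code), with pdev() standing in for the penalized deviance:

take_step <- function(u, delu, pdev, maxhalf = 12) {
    old <- pdev(u)
    fac <- 1                            # step fraction reported in the trace
    for (i in seq_len(maxhalf)) {
        unew <- u + fac * delu
        if (pdev(unew) < old) return(unew)
        fac <- fac / 2                  # 0.5000, 0.2500, 0.1250, ...
    }
    stop("Step-halving failed")         # the error that ends the run above
}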

@stevencarlislewalker, @bbolker Can you refresh my memory on what you did to initialize the PIRLS algorithm? Here I am taking the naive approach of always restarting at u[] <<- numeric(q) and using the corresponding eta and mu, but that doesn't seem to be a good choice here.

Another approach could be to scale the beta parameters by the standard errors of the coefficient estimates from the glm without any random effects:

> coef(summary(gm1))
                Estimate   Std. Error    z value     Pr(>|z|)
(Intercept) -0.942935490 0.1497300237 -6.2975712 3.023454e-10
age          0.004967476 0.0075889285  0.6545688 5.127454e-01
I(age^2)    -0.004327508 0.0006918076 -6.2553638 3.965904e-10
ch1          0.806217238 0.1422618723  5.6671350 1.452049e-08
urbanY       0.766361560 0.1059462984  7.2334907 4.707348e-13

You can see that the fitted values are going to be especially sensitive to the second and third coefficients.
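A minimal sketch of that rescaling idea, reusing the standard-error column shown above (devf_scaled is a hypothetical wrapper, not part of the package):

se <- coef(summary(gm1))[, "Std. Error"]
devf_scaled <- function(thetabeta_scaled) {
    theta <- thetabeta_scaled[1]
    beta  <- thetabeta_scaled[-1] * se   # map back to the natural scale
    devf(c(theta, beta))
}
## e.g. optim(c(1, beta0 / se), devf_scaled, method = "Nelder-Mead");
## the beta component of the result is then multiplied by se again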

dmbates commented 11 years ago

The tests/cbpp.R example does converge with the version of pirls.R on the db/radicalrewrite branch, though not with the nloptwrap version of bobyqa. Both optim and nlminb converge to the same parameter values, reporting a slightly lower objective than the Nelder-Mead optimizer in the compiled lme4 code.

stevencarlislewalker commented 11 years ago

I can't completely reproduce this. bobyqa fails for me in the same way you describe, which is strange, but the value of the objective function returned by optim is very similar to that returned by the compiled code. I can get optim to report pretty much the same value for the objective (184.0526) as that for the compiled code (184.0531), by just running tests/cbpp.R and passing devf to optim (without a lower bound). Although you are technically correct that 184.0526 < 184.0531, this difference seems ignorable to me and provides pretty strong evidence that pirls.R is working again, which is great news! I'll look more into the bobyqa problem now.

stevencarlislewalker commented 11 years ago

The problem with bobyqa seems to be that theta and beta are not getting updated:

> beta0
[1] -1.269023 -1.170763 -1.301405 -1.782279

> bobyqa(c(1,beta0), devf, lower=c(0,rep.int(-Inf,length(beta0))))[c("par","objective")]
inc:       0.8051      -0.3693       0.4893      0.02168      -0.2395
1.0000:     79.065
inc:      -0.1281     -0.04824     -0.05824   -0.0001259     -0.02013
1.0000:     77.167
inc:    -0.003484    -0.000754   -0.0009127    -4.27e-09   -0.0001327
1.0000:     77.164
inc:   -2.614e-06   -1.812e-07    -2.23e-07   -1.594e-17   -5.725e-09
1.0000:     77.164
   187.597:            1       -1.269       -1.171       -1.301       -1.782
inc:       0.8051      -0.3693       0.4893      0.02168      -0.2395
1.0000:     79.065
inc:      -0.1281     -0.04824     -0.05824   -0.0001259     -0.02013
1.0000:     77.167
inc:    -0.003484    -0.000754   -0.0009127    -4.27e-09   -0.0001327
1.0000:     77.164
inc:   -2.614e-06   -1.812e-07    -2.23e-07   -1.594e-17   -5.725e-09
1.0000:     77.164
   187.597:            1       -1.269       -1.171       -1.301       -1.782
$par
[1]  1.000000 -1.269023 -1.170763 -1.301405 -1.782279

At each call to devf, pirls takes a few steps and then returns the initial values. Maybe bobyqa is looking into the environment of devf differently than optim with Nelder-Mead does?

stevencarlislewalker commented 11 years ago

If I run bobyqa with debug(nloptr), and then immediately inspect the contents of the environment, I first get:

ls(environment(eval_f))
[1] "control" "fn"      "fun"     "lower"   "nl.info" "opts"    "upper"  
[8] "x0"     

eval_f seems to be what is used as the objective function inside nloptr. For example,

Browse[2]> eval_f
function (x) 
fun(x, ...)
<environment: 0x10a29f670>

and

Browse[2]> head(environment(eval_f)$fun)                                 
1 function (thetabeta)                
2 {                                   
3     Lambdat@x[] <<- thfun(thetabeta)
4     LtZt <- Lambdat %*% Zt          
5     beta[] <<- thetabeta[betaind]   
6     offb <- offset + X %*% beta  

However, the environment of eval_f doesn't seem to have what we need,

Browse[2]> ls(environment(eval_f))
[1] "control" "fn"      "fun"     "lower"   "nl.info" "opts"    "upper"  
[8] "x0"     

whereas that of fun does,

Browse[2]> ls(environment(environment(eval_f)$fun))
 [1] "aic"        "beta"       "betaind"    "eta"        "L"         
 [6] "Lambdat"    "linkinv"    "mu"         "muEta"      "n"         
[11] "nAGQ"       "npirls"     "nth"        "offset"     "p"         
[16] "q"          "sqDevResid" "thfun"      "tol"        "u"         
[21] "variance"   "verbose"    "weights"    "X"          "y"         
[26] "Zt"    

So my hypothesis is that bobyqa (via nloptr) is looking into an environment that doesn't contain the objects being set by <<- in pirls. It is just looking into environment(eval_f), which doesn't have everything it needs.

I'm somewhat surprised by this, because I would think that bobyqa would be smarter. So I'm probably missing something.

dmbates commented 11 years ago

@stevencarlislewalker The use of eval_f to wrap the evaluation of fun is common to all the optimization functions. Optimizers usually take a ... argument through which additional arguments to fun can be passed. As you can see, the wrapper eval_f packages the function and the ..., then calls fun(x, ...) just so the ... gets passed down to fun.

Having the matrices and other objects defining the model structure in environment(fun) is not a mistake in this context.
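To make that concrete, here is a simplified sketch of the wrapper pattern (assumed code, not the actual nloptr/nloptwrap source):

make_eval_f <- function(fun, ...) {
    function(x) fun(x, ...)      # the closure captures fun and any ... args
}
eval_f <- make_eval_f(devf)
## eval_f's own environment holds little beyond fun; the model matrices still
## live one level up in environment(fun), i.e. environment(devf), so the
## sparse ls() listing above is expected rather than a problem
identical(environment(eval_f)$fun, devf)   # TRUE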

Having said that, I still don't know why bobyqa croaks after a couple of evaluations. I believe that theta and beta are no longer saved in the environment: theta is used to update Lambdat and beta is used to create offset + X %*% beta, but they are not saved themselves. My reasoning is that they would be available as par from the optimizer, and it is usually more reliable to take the value returned by the optimizer than the thetabeta value used at the last function evaluation. The last function evaluation should be at a parameter value close to the optimum, but it doesn't have to be at the optimum itself.
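In other words (object names as used earlier in the thread, and assuming the optimizer converges), the estimates are taken from what the optimizer returns rather than from devf's environment:

opt   <- optim(c(1, beta0), devf, method = "Nelder-Mead")
theta <- opt$par[1]                # covariance parameter estimate
beta  <- opt$par[-1]               # fixed-effects estimates
dev   <- opt$value                 # objective at the returned parameters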

stevencarlislewalker commented 11 years ago

I figured I was missing something, so thanks for clearing that up, Doug. I'm slightly confused, though. Shouldn't this line in pirls.R at least save beta?

beta[] <<- thetabeta[betaind]

Edit: I just tried not saving beta and this had no effect.

stevencarlislewalker commented 11 years ago

I agree with you that it shouldn't be necessary to save theta and beta. They come in through par/thetabeta, and then everything else gets updated.

stevencarlislewalker commented 11 years ago

When debugging bobyqa and seeing what nloptr is returning, we find an interesting error message:

> bobyqa(c(1,beta0), devf, lower=c(0,rep.int(-Inf,length(beta0))))$message

NLopt solver status: -2 ( NLOPT_INVALID_ARGS: Invalid arguments (e.g. lower 
bounds are bigger than upper bounds, an unknown algorithm was specified, 
etcetera). )

But the lower bounds are definitely not bigger than the upper bounds, which are all Inf. I also don't get why it complains that 'an unknown algorithm was specified'.

I think this must be a generic error message that is misleading about the true problem. None of the suggested problems can be correct, can they?

stevencarlislewalker commented 11 years ago

The bounds are not the problem; the same error message comes back even without supplying a lower bound:

> bobyqa(c(1,beta0), devf)$message
[1] "NLOPT_INVALID_ARGS: Invalid arguments (e.g. lower bounds are bigger than upper bounds, an unknown algorithm was specified, etcetera)."
dmbates commented 11 years ago

That error message had me puzzled too. It might be worthwhile checking if the minqa implementation of bobyqa also produces such an error.

I took it to mean that something was wrong with the evaluation of the function at the initial values, perhaps that it didn't produce a good starting pattern for the initial step. Most optimizers do something like a step in each coordinate direction to get an initial step direction. But none of the other optimizers seem to have problems with this.

stevencarlislewalker commented 11 years ago

minqa bobyqa works:

> bobyqa(c(1,beta0), devf, lower=c(0,rep.int(-Inf,length(beta0))))[c("par","objective")]
$par
[1]  0.6422594 -1.3985243 -0.9923337 -1.1286733 -1.5803150

This can't be a problem on our end, can it?

dmbates commented 11 years ago

I don't think it is a problem at our end, and I'm really not prepared to delve into the nloptr and nloptwrap packages to discover what might be going on. I will probably create a Julia implementation using Steve Johnson's wrapper for NLopt to do the optimization, so we can see whether it is something about the NLopt implementation. Somehow I doubt an error this obvious would have escaped their testing, though.

bbolker commented 11 years ago

For those of us who haven't quite caught up but want to look: what's the minimal example for getting to the crux of the problem? I have checked out and installed the db/radicalrewrite branch ... when I source("tests/cbpp.R") I get an error about a cc object which hasn't been defined ... when I run Steven's example I get something a little different:

> bobyqa(c(1,beta0), devf, lower=c(0,rep.int(-Inf,length(beta0))))[c("par","objective")]
$par
[1]  1.000000 -1.269023 -1.170763 -1.301405 -1.782279

$<NA>
NULL

PS: the nloptwrap package at least is not very complicated; I would be willing to dig into it if someone tells me where to start looking ...

dmbates commented 11 years ago

The db/radicalrewrite branch has been folded into the master branch so you are better off just using the master branch from now on. The trick is that it needs the flexLambda branch of lme4 to be installed so that the thfun is passed properly. I'm currently working on generating the equivalent of reTrms from the original formula without all the machinations of the mkReTrms function.

dmbates commented 11 years ago

I mentioned the last bit because it will allow us to bypass the flexLambda branch.

stevencarlislewalker commented 11 years ago

@bbolker, I think the cc name was a typo. It should be beta0, and that fix is currently on the master branch. Can you reproduce the problem on the master branch?

bbolker commented 11 years ago

tests/cbpp.R looks OK on the master branch and runs with the flexLambda branch installed. Now what should I be looking at? Can someone commit/push a more isolated test of what is apparently going wrong?

I still seem to get the result listed above with

$par
[1]  1.000000 -1.269023 -1.170763 -1.301405 -1.782279

$<NA>
NULL

...

stevencarlislewalker commented 11 years ago

Closing because pirls seems to be working now, except with nloptwrap's bobyqa; that smaller issue has been moved to issue #4.

stevencarlislewalker commented 11 years ago

The test case for the bobyqa problem is here: https://github.com/lme4/lme4pureR/issues/4#issuecomment-23292147