Optim vs Nlimb - Githubissues

MonHansen commented 9 years ago

Is there a difference between Optim and Nlminb? I have noticed that Nlminb sometimes can optimise when Optim can't. If this happens are the results of Nlminb trustworthy?

bbolker commented 9 years ago

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

On 15-01-08 05:35 AM, MonHansen wrote:

Is there a difference between Optim and Nlminb? I have noticed that Nlminb sometimes can optimise when Optim can't. If this happens are the results of Nlminb trustworthy?

--- Reply to this email directly or view it on GitHub: https://github.com/kaskr/adcomp/issues/73

General-purpose continuous optimization is just plain difficult, even when one has a great tool like TMB/adcomp. You can dig into the documentation of each function, or the code, to see exactly what algorithms they implement ...

I would certainly check my results even more carefully to see that they made sense if one optimizer failed, but this does happen very frequently for difficult models. Figuring out exactly why can be very difficult. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux)

iQEcBAEBAgAGBQJUro7vAAoJEOCV5YRblxUHudsH/3CdoOB0rePQ7kZfcrj+WSTh 4wxVd2vZ049V8/OnLUa06+vb3P5Cz4PlMTnIfTDm7ntlkM3rOfJ6ldvgJ904SQ+l YvvThO46I+vvaIbU897s4JxFHhdAV5ZtND9xkLV/QNfVZVklAqNNOQVw5qxjrN/4 /VMLLOVgQ+fQfn90alhId25T+9x8nNcCMXsAL9Rqe0VEM+dXWJDQvt6NdES1QxUW +bfbQYHA8I+8TA7YbAr//UaJEppf5eEXMJ79oR76VgZH3Vplp8XrJjRNnHpfi9Lo p4/wBpLXEkQ484OYLUn5WvEBZLO8ZmfHNfjOLQvWvRSLODj49wv/deSMWjZjTxg= =A3nW -----END PGP SIGNATURE-----

John-R-Wallace-NOAA commented 9 years ago

Optimization wrapper package, optimx, and the trust optimization function

The optimx package is a wrapper function for a large number of optimization functions, read the package help for more information.

Running the TMB's 'simple' example with all.methods=T gives:

require(optimx)
optimx(obj$par, function(x) as.numeric(obj$fn(x)), obj$gr, control = list(all.methods=T))

Not improving much - will try early exit...PD hess?: FALSE

	beta	beta.1	logsdu	logsd0	value	fevals	gevals	niter	convcode	kkt1	kkt2	xtimes
BFGS	53.2331627	30.753459	2.888532	2.036139	9.120328e+02	115	42	NA	0	TRUE	TRUE	1.10
CG	3.9827125	1.288637	3.550401	2.030498	9.825627e+02	302	101	NA	1	FALSE	FALSE	2.06
Nelder-Mead	0.7502629	5.166633	3.603549	2.039200	9.902448e+02	183	NA	NA	0	FALSE	FALSE	0.84
L-BFGS-B	53.2303152	30.751314	2.888707	2.036301	9.120327e+02	44	44	NA	0	TRUE	TRUE	0.43
nlm	53.2308187	30.751029	2.888713	2.036309	9.120327e+02	NA	NA	50	0	TRUE	TRUE	1.75
nlminb	53.2308968	30.751055	2.888714	2.036310	9.120327e+02	31	27	26	0	TRUE	TRUE	0.30
spg	53.2297362	30.749904	2.888714	2.036309	9.120327e+02	1514	NA	1201	0	TRUE	TRUE	14.89
ucminf	53.2308941	30.751064	2.888714	2.036309	9.120327e+02	30	30	NA	0	TRUE	TRUE	0.31
Rcgmin	NA	NA	NA	NA	8.988466e+307	NA	NA	NA	9999	NA	NA	0.26
Rvmmin	53.2308922	30.751043	2.888714	2.036309	9.120327e+02	98	42	NA	0	TRUE	TRUE	1.58
newuoa	53.2309129	30.751054	2.888714	2.036309	9.120327e+02	1250	NA	NA	0	TRUE	TRUE	5.99
bobyqa	53.2308987	30.751106	2.888714	2.036309	9.120327e+02	1079	NA	NA	0	TRUE	TRUE	5.62
nmkb	53.2331585	30.754578	2.888727	2.036300	9.120327e+02	316	NA	NA	0	TRUE	TRUE	1.47
hjkb	0.1000000	0.100000	0.100000	0.100000	3.909499e+04	1	NA	0	9999	NA	NA	0.00

There were 50 or more warnings (use warnings() to see the first 50)

The 'trust' package has a single function 'trust' that "carries out a minimization or maximization of a function using a trust region algorithm".

The trust function requires a hessian, which is not yet provided by TMB for models that contain a random component.

However this can be overcome for comparison purposes to nlimb() by using optimHess() to provide the Hessian:

obj$he <- function(x=obj$env$last.par.best[-obj$env$random], disable.tracing = TRUE, ...) { 

         if(disable.tracing) {
            tracemgc.old <- obj$env$tracemgc; obj$env$tracemgc <- FALSE; on.exit(obj$env$tracemgc <- tracemgc.old)
            inner.trace.old <- obj$env$inner.control$trace; obj$env$inner.control$trace <- FALSE; on.exit(obj$env$inner.control$trace <- inner.trace.old, add = TRUE)
            silent.old <- obj$env$silent; obj$env$silent <- TRUE; on.exit(obj$env$silent <- silent.old, add = TRUE)
         }

         optimHess(x, obj$fn, obj$gr, ...) 
      }

obj$he()

beta	beta	logsdu	logsd0
beta	0.117005092	-0.040568074	-0.009937551
beta	-0.040568074	0.116896783	-0.009645631
logsdu	-0.009937551	-0.009645631	190.974821042
logsd0	0.009937537	0.009645610	16.527890613

Comparison with all tracing turned off:

obj$env$tracemgc <- FALSE obj$env$inner.control$trace <- FALSE obj$env$silent <- TRUE

system.time(optim(obj$par, obj$fn, obj$gr, control = list(parscale=obj$par*0+1e-1,trace=0), hessian = F)) #optim trace set to zero also

user	system	elapsed
0.63	0.04	0.68

system.time(opt <- nlminb(obj$par, obj$fn, obj$gr))

user	system	elapsed
0.31	0.00	0.32

system.time(opt <- nlminb(obj$par, obj$fn, obj$gr, obj$he))

user	system	elapsed
0.86	0.02	0.87

require(trust)
system.time(opt <- trust(function(x) { list(value = obj$fn(x), gradient = obj$gr(x), hessian = obj$he(x)) }, obj$par, 1, 10))

user	system	elapsed
0.86	0.00	0.86

Note that in other examples I tried the gradient argument for trust() needed to be changed to a vector:

system.time(opt <- trust(function(x) { list(value = obj$fn(x), gradient = as.vector(obj$gr(x)), hessian = obj$he(x)) }, obj$par, 1, 10))

Numerical Optimization in R: Beyond optim: http://www.jstatsoft.org/v60/i01/paper

There is additional comparison information between optim and nlminb here: http://stats.stackexchange.com/questions/9535/when-should-i-not-use-rs-nlm-function-for-mle

kaskr / adcomp

Optim vs Nlimb #73