nlmixrdevelopment / nlmixr

nlmixr: an R package for population PKPD modeling
https://nlmixrdevelopment.github.io/nlmixr/
GNU General Public License v2.0

Extremely unstable convergence even for a simple problem #195

Closed: tamas-ferenci closed this issue 5 years ago

tamas-ferenci commented 5 years ago

When working on this, I found something quite strange.

Let's take the most basic example from the vignette:

``` r
library(nlmixr)

one.compartment.saem <- function() {
    ini({
        tka <- .5   # Log Ka
        tcl <- -3.2 # Log Cl
        tv  <- -1   # Log V
        eta.ka ~ 1
        eta.cl ~ 2
        eta.v ~ 1
        add.err <- 0.1
    })
    model({
        ka <- exp(tka + eta.ka)
        cl <- exp(tcl + eta.cl)
        v  <- exp(tv + eta.v)
        d/dt(depot)  = -ka * depot
        d/dt(center) = ka * depot - cl / v * center
        cp = center / v
        cp ~ add(add.err)
    })
}
```

Now, let's run the model, and then re-run it with the only change being that the starting values of the parameters are set to the estimated results of the first run:

``` r
fit <- nlmixr(one.compartment.saem, theo_sd, est = "saem")
fit$theta
fit2 <- nlmixr(update(fit, tka = fit$theta["tka"], tcl = fit$theta["tcl"],
                      tv = fit$theta["tv"]),
               theo_sd, est = "saem")
fit2$theta
```

What we see is that some estimates differ by more than an order of magnitude (!) just from this simple change in the initial values!

This would of course mean that the results are practically useless, which is hard to believe (this is a horribly simple model; if there are convergence problems here, there will be everywhere), so I think I'm making a mistake somewhere, but I don't know where...

mattfidler commented 5 years ago

Hi @tamas-ferenci

There are many reasons for convergence problems; in general, all of them lead to a difficult likelihood surface with many local minima. This problem is not unique to nlmixr.

It doesn't mean you are making a mistake, nor that the results are practically useless. The likelihood values can be used to discern which set of parameter estimates is best to make predictions from, and if more than one model has a similar AIC, you can also use model averaging techniques to produce a good simulation of future events.
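For instance, a minimal sketch of AIC-based model averaging with Akaike weights (the standard formula; nlmixr does not compute this for you), assuming `fit` and `fit2` are two competing fits:

``` r
## A minimal sketch, assuming fit and fit2 are competing nlmixr fits.
## Akaike weights: w_i = exp(-deltaAIC_i / 2), normalized to sum to 1.
aics  <- c(fit = AIC(fit), fit2 = AIC(fit2))
delta <- aics - min(aics)       # AIC difference from the best model
w     <- exp(-delta / 2)
w     <- w / sum(w)             # normalized Akaike weights
print(w)                        # weight each model's simulations by these
```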

mattfidler commented 5 years ago

Note that with your example and the latest nlmixr I get the following results:

``` r
one.compartment.saem <- function() {
    ini({
        tka <- .5   # Log Ka
        tcl <- -3.2 # Log Cl
        tv  <- -1   # Log V
        eta.ka ~ 1
        eta.cl ~ 2
        eta.v ~ 1
        add.err <- 0.1
    })
    model({
        ka <- exp(tka + eta.ka)
        cl <- exp(tcl + eta.cl)
        v  <- exp(tv + eta.v)
        d/dt(depot)  = -ka * depot
        d/dt(center) = ka * depot - cl / v * center
        cp = center / v
        cp ~ add(add.err)
    })
}

library(nlmixr)

## Using SAEM
fit <- nlmixr(one.compartment.saem, theo_sd, est="saem",
              control=list(print=0), table=list(cwres=TRUE))
#> Compiling RxODE equations...done.
#> Calculating covariance matrix
#> [====|====|====|====|====|====|====|====|====|====
#> Calculating residuals/tables
#> done.
print(AIC(fit))
#> [1] 3005.811

fit2 <- fit %>% nlmixr(est="saem", control=list(print=0), table=list(cwres=TRUE))
#> Calculating covariance matrix
#> [====|====|====|====|====|====|====|====|====|====
#> Calculating residuals/tables
#> done.
print(AIC(fit2))
#> [1] 367.4552

## Using FOCEi
fitF <- nlmixr(one.compartment.saem, theo_sd, est="focei",
               control=list(print=0), table=list(cwres=TRUE))
#> Theta reset
#> Theta reset
#> Theta reset
#> Theta reset
#> Theta reset
#> Calculating covariance matrix
#> [====|====|====|====|====|====|====|====|== S matrix calculation failed; Switch to R-matrix covariance.
#> ==|====done
#> Calculating residuals/tables
#> done.
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   ETAs were reset to zero during optimization; (Can control by foceiControl(resetEtaP=.))
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   mu-referenced Thetas were reset during optimization; (Can control by
#>   foceiControl(resetThetaP=.,resetThetaCheckPer=.,resetThetaFinalP=.))
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Initial ETAs were nudged; (Can control by foceiControl(etaNudge=.))
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Hessian (S) matrix seems singular; Using pseudo-inverse
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   R matrix non-positive definite but corrected by R = sqrtm(R%*%R)
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Using R matrix to calculate covariance, can check sandwich or S matrix with $covRS and $covS
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Gradient problems with initial estimate and covariance; see $scaleInfo
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Tolerances (atol/rtol) were temporarily reduced for some difficult ODE solving during
#>   the optimization. Consider reducing sigdig/atol/rtol, changing initial estimates or
#>   changing the structural model.
print(AIC(fitF))
#> [1] 648.7298

fit2F <- fit %>% nlmixr(est="focei", control=list(print=0), table=list(cwres=TRUE))
#> Calculating covariance matrix
#> [====|====|====|====|====|====|====|====|====|====done
#> Calculating residuals/tables
#> done.
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Initial ETAs were nudged; (Can control by foceiControl(etaNudge=.))
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   ETAs were reset to zero during optimization; (Can control by foceiControl(resetEtaP=.))
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Gradient problems with initial estimate and covariance; see $scaleInfo
print(AIC(fit2F))
#> [1] 370.3616
```

Created on 2019-07-08 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)

This shows fit2 is better when restarted from the updated estimates. Notice that the initial estimates in the nlmixr vignette have been updated too.

mattfidler commented 5 years ago

The other possibility, of course, is that the algorithm found a local minimum. Restarting from the final estimates lets you try to escape it: both the SAEM and FOCEi algorithms change their search path based on the new initial conditions, which can cause nlmixr to find new and possibly better solutions. NONMEM has a similar option; I am unsure about Monolix, though.
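The restart itself is just the piping idiom from the reprex above: feeding a fitted object back into `nlmixr()` starts a new fit from the previous final estimates.

``` r
## Restart from the final estimates of a previous fit; the new search
## path may escape a local minimum the first run was stuck in.
fit2 <- fit %>% nlmixr(est = "saem", control = list(print = 0))
```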

mattfidler commented 5 years ago

This is also likely the case for theo_sd. Notice that a third run, restarted from the previous estimates, gives similar parameter estimates (and objective function values).

``` r
one.compartment.saem <- function() {
    ini({
        tka <- .5   # Log Ka
        tcl <- -3.2 # Log Cl
        tv  <- -1   # Log V
        eta.ka ~ 1
        eta.cl ~ 2
        eta.v ~ 1
        add.err <- 0.1
    })
    model({
        ka <- exp(tka + eta.ka)
        cl <- exp(tcl + eta.cl)
        v  <- exp(tv + eta.v)
        d/dt(depot)  = -ka * depot
        d/dt(center) = ka * depot - cl / v * center
        cp = center / v
        cp ~ add(add.err)
    })
}

library(nlmixr)

## Using SAEM
fit <- nlmixr(one.compartment.saem, theo_sd, est="saem", control=list(print=0))
#> Compiling RxODE equations...done.
#> Calculating covariance matrix
#> [====|====|====|====|====|====|====|====|====|====
#> Calculating residuals/tables
#> done.
print(AIC(fit))
#> Calculating -2LL by Gaussian quadrature (nnodes=3,nsd=1.6)
#> [====|====|====|====|====|====|====|====|====|====
#> [1] 483.1579

fit2 <- fit %>% nlmixr(est="saem", control=list(print=0))
#> Calculating covariance matrix
#> [====|====|====|====|====|====|====|====|====|====
#> Calculating residuals/tables
#> done.
print(AIC(fit2))
#> Calculating -2LL by Gaussian quadrature (nnodes=3,nsd=1.6)
#> [====|====|====|====|====|====|====|====|====|====
#> [1] 505.3644

fit3 <- fit2 %>% nlmixr(est="saem", control=list(print=0))
#> Calculating covariance matrix
#> [====|====|====|====|====|====|====|====|====|====
#> Calculating residuals/tables
#> done.

## AIC of fit vs fit2 using Gaussian quadrature shows fit is better than fit2
addCwres(fit)
#> Calculating residuals/tables
#> done.
#> Registered S3 methods overwritten by 'broom.mixed':
#>   method         from
#>   augment.lme    broom
#>   augment.merMod broom
#>   glance.lme     broom
#>   glance.merMod  broom
#>   glance.stanreg broom
#>   tidy.brmsfit   broom
#>   tidy.gamlss    broom
#>   tidy.lme       broom
#>   tidy.merMod    broom
#>   tidy.rjags     broom
#>   tidy.stanfit   broom
#>   tidy.stanreg   broom
#> Warning: 'tidy.logical' is deprecated.
#> See help("Deprecated")
#> Warning: 'tidy.logical' is deprecated.
#> See help("Deprecated")
#> Warning: Unknown or uninitialised column: 'term'.
#> Warning: Unknown or uninitialised column: 'term'.
#> Error in fix.by(by.y, y): 'by' must specify a uniquely valid column

addCwres(fit2)
#> Calculating residuals/tables
#> done.
#> Warning: 'tidy.logical' is deprecated.
#> See help("Deprecated")
#> Warning: 'tidy.logical' is deprecated.
#> See help("Deprecated")
#> Warning: Unknown or uninitialised column: 'term'.
#> Warning: Unknown or uninitialised column: 'term'.
#> Error in fix.by(by.y, y): 'by' must specify a uniquely valid column

addCwres(fit3)
#> Calculating residuals/tables
#> done.
#> Warning: 'tidy.logical' is deprecated.
#> See help("Deprecated")
#> Warning: 'tidy.logical' is deprecated.
#> See help("Deprecated")
#> Warning: Unknown or uninitialised column: 'term'.
#> Warning: Unknown or uninitialised column: 'term'.
#> Error in fix.by(by.y, y): 'by' must specify a uniquely valid column

print(AIC(fit))
#> [1] 3005.811
print(AIC(fit2))
#> [1] 367.4552
print(AIC(fit3))
#> [1] 367.3361

## AIC of fit vs fit2 using FOCEi shows fit2 is better than fit
## Goodness of fits of fit2 are better

## Using FOCEi
fitF <- nlmixr(one.compartment.saem, theo_sd, est="focei",
               control=list(print=0), table=list(cwres=TRUE))
#> Theta reset
#> Theta reset
#> Theta reset
#> Theta reset
#> Theta reset
#> Calculating covariance matrix
#> [====|====|====|====|====|====|====|====|== S matrix calculation failed; Switch to R-matrix covariance.
#> ==|====done
#> Calculating residuals/tables
#> done.
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   ETAs were reset to zero during optimization; (Can control by foceiControl(resetEtaP=.))
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   mu-referenced Thetas were reset during optimization; (Can control by
#>   foceiControl(resetThetaP=.,resetThetaCheckPer=.,resetThetaFinalP=.))
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Initial ETAs were nudged; (Can control by foceiControl(etaNudge=.))
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Hessian (S) matrix seems singular; Using pseudo-inverse
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   R matrix non-positive definite but corrected by R = sqrtm(R%*%R)
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Using R matrix to calculate covariance, can check sandwich or S matrix with $covRS and $covS
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Gradient problems with initial estimate and covariance; see $scaleInfo
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Tolerances (atol/rtol) were temporarily reduced for some difficult ODE solving during
#>   the optimization. Consider reducing sigdig/atol/rtol, changing initial estimates or
#>   changing the structural model.
print(AIC(fitF))
#> [1] 648.7298

fit2F <- fit %>% nlmixr(est="focei", control=list(print=0), table=list(cwres=TRUE))
#> Calculating covariance matrix
#> [====|====|====|====|====|====|====|====|====|====done
#> Calculating residuals/tables
#> done.
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Initial ETAs were nudged; (Can control by foceiControl(etaNudge=.))
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   ETAs were reset to zero during optimization; (Can control by foceiControl(resetEtaP=.))
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Gradient problems with initial estimate and covariance; see $scaleInfo
print(AIC(fit2F))
#> [1] 370.3616

fit3F <- fit2 %>% nlmixr(est="focei", control=list(print=0), table=list(cwres=TRUE))
#> Calculating covariance matrix
#> [====|====|====|====|====|====|====|====|====|====done
#> Calculating residuals/tables
#> done.
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Initial ETAs were nudged; (Can control by foceiControl(etaNudge=.))
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   ETAs were reset to zero during optimization; (Can control by foceiControl(resetEtaP=.))
#> Warning in (function (uif, data, est = NULL, control = list(), ..., sum.prod = FALSE, :
#>   Gradient problems with initial estimate and covariance; see $scaleInfo
print(AIC(fit3F))
#> [1] 366.9861

print(fixef(fit2F))
#>        tka        tcl         tv    add.err
#> -2.4673205  1.0032213  0.5126326  0.6977655
print(fixef(fit3F))
#>        tka        tcl         tv    add.err
#> -2.4651411  1.0041237  0.5281811  0.7029780
print(fixef(fit2))
#>        tka        tcl         tv    add.err
#> -2.4731885  0.9975852  0.5058521  0.7040359
print(fixef(fit3))
#>        tka        tcl         tv    add.err
#> -2.4865205  0.9895686  0.4888271  0.7081175
```

Created on 2019-07-08 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)

Hence, with any mixed-effects model it is best to re-run the model from its final estimates to make sure it is stable.
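As a sketch of that practice (a hypothetical helper, not part of nlmixr), one could keep re-fitting from the previous final estimates until the fixed effects stop moving:

``` r
## Hypothetical helper (not part of nlmixr): re-fit from the previous
## final estimates until the fixed effects change by less than `tol`.
refit_until_stable <- function(fit, est = "saem", max_runs = 5, tol = 0.01) {
    for (i in seq_len(max_runs)) {
        refit <- nlmixr(fit, est = est, control = list(print = 0))
        if (max(abs(fixef(refit) - fixef(fit))) < tol) return(refit)
        fit <- refit
    }
    warning("Fixed effects were still moving after max_runs re-fits")
    fit
}

stableFit <- refit_until_stable(fit)
```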

tamas-ferenci commented 5 years ago

Thanks @mattfidler for such a detailed response!

I also realized that re-running a third time doesn't change the estimates, but that actually contributed to my belief that I was making a mistake: SAEM is a very sophisticated algorithm, and I thought only a very primitive one would stop at a point from which, if restarted, it converges to a very substantially better fit (see the AICs).

(This belief was reinforced by my impression that this model and the theo_sd dataset are extremely basic and totally well-defined, so if anything should go smoothly, this should.)

So one final question remains: based on these findings, do you think I should make a habit of re-running every nlmixr model a second time in the above fashion, to make sure the estimates no longer change...?

mattfidler commented 5 years ago

Hi @tamas-ferenci ,

My practice has always been to restart models to make sure they are stable. I did this in NONMEM before I started working on nlmixr, and I don't expect the algorithms to change enough to make this step unnecessary. I'm unsure if everyone does this; in a Phoenix course I took they also encouraged the practice, even with their QRPEM algorithm (another EM algorithm). I wasn't encouraged to do this at a Monolix training course, though I'm sure it is still likely a good practice.

However, in this case a simple `plot(fit)` shows the initial model isn't doing too well and something needs to be done to fix it. In general, goodness-of-fit plots are more informative than log-likelihoods/AICs, etc.
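For example, with the fits from the reprex above:

``` r
## Goodness-of-fit diagnostics for both runs; the poor initial fit is
## visible here in a way a single AIC value does not show.
plot(fit)   # first run: the model isn't doing too well
plot(fit2)  # re-run from the first run's final estimates
```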

Of course, you could also look at the trace plots to see whether convergence has been achieved, and play with the settings there (`nBurn` and `nEm` in particular). But there is no guarantee the model converges to the same solution.
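A minimal sketch of that, assuming nlmixr's `traceplot()` method and the `saemControl()` options named above (the values are illustrative, not recommendations):

``` r
## Inspect the SAEM parameter trajectories, then re-fit with a longer
## burn-in and more EM iterations than the defaults.
traceplot(fit)
fitLong <- nlmixr(one.compartment.saem, theo_sd, est = "saem",
                  control = saemControl(nBurn = 500, nEm = 500))
```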

mattfidler commented 5 years ago

To be a little clearer: I didn't always do this step during covariate selection, but otherwise my statement stands.

tamas-ferenci commented 5 years ago

Clear, thank you very much again!