nlmixrdevelopment / nlmixr

nlmixr: an R package for population PKPD modeling
https://nlmixrdevelopment.github.io/nlmixr/
GNU General Public License v2.0

Checking convergence: dependence on initial values #181

Closed tamas-ferenci closed 5 years ago

tamas-ferenci commented 5 years ago

I am somewhat worried about the convergence of an nlmixr fit (some of the estimated values are suspiciously close to their initial values). I thought of investigating this by re-running the fit with several different initial values to see how stable the result is, i.e., whether the estimates are more or less the same regardless of the initial values.

Question 1: Is this a sound approach to diagnosing such (potential) convergence issues? It seems a rather rudimentary approach... (I can imagine that someone with knowledge of how nlmixr's estimation works could suggest a much more sophisticated way to diagnose convergence.)

Question 2: If so, how should I carry out this investigation? This is the best I could come up with, but it is very, very inelegant (I think):

library( nlmixr )
library( data.table )
library( lattice )

# Build the model source as a string, substituting the initial value of tka
one.compartment.saem.ka <- function( kainit ) {
  paste0( "function() {
          ini({
          tka <- ", kainit, "   # Log Ka
          tcl <- -3.2 # Log Cl
          tv <- -1    # Log V
          eta.ka ~ 1
          eta.cl ~ 2
          eta.v ~ 1
          add.err <- 0.1
          })
          model({
          ka <- exp(tka + eta.ka)
          cl <- exp(tcl + eta.cl)
          v <- exp(tv + eta.v)
          d/dt(depot) = -ka * depot
          d/dt(center) = ka * depot - cl / v * center
          cp = center / v 
          cp ~ add(add.err)
          })
}" )
}

# Fit the model for a grid of initial tka values
fits <- lapply( seq( -1, -0.1, 0.1 ), function( kainit ) {
  eval( parse( text = paste0( "one.compartment.saem <- ",
                     one.compartment.saem.ka( kainit ) ) ) )
  nlmixr( one.compartment.saem, theo_sd, est="saem" )
} )

# Plot each estimated parameter against the initial value of tka
xyplot( value ~ kainit | variable, type = "l", scales = list( relation = "free" ),
        data = melt( data.table( kainit = seq( -1, -0.1, 0.1 ),
                            do.call( rbind, lapply( fits, function( x ) x$theta ) ) ),
                            id.vars = "kainit" ) )
mattfidler commented 5 years ago

Hi @tamas-ferenci

I think the approach is sound; there are similar approaches.

Another approach is to try, say, "focei" or "nlme" or any other method to see what they produce. If you produce CWRES, the FOCEi objective functions in nlmixr should be comparable.

As far as a more elegant solution, I think you should consider model piping:

https://nlmixrdevelopment.github.io/nlmixr/articles/modelPiping.html
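For instance, something along these lines (a minimal sketch of the piping idea, assuming an existing nlmixr fit object fit; in a pipeline the data are carried along, so they do not need to be passed again; see the vignette for the exact syntax):

library( nlmixr )
library( magrittr )

# Change the initial estimate of tka, then re-estimate from the new
# starting point; the dataset travels with the fit in the pipeline
fit %>%
  update( tka = -0.5 ) %>%
  nlmixr( est = "saem" )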

tamas-ferenci commented 5 years ago

Dear @mattfidler ,

Awesome, thank you very much!

I think it basically answers everything; I only have two minor questions, if your time permits:

  1. "If you produce CWRES, the FOCEi objective functions in nlmixr should be comparible." I didn't really understand that, sorry...! I understand that addCwres(fit) adds CWRES, but what should I do afterwards for convergence diagnostics? What should be comparable to what...?

  2. Is this "posthoc" vs. "full" estimation thing documented somewhere...? It's surely my fault, but I couldn't find it... In the meantime, my understanding is that

    • nlmixr(update(fit,tka=0.5),est="posthoc") Changes the estimated value of tka to 0.5 leaving the other estimated values intact
    • nlmixr(update(fit,tka=0.5),est="saem") Re-estimates the whole model with the inital value of tka set to 0.5 (I hope this is indeed about the initial value!)
    • nlmixr(update(fit,tka=fix(0.5)),est="posthoc") Re-estimates the whole model with the estimated value of tka set fixed to 0.5
    • nlmixr(update(fit,tka=fix),est="saem") Re-estimates the whole model with the estimated value of tka set fixed to the estimated value in fit

If this understanding is correct, then we need the second one from the above list, so the following code should be a much more elegant but otherwise identical version of my original one:

fit <- nlmixr( one.compartment, theo_sd, est = "saem" )

fits <- lapply( seq( -1, -0.1, 0.1 ), function( kainit )
     nlmixr( update( fit, tka = kainit ), theo_sd, est="saem" ) )

xyplot( value ~ kainit | variable, type = "l", scales = list( relation = "free" ),
        data = melt( data.table( kainit = seq( -1, -0.1, 0.1 ),
                                 do.call( rbind, lapply( fits, function( x ) x$theta ) ) ),
                     id.vars = "kainit" ) )

However, it gives an Error in eval(.call[[.n]]) : object 'kainit' not found error, which (again, if my logic is correct!) may be due to the .call <- match.call(expand.dots = TRUE)[-c(1:2)] line of the update function, which gives back kainit symbolically instead of its actual value.

mattfidler commented 5 years ago

"If you produce CWRES, the FOCEi objective functions in nlmixr should be comparible." I didn't really understand that, sorry...! I understand that addCwres(fit) adds CWRES, but what should I do afterwards for convergence diagnostics? What should be comparable to what...?

Calculating CWRES requires a FOCEi approximation. Once the machinery to calculate CWRES is in place, it is trivial to also calculate the FOCEi approximation to the likelihood. Therefore, the FOCEi-based likelihood can be compared between models fitted with different estimation methods.
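For example (a minimal sketch, assuming a model function one.compartment.saem and the theo_sd data, as defined later in this thread):

fitSAEM  <- nlmixr( one.compartment.saem, theo_sd, est = "saem" )
fitFOCEI <- nlmixr( one.compartment.saem, theo_sd, est = "focei" )
fitSAEM  <- addCwres( fitSAEM )  # attaches CWRES and the FOCEi-approximated objective
c( saem = fitSAEM$objf, focei = fitFOCEI$objf )  # now on a comparable scale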

Is this "posthoc" vs. "full" estimation thing documented somewhere...? It's surely my fault, but I couldn't find it... In the meantime, my understanding is that

Not sure if it made it into the documentation. If so, it isn't that easy to find. It is in a table in some of the nlmixr presentations at PAGE. Please feel free to suggest something here. Usually the nlmixr documentation is in the form of a vignette.

nlmixr(update(fit,tka=0.5),est="posthoc") Changes the estimated value of tka to 0.5 leaving the other estimated values intact

To be clear, this updates the population parameter tka and re-estimates all the individual deviations from the new model (the so-called inner problem). You can actually take etas from another model, like NONMEM for instance, and calculate nlmixr's objective function if you wish, but that requires an additional step that definitely isn't documented anywhere. I was going to add an example soon, though.

nlmixr(update(fit,tka=0.5),est="saem") Re-estimates the whole model with the inital value of tka set to 0.5 (I hope this is indeed about the initial value!)

Yes, this is what is done here; the other parameters remain the same.

nlmixr(update(fit,tka=fix(0.5)),est="posthoc") Re-estimates the whole model with the estimated value of tka set fixed to 0.5

No, this only does a posthoc evaluation; that is, given the new model where tka is 0.5, it gives the posthoc estimates for that problem, the same as described in the prior question. If you meant:

nlmixr(update(fit,tka=fix(0.5)),est="focei")

Yes, this re-estimates the whole model with tka fixed to 0.5.

nlmixr(update(fit,tka=fix),est="saem") Re-estimates the whole model with the estimated value of tka set fixed to the estimated value in fit

Yes, this is correct.

Note:

Since you do not use the pipeline operator, %>%, I am not clear whether you would need to provide the dataset in the nlmixr function. It may work or it may not. I added some tests that check whether you are in a pipeline for some of the update functions.

If this understanding is correct, then we need the second one from the above list, so the following code should be a much more elegant but otherwise identical version of my original one:

Indeed it should work.

However, it gives an Error in eval(.call[[.n]]) : object 'kainit' not found error, which (again, if my logic is correct!) may be due to the .call <- match.call(expand.dots = TRUE)[-c(1:2)] line of the update function, which gives back kainit symbolically instead of its actual value.

This part is a little more tricky. At first glance what you say is true, but what is really happening is a byproduct of non-standard/lazy evaluation and environment access.

If you run the same code outside of a function, it works correctly. If you run it inside a testthat script, it also runs correctly (which changes the environment in a similar way as a function does). However, if you run update inside a function, as your analysis shows, it doesn't work at this time.

This non-standard evaluation is powerful but tricky to get working in all scenarios.

In this situation there are two layers of non-standard evaluation to deal with: update, and then the ini call that update makes.
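A toy illustration of the failure mode (hypothetical code, not what update actually does internally):

g <- function( tka ) {
  .call <- match.call()
  # evaluate the captured argument in the global environment, ignoring
  # the environment the call actually came from
  eval( .call[[ "tka" ]], envir = globalenv() )
}
kainit <- 0.5
g( tka = kainit )  # 0.5: works at top level, kainit lives in globalenv()
h <- function() { kalocal <- 0.5; g( tka = kalocal ) }
# h()  # Error: object 'kalocal' not found; the symbol is local to h()
#      # and invisible in the environment used for evaluation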

When I get time I can try the following code (or you can try it, if you have more time than I do, since I will be giving courses and attending PAGE this week):

fit <- nlmixr( one.compartment, theo_sd, est = "saem" )

fits <- lapply( seq( -1, -0.1, 0.1 ), function( kainit )
     nlmixr( ini( fit, tka = kainit ), theo_sd, est="saem" ) )

xyplot( value ~ kainit | variable, type = "l", scales = list( relation = "free" ),
        data = melt( data.table( kainit = seq( -1, -0.1, 0.1 ),
                                 do.call( rbind, lapply( fits, function( x ) x$theta ) ) ),
                     id.vars = "kainit" ) )

Which should cut down on one layer of non-standard evaluation.

tamas-ferenci commented 5 years ago

Dear @mattfidler ,

Awesome, first of all, thank you very much for such a detailed answer! Let me reply to a few points.

Calculating CWRES requires a FOCEi approximation. Once the machinery to calculate CWRES is in place, it is trivial to also calculate the FOCEi approximation to the likelihood. Therefore, the FOCEi-based likelihood can be compared between models fitted with different estimation methods.

Just to make sure I understand you correctly: addCwres is needed not because I want to do anything with CWRES itself, but rather because it will change an OBJF not calculated fit to an OBJF by FOCEi approximation fit, and then (and only then!) will I have a fit$objf that I can compare between different estimation methods...?

Not sure if it made it into the documentation.

I have to admit that this "posthoc" story is not entirely clear to me from the theoretical aspect to start with, so I think I first have to find some good textbook or paper on it... I'd appreciate it if you could suggest one, but that's really my task :)

Thanks for the explanation of the four different versions! The only thing that is unclear to me is the difference between nlmixr(update(fit,tka=0.5),est="posthoc") and nlmixr(update(fit,tka=fix(0.5)),est="posthoc") (but it is possible that it will immediately become clear once I solve my previous homework!).

Since you do not use the pipeline operator, %>%, I am not clear whether you would need to provide the dataset in the nlmixr function. It may work or it may not.

Well, everything seems to be fine so far (apart from that error, which has nothing to do with the pipeline operator).

When I get time I can try the following code (or you can try it, if you have more time than I do, since I will be giving courses and attending PAGE this week):

Of course! I tried, but to no avail: unfortunately, it returns the exact same error message, Error in eval(.call[[.n]]) : object 'kainit' not found. Perhaps the remaining single layer is still too much...?

mattfidler commented 5 years ago

Just to make sure I understand you correctly: addCwres is needed not because I want to do anything with CWRES itself, but rather because it will change an OBJF not calculated fit to an OBJF by FOCEi approximation fit, and then (and only then!) will I have a fit$objf that I can compare between different estimation methods...?

Well, I think CWRES checks are useful, but otherwise your logic is correct.

I have to admit that this "posthoc" story is not entirely clear to me from the theoretical aspect to start with, so I think I first have to find some good textbook or paper on it... I'd appreciate it if you could suggest one, but that's really my task :)

I'm not sure about a textbook, but the posthoc step is simply one part of the FOCEi problem: keeping the population parameters the same while estimating the between-subject "ETA"s, or empirical Bayes estimates, for each individual. "Posthoc" is what NONMEM calls this step for FO, though as a term posthoc means post-event or post-analysis, I believe.
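For example (a minimal sketch, assuming the fit and theo_sd from above, and that the fit exposes the individual estimates via $eta as other nlmixr fits do):

# Population parameters stay at their (updated) values; only the
# per-subject ETAs, i.e. the empirical Bayes estimates, are re-estimated
fitPost <- nlmixr( update( fit, tka = 0.5 ), theo_sd, est = "posthoc" )
fitPost$eta  # the re-estimated individual deviations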

There isn't a difference between:

fit1 <- nlmixr(update(fit,tka=0.5),est="posthoc")

And

fit2 <- nlmixr(update(fit,tka=fix(0.5)),est="posthoc")

Unless you apply a different estimation method to the fit afterward. That is:

fit1 %>% nlmixr(est="focei") # Will allow tka to change from 0.5
fit2 %>% nlmixr(est="focei") # Will keep tka at 0.5

Well, everything seems to be fine so far (apart from that error, which has nothing to do with the pipeline operator).

nlmixr checks if you are in a pipeline and changes the behavior of some functions accordingly. This simply checks whether the first argument's symbol is actually a ., which implies a pipeline.
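Something along these lines (an illustrative sketch, not nlmixr's actual implementation):

# magrittr evaluates x %>% f() essentially as f(.), so inside f the
# unevaluated first argument arrives as the symbol `.`
f <- function( x ) {
  identical( substitute( x ), quote( . ) )
}
f( 3 )  # FALSE: called directly
# library( magrittr ); 3 %>% f()  # TRUE: x arrives as the symbol `.`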

mattfidler commented 5 years ago

You can try your modified code; it should work if you update nlmixr.

tamas-ferenci commented 5 years ago

Dear @mattfidler ,

Thank you very much again! Everything works perfectly.

Here is my final convergence-diagnostics code, then (I commented the nlme fit out because it gave an error, but that obviously has nothing to do with the current question):

library( nlmixr )
library( lattice )

one.compartment.saem <- function() {
  ini({
    tka <- .5   # Log Ka
    tcl <- -3.2 # Log Cl
    tv <- -1    # Log V
    eta.ka ~ 1
    eta.cl ~ 2
    eta.v ~ 1
    add.err <- 0.1
  })
  model({
    ka <- exp(tka + eta.ka)
    cl <- exp(tcl + eta.cl)
    v <- exp(tv + eta.v)
    d/dt(depot) = -ka * depot
    d/dt(center) = ka * depot - cl / v * center
    cp = center / v 
    cp ~ add(add.err)
  })
}

fit <- nlmixr( one.compartment.saem, theo_sd, est = "saem" )

# Refit with both SAEM and FOCEi for each initial tka and attach CWRES
fits <- lapply( seq( 0.2, 1, 0.1 ), function( tkainit ) {
  fitSAEM <- addCwres( nlmixr( update( fit, tka = tkainit ), theo_sd, est="saem" ) )
  fitFOCEI <- addCwres( nlmixr( update( fit, tka = tkainit ), theo_sd, est="focei" ) )
  # fitNLME <- addCwres( nlmixr( update( fit, tka = tkainit ), theo_sd, est="nlme" ) )
  list( list( tkainit = tkainit, est = "SAEM", fit = fitSAEM ),
        list( tkainit = tkainit, est = "FOCEI", fit = fitFOCEI ) )
} )

xyplot( value ~ tkainit | Parameter, groups = Method, type = "l",
        scales = list( relation = "free" ),
        auto.key = list( columns = 2, lines = TRUE, points = FALSE ),
        data = do.call( rbind, lapply( fits, function( x )
          do.call( rbind, lapply( x, function( y )
            cbind( tkainit = y$tkainit, Method = y$est,
                   data.frame( Parameter = c( names( y$fit$theta ), "objf" ),
                               value = c( y$fit$theta, y$fit$objf ) ) ) ) ) ) ) )
mattfidler commented 5 years ago
library( nlmixr )
library( lattice )

one.compartment.saem <- function() {
  ini({
    tka <- .5   # Log Ka
    tcl <- -3.2 # Log Cl
    tv <- -1    # Log V
    eta.ka ~ 1
    eta.cl ~ 2
    eta.v ~ 1
    add.err <- 0.1
  })
  model({
    ka <- exp(tka + eta.ka)
    cl <- exp(tcl + eta.cl)
    v <- exp(tv + eta.v)
    d/dt(depot) = -ka * depot
    d/dt(center) = ka * depot - cl / v * center
    cp = center / v 
    cp ~ add(add.err)
  })
}

fit <- nlmixr( one.compartment.saem, theo_sd, est = "saem" )
#> Compiling RxODE equations...done.
#> Building SAEM model...done
#> 1:    -1.4039    0.0430    0.6213    1.2228    5.3013    4.1542   19.4201
#>...

xyplot( value ~ tkainit | Parameter, groups = Method, type = "l",
        scales = list( relation = "free" ),
        auto.key = list( columns = 2, lines = TRUE, points = FALSE ),
        data = do.call( rbind, lapply( fits, function( x )
          do.call( rbind, lapply( x, function( y )
            cbind( tkainit = y$tkainit, Method = y$est,
                   data.frame( Parameter = c( names( y$fit$theta ), "objf" ),
                               value = c( y$fit$theta, y$fit$objf ) ) ) ) ) ) ) )

Created on 2019-06-19 by the reprex package (v0.3.0)

Note: in your code the SAEM objective function and the FOCEi objective function are different and should not be compared.

mattfidler commented 5 years ago

You can add table=tableControl(cwres=TRUE) to use the FOCEi approximation instead of the Gaussian quadrature approximation. In this problem they are practically the same.

library( nlmixr )
library( lattice )

one.compartment.saem <- function() {
  ini({
    tka <- .5   # Log Ka
    tcl <- -3.2 # Log Cl
    tv <- -1    # Log V
    eta.ka ~ 1
    eta.cl ~ 2
    eta.v ~ 1
    add.err <- 0.1
  })
  model({
    ka <- exp(tka + eta.ka)
    cl <- exp(tcl + eta.cl)
    v <- exp(tv + eta.v)
    d/dt(depot) = -ka * depot
    d/dt(center) = ka * depot - cl / v * center
    cp = center / v 
    cp ~ add(add.err)
  })
}

fit <- nlmixr( one.compartment.saem, theo_sd, est = "saem", control=list(print=0))
#> Compiling RxODE equations...done.
#> Building SAEM model...done
#> Calculating covariance matrix
#> [====|====|====|====|====|====|====|====|====|====
#> Load into sympy...Using sympy via reticulate
#> done
#> Freeing Python/SymPy memory...done
#> ################################################################################
#> Optimizing expressions in Predictions/EBE model...done
#> Compiling Predictions/EBE model...done
#> Standardized prediction/ebe models produced.
#> Calculating residuals/tables
#> done.

fits <- lapply( seq( 0.2, 1, 0.1 ), function( tkainit ) {
  fitSAEM <- addCwres( nlmixr( update( fit, tka = tkainit ), theo_sd, est="saem", control=list(print=0)) )
  fitFOCEI <- addCwres( nlmixr( update( fit, tka = tkainit ), theo_sd, est="focei", control=list(print=0),
       table=tableControl(cwres=TRUE)) )
  # fitNLME <- addCwres( nlmixr( update( fit, tka = tkainit ), theo_sd, est="nlme" ) )
  list( list( tkainit = tkainit, est = "SAEM", fit = fitSAEM ),
        list( tkainit = tkainit, est = "FOCEI", fit = fitFOCEI ) )
} )
#> Calculating covariance matrix
#> ...

xyplot( value ~ tkainit | Parameter, groups = Method, type = "l",
        scales = list( relation = "free" ),
        auto.key = list( columns = 2, lines = TRUE, points = FALSE ),
        data = do.call( rbind, lapply( fits, function( x )
          do.call( rbind, lapply( x, function( y )
            cbind( tkainit = y$tkainit, Method = y$est,
                   data.frame( Parameter = c( names( y$fit$theta ), "objf" ),
                               value = c( y$fit$theta, y$fit$objf ) ) ) ) ) ) ) )

Created on 2019-06-19 by the reprex package (v0.3.0)

Note: in this case the objective function from the FOCEi approximation is similar to the objective function from the SAEM approximation.

tamas-ferenci commented 5 years ago

Thank you @mattfidler ! Just a minor question: if I use table=tableControl(cwres=TRUE) do I still need addCwres...?

If I run

fit <- nlmixr( one.compartment.saem, theo_sd, est = "saem" )
fit2 <- addCwres( fit, updateObject = FALSE )
fit3 <- nlmixr( one.compartment.saem, theo_sd, est = "saem",
            table=tableControl( cwres=TRUE ) )

then the numerical values in fit2 and fit3 are the same; however, the former says OBJF by FOCEi approximation fit, while the latter says OBJF not calculated fit (yet displays an OBJF!), which is somewhat strange...

mattfidler commented 5 years ago

No, you do not need addCwres if you use tableControl(cwres=TRUE).

The OBJF not calculated label is a bug in changing a marker.
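In other words, the two routes are interchangeable (a minimal sketch mirroring your fit2/fit3 check above):

fit  <- nlmixr( one.compartment.saem, theo_sd, est = "saem" )
fitA <- addCwres( fit, updateObject = FALSE )          # CWRES added after the fit
fitB <- nlmixr( one.compartment.saem, theo_sd, est = "saem",
                table = tableControl( cwres = TRUE ) ) # CWRES added during the fit
all.equal( fitA$objf, fitB$objf )  # the same FOCEi-based objective either way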

tamas-ferenci commented 5 years ago

Thank you very much again @mattfidler !

In that case, here is my final version (I tried to automate everything as much as possible):

fit <- nlmixr( one.compartment.saem, theo_sd, est = "saem" )

InitialDependence <- function( fit, span = c( -1, 1 ), n = 10,
                               methods = c( "saem", "focei" ) ) {
  # Build a grid of n initial values around each theta's original initial
  # value, cross it with the estimation methods, and refit every combination
  apply( merge( stack( lapply( get( fit$modelName )()$theta,
                               function( x ) seq( x+span[1], x+span[2],
                                                  length.out = n ) ) ),
                data.frame( method = methods ) ), 1,
         function( x ) {
           list( param = x[[ "ind" ]], value = as.numeric( x[[ "values" ]] ),
                 est = x[[ "method" ]],
                 # update() is invoked via do.call/setNames so that the
                 # parameter name can be supplied programmatically
                 fit = nlmixr( do.call( update, setNames( list(
                   fit, as.numeric( x[[ "values" ]] ) ),
                   c( "", x[[ "ind" ]] ) ) ),
                   get( fit$dataName ), est = x[[ "method" ]],
                   table = tableControl( cwres=TRUE ),
                   control = list( print = 0 ) ) )
         } )
}

idep <- InitialDependence( fit )

PlotInitialDependence <- function( initdep, paramin ) {
  xyplot( theta ~ value | name, groups = est, type = "l", main = paramin,
          scales = list( relation = "free" ),
          auto.key = list( columns = 2, lines = TRUE, points = FALSE ),
          data = do.call( rbind, lapply( initdep, function( x )
            data.frame( param = x$param, value = x$value, est = x$est,
                        name = c( names( x$fit$theta ), "objf" ),
                        theta = c( x$fit$theta, x$fit$objf ) ) ) ),
          subset = param==paramin )
}

PlotInitialDependence( idep, "tv" )

for( p in names( fit$theta ) ) print( PlotInitialDependence( idep, p ) )
tamas-ferenci commented 5 years ago

One (possibly last) update: it is worth restarting the model estimation before we begin this whole investigation, to check that the estimates have indeed converged at all (see this); and I also added an option to plot the relative differences compared to the estimates of the original fit, so that one can easily see how large the deviations are:

library( nlmixr )
library( lattice )

one.compartment.saem <- function() {
  ini({
    tka <- .5   # Log Ka
    tcl <- -3.2 # Log Cl
    tv <- -1    # Log V
    eta.ka ~ 1
    eta.cl ~ 2
    eta.v ~ 1
    add.err <- 0.1
  })
  model({
    ka <- exp(tka + eta.ka)
    cl <- exp(tcl + eta.cl)
    v <- exp(tv + eta.v)
    d/dt(depot) = -ka * depot
    d/dt(center) = ka * depot - cl / v * center
    cp = center / v 
    cp ~ add(add.err)
  })
}

fit <- nlmixr( one.compartment.saem, theo_sd, est = "saem",
               control = list( print = 0 ), table = list( cwres = TRUE ) )
fit <- nlmixr( fit,theo_sd, est = "saem", control = list( print = 0 ),
               table = list( cwres = TRUE ) )

InitialDependence <- function( fit, span = c( -1, 1 ), n = 10,
                               methods = c( "saem", "focei" ) ) {
  # Build a grid of n initial values around each theta's original initial
  # value, cross it with the estimation methods, and refit every combination
  apply( merge( stack( lapply( get( fit$modelName )()$theta,
                               function( x ) seq( x+span[1], x+span[2],
                                                  length.out = n ) ) ),
                data.frame( method = methods ) ), 1,
         function( x ) {
           list( param = x[[ "ind" ]], value = as.numeric( x[[ "values" ]] ),
                 est = x[[ "method" ]],
                 # update() is invoked via do.call/setNames so that the
                 # parameter name can be supplied programmatically
                 fit = nlmixr( do.call( update, setNames( list(
                   fit, as.numeric( x[[ "values" ]] ) ),
                   c( "", x[[ "ind" ]] ) ) ),
                   get( fit$dataName ), est = x[[ "method" ]],
                   table = tableControl( cwres=TRUE ),
                   control = list( print = 0 ) ) )
         } )
}

idep <- InitialDependence( fit, span = c( -2, 2 ), n = 50 )

PlotInitialDependence <- function( initdep, fit, paramin, relative = FALSE,
                                   valuemin = -Inf, valuemax = Inf ) {
  lattice::xyplot( theta ~ value | name, groups = est, type = "l", main = paramin,
                   scales = list( relation = "free" ),
                   auto.key = list( columns = 2, lines = TRUE, points = FALSE ),
                   data = do.call( rbind, lapply( initdep, function( x )
                     data.frame( param = x$param, value = x$value, est = x$est,
                                 name = c( names( x$fit$theta ), "objf" ),
                                 theta = c( if ( relative ) x$fit$theta/fit$theta else
                                   x$fit$theta, x$fit$objf ) ) ) ),
                   subset = param==paramin&value>valuemin&value<valuemax )
}

PlotInitialDependence( idep, fit, "tv", valuemin = -1.5, relative = TRUE )

This way, we can plot nice figures like the one below to diagnose the dependence of the convergence on the initial values (with the limitation that we check the effect of changing only one initial value at a time):

(figure: estimated parameters and the objective function plotted against the initial value of tv, by estimation method)

mattfidler commented 5 years ago

Thank you. Indeed that is likely a better estimate.