Problem with SAEM covariate modeling

roninsightrx commented 1 year ago

hi Matt,

I'm having an issue when running SAEM in nlmixr2, hope you can help me.

I'm run a simple 2-cmt PK model and testing various covariates. The issue is that SAEM is not giving me sensible results, basically ignoring any covariate I try to test and just sampling a dummy parameter (it seems)...

Let's say I have the two models below, model 5 testing CRCL as a covariate, and model 6 testing WT. I'm using linear solved model here, but the same issue occurs with ODE systems.

mod5 <- function() {
  ini({
    tCL     <- log(5)
    tV      <- log(50)
    tQ      <- log(5)
    tV2     <- log(100)
    tCRCL_CL <- 0.5
    eta_CL   ~ 1.0
    eta_V    ~ 1.0
    add_sd <- 0.5
  })
  model({
    CL <- exp(tCL + eta_CL + logCRCL100*tCRCL_CL )
    V  <- exp(tV  + eta_V)
    Q  <- exp(tQ)
    V2 <- exp(tV2)
    linCmt() ~ add(add_sd)
  })
}

mod6 <- function() {
  ini({
    tCL     <- log(5)
    tV      <- log(50)
    tQ      <- log(5)
    tV2     <- log(100)
    tWT_CL  <- 0.5
    eta_CL  ~ 1.0
    eta_V   ~ 1.0
    add_sd <- 0.5
  })
  model({
    CL <- exp(tCL + eta_CL + logWT70*tWT_CL)
    V  <- exp(tV  + eta_V)
    Q  <- exp(tQ)
    V2 <- exp(tV2)
    linCmt() ~ add(add_sd)
  })
}

Note the different covariate estimate. The head of my dataset looks like below. Obviously logCRCL100 and logWT70 are not the same. I can send you the full data via email if you like.

  ID    TIME   DV EVID MDV CMT  AMT RATE   logWT70  logFFM70 logCRCL100 caplogCRCL100       logCR   logCRstd
1  1  0.0000  0.0    1   1   1 2500 1000 0.6411017 0.1107744  0.6042130     0.4054651  0.00000000 -0.1457015
2  1  3.7000 27.9    0   0   1    0    0 0.6411017 0.1107744  0.6042130     0.4054651  0.00000000 -0.1457015
3  1 10.9167  8.6    0   0   1    0    0 0.6411017 0.1107744  0.6042130     0.4054651  0.00000000 -0.1457015
4  1 11.4500  0.0    1   1   1 1500 1000 0.6411017 0.1107744  0.6042130     0.4054651  0.00000000 -0.1457015
5  1 19.6000  0.0    1   1   1 1500 1000 0.6411017 0.1107744  0.6875946     0.4054651 -0.08338161 -0.1457015
6  1 27.2500  0.0    1   1   1 1500 1000 0.6411017 0.1107744  0.6875946     0.4054651 -0.08338161 -0.1457015

When I then sample:

res5 <- nlmixr(
  mod5, 
  data, 
  est = "saem", control = list(nBurn=50, nEm=50, print=25)
)
res6 <- nlmixr(
  mod6, 
  data, 
  est = "saem", control = list(nBurn=50, nEm=50, print=25)
)

The parameter estimates are exactly the same

> res5$parFixed
          Est.     SE %RSE Back-transformed(95%CI) BSV(CV%) Shrink(SD)%
tCL       1.43  0.186 13.1       4.16 (2.89, 5.99)     72.3      3.02%<
tV        2.02 0.0382 1.89       7.57 (7.02, 8.16)     34.1      16.9%<
tQ        4.08 0.0686 1.68       59.2 (51.8, 67.7)                     
tV2      0.577  0.065 11.3       1.78 (1.57, 2.02)                     
tCRCL_CL  4.06 0.0429 1.06       4.06 (3.98, 4.14)                     
add_sd    2.83                                2.83                     
> res6$parFixed
        Est.     SE %RSE Back-transformed(95%CI) BSV(CV%) Shrink(SD)%
tCL     1.43  0.186 13.1       4.16 (2.89, 5.99)     72.3      3.02%<
tV      2.02 0.0382 1.89       7.57 (7.02, 8.16)     34.1      16.9%<
tQ      4.08 0.0686 1.68       59.2 (51.8, 67.7)                     
tV2    0.577  0.065 11.3       1.78 (1.57, 2.02)                     
tWT_CL  4.06 0.0429 1.06       4.06 (3.98, 4.14)                     
add_sd  2.83                                2.83

One interesting thing, and perhaps related, is that the estimates are actually wrongly reported in the table here. CL is correct, but the order of the other estimates are all wrong.

Do you have any idea what is going on here?

mattfidler commented 1 year ago

My guess is there is some issue with either the mu referencing or how nlmixr2 keeps the parameters.

You can try with

mod5 <- nlmixr(mod5, data, "saem", control=saemControl(muRefCov=FALSE))

If it gives reasonable results this is probably the case, and it needs to be fixed.

If the data are not proprietary, I would be OK taking it at my email matt.fidler at novartis.com

mattfidler commented 1 year ago

When I then sample ... The parameter estimates are exactly the same

I'm not sure what you mean by "sampling"; do you mean that you expect the same saem to give different values since it is a mcmc simulation? If so, that is not true, the seed is set to 99 every time you run saem so the results should be reproducible. You could change the seed manually.

If you are resampling the data, then there likely should be some difference.

roninsightrx commented 1 year ago

muRefCov=FALSE
I tried, but it doesn't resolve the issue, still getting the same estimates for two different covariates. I did find one more nugget of information: I was running this on Ubuntu before, but now I tried running this on Windows, and I get the same behavior but now with additional error messages that might point to the problem... In the output object I'm getting this:

Information about run found (res5$runInfo):
  • column 'logCRCL100' has only 'NA' values for id '84' 
• column 'logCRCL100' has only 'NA' values for id '82' 
• column 'logCRCL100' has only 'NA' values for id '81' 
• column 'logCRCL100' has only 'NA' values for id '80' 
• column 'logCRCL100' has only 'NA' values for id '78' 
• column 'logCRCL100' has only 'NA' values for id '77' 
• column 'logCRCL100' has only 'NA' values for id '75' 
... etc for most (but not all) IDs in the dataset.

So perhaps some parsing of the dataset by nlmixr is changing the covariate data? Or perhaps some indexing is wrong, also leading to the wrong order of parameters reported? I'll send you the dataset and script in a minute. Thanks for your time!

Yes, I know about the seed. I meant that I was expecting different results even with the similar seed because I'm testing two different covariate models. With "sampling" I meant that it's just ignoring the covariate altogether and having an unidentifiable parameter there.

mattfidler commented 1 year ago

I tried, but it doesn't resolve the issue, still getting the same estimates for two different covariates. I did find one more nugget of information: I was running this on Ubuntu before, but now I tried running this on Windows, and I get the same behavior but now with additional error messages that might point to the problem... In the output object I'm getting this:

This is a false warning that I fixed

mattfidler commented 1 year ago

The internal datasets seem to be the different:

  ID    TIME  EVID   AMT II   DV CMT logCRCL100 nlmixrRowNums
1  1  0.0000 10101  1000  0   NA   1   0.604213             1
2  1  2.5000 10101 -1000  0   NA   1         NA             0
3  1  3.7000     0    NA  0 27.9   1   0.604213             2
4  1 10.9167     0    NA  0  8.6   1   0.604213             3
5  1 11.4500 10101  1000  0   NA   1   0.604213             4
6  1 12.9500 10101 -1000  0   NA   1         NA             0

  ID    TIME  EVID   AMT II   DV CMT   logWT70 nlmixrRowNums
1  1  0.0000 10101  1000  0   NA   1 0.6411017             1
2  1  2.5000 10101 -1000  0   NA   1        NA             0
3  1  3.7000     0    NA  0 27.9   1 0.6411017             2
4  1 10.9167     0    NA  0  8.6   1 0.6411017             3
5  1 11.4500 10101  1000  0   NA   1 0.6411017             4
6  1 12.9500 10101 -1000  0   NA   1        NA             0

roninsightrx commented 1 year ago

I see, that seem right then.

When I put a breakpoint in nlmixr2est just before .configsaem is called, I'm seeing that my covariate is actually listed as a parameter. Is that correct?

roninsightrx commented 1 year ago

There is something wrong with the .model object. When I run an example that is working (based on your phenobarbital example, just adding weight as covariate), I see a covars attribute on .model. I don't see that when I run the above models. So seems something is wrong with ui$saemModelList.

It seems like in .saemFitModel it is removing time-varying covariates:

if (length(timeVaryingCovariates) > 0) {
      # Drop time-varying covariates
      .muRefCovariateDataFrame <- .muRefCovariateDataFrame[!(.muRefCovariateDataFrame$covariate %in% timeVaryingCovariates), ]
    }

But I thought time-varying covariates were now possible in SAEM in nlmixr2?

mattfidler commented 1 year ago

I see, that seem right then.

When I put a breakpoint in nlmixr2est just before .configsaem is called, I'm seeing that my covariate is actually listed as a parameter. Is that correct?

Yes, it is listed as a parameter. Since it is time-varying it will not be a mu-referenced covariate.

mattfidler commented 1 year ago

There is something wrong with the .model object. When I run an example that is working (based on your phenobarbital example, just adding weight as covariate), I see a covars attribute on .model. I don't see that when I run the above models. So seems something is wrong with ui$saemModelList.

It seems like in .saemFitModel it is removing time-varying covariates:
if (length(timeVaryingCovariates) > 0) {
      # Drop time-varying covariates
      .muRefCovariateDataFrame <- .muRefCovariateDataFrame[!(.muRefCovariateDataFrame$covariate %in% timeVaryingCovariates), ]
    }
But I thought time-varying covariates were now possible in SAEM in nlmixr2?

Indeed it is possible. Time-varying covariates are treated as Monolix-type regressor covariates.

The time-varying covariates are not known until the data frame is provided; once it is known that the covariate is time varying it is removed from the mu-referenced parameter list. This means that they are not part of the the eventual phi parameter and will be included in the model.

If you tried with the non time-varying WT, you would see that the covariate is no longer in the model at the same break-point.

mattfidler commented 1 year ago

I see, that seem right then.

When I put a breakpoint in nlmixr2est just before .configsaem is called, I'm seeing that my covariate is actually listed as a parameter. Is that correct?

I was testing the data provided to the underlying saem. I will also explore the data right before the configuration of saem to try to figure out why these are giving the same results.

mattfidler commented 1 year ago

You can see the underlying rxode2() model by typing summary(rxodeModel) So in this break-point I can get the same model,

FYI this is the final model:


Browse[3]> summary(attr(.model$saem_mod, "rx"))
rxode2 2.0.13.9000 model named rx_eef18c8317d114824689f56a90fe38b4 model (✔ ready). 
DLL: /tmp/RtmpI0Kb4r/rxode2/rx_eef18c8317d114824689f56a90fe38b4__.rxd/rx_eef18c8317d114824689f56a90fe38b4_.so
NULL

Calculated Variables:
[1] "rx_pred_"
── rxode2 Model Syntax ──
rxode2({
    param(tCL, tV, tQ, tV2, tCRCL_CL, logCRCL100)
    rx_pred_ = linCmtA(rx__PTR__, t, 0, 2, 1, exp(tCL + logCRCL100 * 
        tCRCL_CL), exp(tV), exp(tQ), exp(tV2), 0, 0, 0, 1, 0, 
        0, 0, 0, 1, 0, 0)
    cmt(rxLinCmt)
    dvid(1)
})

mattfidler commented 1 year ago

In this problem the data are listed as follows:

Browse[3]> head(.cfg$evt)
     ID    TIME  EVID   AMT II CMT LOGCRCL100
[1,]  0  0.0000 10101  1000  0   1   0.604213
[2,]  0  2.5000 10101 -1000  0   1         NA
[3,]  0  3.7000     0    NA  0   1   0.604213
[4,]  0 10.9167     0    NA  0   1   0.604213
[5,]  0 11.4500 10101  1000  0   1   0.604213
[6,]  0 12.9500 10101 -1000  0   1         NA
Browse[3]> head(.cfg$evtM)
  ID    TIME  EVID   AMT II CMT LOGCRCL100
1  0  0.0000 10101  1000  0   1   0.604213
2  0  2.5000 10101 -1000  0   1         NA
3  0  3.7000     0    NA  0   1   0.604213
4  0 10.9167     0    NA  0   1   0.604213
5  0 11.4500 10101  1000  0   1   0.604213
6  0 12.9500 10101 -1000  0   1         NA

This seems to match the data above, which means the data are relayed to this step appropriately.

mattfidler commented 1 year ago

It also correctly (internally) identifiers logCRCL100 as a regressor or inPars:

Browse[3]> attr(.model$saem_mod, "inPars")
[1] "logCRCL100

mattfidler commented 1 year ago

And after looking a bit more, the logWT is actually time-varying too, so all the above apply with a different dataset

mattfidler commented 1 year ago

Since these are all time-varying covariates, the mcmc and saem algorithm likely does better when it isn't on the mu-referencing space.

You can see by transferring to a non-log transformed space and using the same sort of model, the items are different

devtools::load_all("~/src/nlmixr2est")
#> ℹ Loading nlmixr2est
#> Loading required package: nlmixr2data
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> 
#> The following object is masked from 'package:testthat':
#> 
#>     matches
#> 
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> 
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
#library(nlmixr2)

## Data prep
# data <- read.csv(file = "./data/nmdata_20230216_1.csv") %>%
#   mutate(CRCLcap = ifelse(CRCL < 150, CRCL, 150)) %>%
#   mutate(logCR = log(CREAT/1.0)) %>%
#   mutate(logWT70 = log(WEIGHT/70)) %>%
#   mutate(logCRCL100 = log(CRCL/100)) %>%
#   mutate(caplogCRCL100 = log(CRCLcap/100)) %>%
#   mutate(logCRCLf100 = log(CRCLf/100)) %>%
#   mutate(logCRstd = -1.22839 + log10(AGE) * 0.67190 + 6.27017 * exp(-3.10940 * AGE)) %>%
#   filter(EVID != 2) %>%
#   mutate(CMT = 1) %>%
#   select(ID, TIME, DV, EVID, MDV, CMT, AMT, RATE, logWT70, logCRCL100)
# write.csv(data, file = "./data/data_matt.csv", quote=F, row.names=F)

data <- read.csv(file="~/Downloads/data_matt.csv")

data$CRCL100 <- exp(data$logCRCL100)
data$WT70 <- exp(data$logWT70)

mod5 <- function() {
  ini({
    tCL     <- log(5)
    tV      <- log(50)
    tQ      <- log(5)
    tV2     <- log(100)
    tCRCL_CL <- 0.5
    eta_CL   ~ 1.0
    eta_V    ~ 1.0
    add_sd <- 0.5
  })
  model({
    CL <- exp(tCL + eta_CL) * CRCL100*tCRCL_CL
    V  <- exp(tV  + eta_V)
    Q  <- exp(tQ)
    V2 <- exp(tV2)
    linCmt() ~ add(add_sd)
  })
}

mod6 <- function() {
  ini({
    tCL     <- log(5)
    tV      <- log(50)
    tQ      <- log(5)
    tV2     <- log(100)
    tWT_CL  <- 0.5
    eta_CL  ~ 1.0
    eta_V   ~ 1.0
    add_sd <- 0.5
  })
  model({
    CL <- exp(tCL + eta_CL) * WT70*tWT_CL
    V  <- exp(tV  + eta_V)
    Q  <- exp(tQ)
    V2 <- exp(tV2)
    linCmt() ~ add(add_sd)
  })
}

res5 <- nlmixr(
  mod5, 
  data, 
  est = "saem", control = saemControl(nBurn=50, nEm=50, print=25)
)
#> → loading into symengine environment...
#> → pruning branches (`if`/`else`) of saem model...
#> ✔ done
#> → finding duplicate expressions in saem model...
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> ✔ done
#> params:  tCL tV  tQ  tV2 tCRCL_CL    V(eta_CL)   V(eta_V)    add_sd
#> rxode2 2.0.13.9000 using 8 threads (see ?getRxThreads)
#>   no cache: create with `rxCreateCache()`
#> NaN in prediction; Consider: relax atol & rtol; change initials; change seed; change structural model
#>   warning only issued once per problem
#> 001: 1.254220    4.451819    1.649558    4.517308    0.686857    0.950000    1.533261    7.910833    
#> 025: 0.908039    4.137851    1.716767    4.476746    0.987237    0.277390    0.447695    2.785627    
#> 050: 0.989925    4.145326    1.757067    4.206163    0.954865    0.083486    0.162111    2.538178    
#> 075: 0.986776    4.132782    1.786766    4.213730    0.954472    0.071165    0.151535    2.595909    
#> 100: 0.984726    4.125234    1.787465    4.206062    0.955066    0.072126    0.164548    2.619451    
#> Calculating covariance matrix
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> → loading into symengine environment...
#> → pruning branches (`if`/`else`) of saem model...
#> ✔ done
#> → finding duplicate expressions in saem predOnly model 0...
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> → finding duplicate expressions in saem predOnly model 1...
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> → finding duplicate expressions in saem predOnly model 2...
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> ✔ done
#> → Calculating residuals/tables
#> ✔ done
#> → compress origData in nlmixr2 object, save 54024
#> → compress phiM in nlmixr2 object, save 150704
#> → compress parHist in nlmixr2 object, save 3344

# old
res6 <- nlmixr(
  mod6, 
  data, 
  est = "saem", control = saemControl(nBurn=50, nEm=50, print=25) 
)
#> → loading into symengine environment...
#> → pruning branches (`if`/`else`) of saem model...
#> ✔ done
#> → finding duplicate expressions in saem model...
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> ✔ done
#> params:  tCL tV  tQ  tV2 tWT_CL  V(eta_CL)   V(eta_V)    add_sd
#> NaN in prediction; Consider: relax atol & rtol; change initials; change seed; change structural model
#>   warning only issued once per problem
#> 001: 1.191666    4.420420    1.636085    4.531718    0.672616    0.950000    1.607929    8.011398    
#> 025: 0.991669    4.140022    1.685775    4.491628    0.783752    0.354265    0.469498    2.867746    
#> 050: 1.026475    4.095524    1.938268    4.044226    0.829650    0.248095    0.116972    2.814914    
#> 075: 1.032525    4.100067    1.972776    4.074732    0.822266    0.232085    0.105635    2.785015    
#> 100: 1.031503    4.085930    1.970743    4.075502    0.821414    0.246688    0.113126    2.816171    
#> Calculating covariance matrix
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> → loading into symengine environment...
#> → pruning branches (`if`/`else`) of saem model...
#> ✔ done
#> → finding duplicate expressions in saem predOnly model 0...
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> → finding duplicate expressions in saem predOnly model 1...
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> → finding duplicate expressions in saem predOnly model 2...
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> ✔ done
#> → Calculating residuals/tables
#> ✔ done
#> → compress origData in nlmixr2 object, save 54024
#> → compress phiM in nlmixr2 object, save 142312
#> → compress parHist in nlmixr2 object, save 3312

res5$parFixed
#>           Est.       SE     %RSE    Back-transformed(95%CI) BSV(CV%)
#> tCL      0.985 2.99e+03 3.04e+05              2.68 (0, inf)     27.3
#> tV        4.13   0.0758     1.84          61.9 (53.3, 71.8)     42.3
#> tQ        1.79    0.103     5.74          5.97 (4.89, 7.31)         
#> tV2       4.21   0.0897     2.13            67.1 (56.3, 80)         
#> tCRCL_CL 0.955 2.86e+03 2.99e+05 0.955 (-5.6e+03, 5.61e+03)         
#> add_sd    2.62                                         2.62         
#>          Shrink(SD)%
#> tCL           10.1%<
#> tV            10.3%<
#> tQ                  
#> tV2                 
#> tCRCL_CL            
#> add_sd

res6$parFixed
#>         Est.       SE     %RSE     Back-transformed(95%CI) BSV(CV%) Shrink(SD)%
#> tCL     1.03 1.35e+03 1.31e+05               2.81 (0, inf)     52.9      6.52%<
#> tV      4.09    0.096     2.35           59.5 (49.3, 71.8)     34.6      18.0%<
#> tQ      1.97    0.147     7.46           7.18 (5.38, 9.57)                     
#> tV2     4.08    0.109     2.67           58.9 (47.6, 72.9)                     
#> tWT_CL 0.821 1.11e+03 1.35e+05 0.821 (-2.17e+03, 2.18e+03)                     
#> add_sd  2.82                                          2.82

^{Created on 2023-05-04 with reprex v2.0.2}

mattfidler commented 1 year ago

You can also look at the parameter distributions of logwt and logcrcl and see that they are quite similar and with a lazy random walk like saem it is not impossible to get the same parameters with very similar but not identical distributions.

roninsightrx commented 1 year ago

You can also look at the parameter distributions of logwt and logcrcl and see that they are quite similar and with a lazy random walk like saem it is not impossible to get the same parameters with very similar but not identical distributions.

Yes, I had considered that (CRCL is calculated partially based on WT), but there is definitely some additional unexplained variability between the two covariates. Also the standard deviation for CRCL is almost 3x that of WT (after dividing by the mean and logging), so it would be surprising to me that SAEM would estimate the effects to be exactly the same. Also when destroying the correlation (data = data %>% mutate(logWT70 = rev(logWT70))), the parameter estimates are still exactly the same, leading me to believe that the covariate is just ignored when mu-modeled.

But thanks for suggesting to run with the time-varying covariates non-MU modeled, I will try that tomorrow for various covariates and report back.

mattfidler commented 1 year ago

the parameter estimates are still exactly the same, leading me to believe that the covariate is just ignored when mu-modeled.

As far as I can tell, this is not true; if it was true, then using it outside of the exponential should also give the same values. Both approaches are not mu-modeled since they are both time-varying covariates. If they were mu-modeled the underlying rxode2 model would be:

rxode2({
    param(tCL, tV, tQ, tV2, tCRCL_CL, logCRCL100)
    rx_pred_ = linCmtA(rx__PTR__, t, 0, 2, 1, exp(tCL),
        exp(tV), exp(tQ), exp(tV2), 0, 0, 0, 1, 0, 
        0, 0, 0, 1, 0, 0)
    cmt(rxLinCmt)
    dvid(1)
})

Then tCL and other phi-type values would then calculate the ETAs and covariate effects by linear regression, which is more robust and accurate than a mcmc/em optimization.

mattfidler commented 1 year ago

My guess is the acceptance of the covariate parameter is the same between the two covariance during the mcmc/em walk, but increasing the variance will give different acceptences of the covariate estimates.

roninsightrx commented 1 year ago

Thanks Matt! When putting the covariate effect outside of the exponential, I'm getting very credible results. I think that solved it.

Should perhaps an error be thrown for future users that attempt to put time-varying covariates within the exponential in SAEM? Or do you think there could be cases where that will work fine and is still preferred? Even if so, a warning would still be in order probably? If not looking closely at the output a user might easily accept erroneous results.

The above also solved the issue with the parameter estimates getting shuffled in the output. Do you want me to make a separate ticket for that?

mattfidler commented 1 year ago

Should perhaps an error be thrown for future users that attempt to put time-varying covariates within the exponential in SAEM? Or do you think there could be cases where that will work fine and is still preferred? Even if so, a warning would still be in order probably? If not looking closely at the output a user might easily accept erroneous results.

I think there may be use cases, but you have a valid point here; perhaps a warning would be helpful (at the very least). It would probably be applicable for any log-transformed parameters.

The above also solved the issue with the parameter estimates getting shuffled in the output. Do you want me to make a separate ticket for that?

This may have been because you were using an old version of nlmixr2. I didn't actually have this issue when I tried your code. If you can reproduce it with a recent version, then please. Otherwise I am assuming it is fixed in the recent version.

mattfidler commented 1 year ago

Another option is to do some nealder-meade optimization for these parameters instead of only using the mcmc algorithm. While it is not described in the methodology section of monolix, it could also work a bit better.

mattfidler commented 1 year ago

The warning now is shown below

devtools::load_all("~/src/nlmixr2est")
#> ℹ Loading nlmixr2est
#> Loading required package: nlmixr2data
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> 
#> The following object is masked from 'package:testthat':
#> 
#>     matches
#> 
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> 
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(ggplot2)
#library(nlmixr2)

## Data prep
# data <- read.csv(file = "./data/nmdata_20230216_1.csv") %>%
#   mutate(CRCLcap = ifelse(CRCL < 150, CRCL, 150)) %>%
#   mutate(logCR = log(CREAT/1.0)) %>%
#   mutate(logWT70 = log(WEIGHT/70)) %>%
#   mutate(logCRCL100 = log(CRCL/100)) %>%
#   mutate(caplogCRCL100 = log(CRCLcap/100)) %>%
#   mutate(logCRCLf100 = log(CRCLf/100)) %>%
#   mutate(logCRstd = -1.22839 + log10(AGE) * 0.67190 + 6.27017 * exp(-3.10940 * AGE)) %>%
#   filter(EVID != 2) %>%
#   mutate(CMT = 1) %>%
#   select(ID, TIME, DV, EVID, MDV, CMT, AMT, RATE, logWT70, logCRCL100)
# write.csv(data, file = "./data/data_matt.csv", quote=F, row.names=F)

data <- read.csv(file="~/Downloads/data_matt.csv")

mod5 <- function() {
  ini({
    tCL     <- log(5)
    tV      <- log(50)
    tQ      <- log(5)
    tV2     <- log(100)
    tCRCL_CL <- 0.5
    eta_CL   ~ 1.0
    eta_V    ~ 1.0
    add_sd <- 0.5
  })
  model({
    CL <- exp(tCL + eta_CL + logCRCL100*tCRCL_CL)
    V  <- exp(tV  + eta_V)
    Q  <- exp(tQ)
    V2 <- exp(tV2)
    linCmt() ~ add(add_sd)
  })
}

mod6 <- function() {
  ini({
    tCL     <- log(5)
    tV      <- log(50)
    tQ      <- log(5)
    tV2     <- log(100)
    tWT_CL  <- 0.5
    eta_CL  ~ 1.0
    eta_V   ~ 1.0
    add_sd <- 0.5
  })
  model({
    CL <- exp(tCL + eta_CL + logWT70*tWT_CL)
    V  <- exp(tV  + eta_V)
    Q  <- exp(tQ)
    V2 <- exp(tV2)
    linCmt() ~ add(add_sd)
  })
}

res5 <- nlmixr(
  mod5, 
  data, 
  est = "saem", control = saemControl(nBurn=50, nEm=50, print=25)
)
#> → loading into symengine environment...
#> → pruning branches (`if`/`else`) of saem model...
#> ✔ done
#> → finding duplicate expressions in saem model...
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> ✔ done
#> params:  tCL tV  tQ  tV2 tCRCL_CL    V(eta_CL)   V(eta_V)    add_sd
#> rxode2 2.0.13.9000 using 8 threads (see ?getRxThreads)
#>   no cache: create with `rxCreateCache()`
#> 001: 1.540419    4.012219    1.614770    4.494998    0.420887    0.950000    0.950000    6.415708    
#> 025: 1.386858    4.137677    1.763550    4.210289    0.626610    0.444569    0.277390    2.728454    
#> 050: 1.418467    4.059830    1.998436    3.999648    0.606273    0.381839    0.105896    2.815682    
#> 075: 1.422673    4.067952    2.025585    4.077568    0.570257    0.400934    0.102184    2.818364    
#> 100: 1.425228    4.059580    2.023862    4.081245    0.576987    0.420640    0.109775    2.825762    
#> Calculating covariance matrix
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00 
#> Error : Error calculating covariance via linearization
#> → loading into symengine environment...
#> → pruning branches (`if`/`else`) of saem model...
#> ✔ done
#> → finding duplicate expressions in saem predOnly model 0...
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> → finding duplicate expressions in saem predOnly model 1...
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> → finding duplicate expressions in saem predOnly model 2...
#> [====|====|====|====|====|====|====|====|====|====] 0:00:00
#> ✔ done
#> → Calculating residuals/tables
#> ✔ done
#> → compress origData in nlmixr2 object, save 45216
#> → compress phiM in nlmixr2 object, save 135560
#> → compress parHist in nlmixr2 object, save 3408

res5$parFixed
#>           Est.     SE %RSE Back-transformed(95%CI) BSV(CV%) Shrink(SD)%
#> tCL       1.43  0.186 13.1       4.16 (2.89, 5.99)     72.3      3.02%<
#> tV        2.02 0.0382 1.89       7.57 (7.02, 8.16)     34.1      16.9%<
#> tQ        4.08 0.0686 1.68       59.2 (51.8, 67.7)                     
#> tV2      0.577  0.065 11.3       1.78 (1.57, 2.02)                     
#> tCRCL_CL  4.06 0.0429 1.06       4.06 (3.98, 4.14)                     
#> add_sd    2.83                                2.83

res5$runInfo
#> [1] "IDs without observations dropped: 97 89 86 85 84 83 76 67 62 59 58 35 30 27 23 3"                                                                                                                 
#> [2] "covariance matrix non-positive definite, corrected by sqrtm(fim %*% fim)"                                                                                                                         
#> [3] "NAs introduced by coercion"                                                                                                                                                                       
#> [4] "log-scale mu referenced time varying covariates (tCRCL_CL) may have better results on no log-transformed scale (https://github.com/nlmixr2/nlmixr2est/issues/348), check results for plausibility"

^{Created on 2023-05-06 with reprex v2.0.2}

Session info

``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.3.0 (2023-04-21) #> os Pop!_OS 22.04 LTS #> system x86_64, linux-gnu #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz America/Chicago #> date 2023-05-06 #> pandoc 2.9.2.1 @ /usr/bin/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> ! package * version date (UTC) lib source #> backports 1.4.1 2021-12-13 [3] RSPM (R 4.2.0) #> brio 1.1.3 2021-11-30 [3] RSPM (R 4.2.0) #> cachem 1.0.8 2023-05-01 [3] RSPM (R 4.2.0) #> callr 3.7.3 2022-11-02 [3] RSPM (R 4.2.0) #> checkmate 2.2.0 2023-04-27 [3] RSPM (R 4.2.0) #> cli 3.6.1 2023-03-23 [3] RSPM (R 4.2.0) #> colorspace 2.1-0 2023-01-23 [3] RSPM (R 4.2.0) #> crayon 1.5.2 2022-09-29 [3] RSPM (R 4.2.0) #> data.table 1.14.8 2023-02-17 [3] RSPM (R 4.2.0) #> desc 1.4.2 2022-09-08 [3] RSPM (R 4.2.0) #> devtools 2.4.5 2022-10-11 [3] RSPM (R 4.2.0) #> digest 0.6.31 2022-12-11 [3] RSPM (R 4.2.0) #> dparser 1.3.1-10 2023-03-16 [3] RSPM (R 4.2.0) #> dplyr * 1.1.2 2023-04-20 [3] RSPM (R 4.2.0) #> ellipsis 0.3.2 2021-04-29 [3] RSPM (R 4.2.0) #> evaluate 0.20 2023-01-17 [3] RSPM (R 4.2.0) #> fansi 1.0.4 2023-01-22 [3] RSPM (R 4.2.0) #> fastmap 1.1.1 2023-02-24 [3] RSPM (R 4.2.0) #> fs 1.6.2 2023-04-25 [3] RSPM (R 4.2.0) #> generics 0.1.3 2022-07-05 [3] RSPM (R 4.2.0) #> ggplot2 * 3.4.2 2023-04-03 [3] RSPM (R 4.2.0) #> glue 1.6.2 2022-02-24 [3] RSPM (R 4.2.0) #> gtable 0.3.3 2023-03-21 [3] RSPM (R 4.2.0) #> htmltools 0.5.5 2023-03-23 [3] RSPM (R 4.2.0) #> htmlwidgets 1.6.2 2023-03-17 [3] RSPM (R 4.2.0) #> httpuv 1.6.9 2023-02-14 [3] RSPM (R 4.2.0) #> knitr 1.42 2023-01-25 [3] RSPM (R 4.2.0) #> later 1.3.1 2023-05-02 [3] RSPM (R 4.2.0) #> lattice 0.21-8 2023-04-05 [3] RSPM (R 4.2.0) #> lazyeval 0.2.2 2019-03-15 [3] RSPM (R 4.2.0) #> lbfgsb3c 2020-3.2 2020-03-03 [3] RSPM (R 4.2.0) #> lifecycle 1.0.3 2022-10-07 [3] RSPM (R 4.2.0) #> lotri 0.4.3 2023-03-20 [3] RSPM (R 4.2.0) #> magrittr 2.0.3 2022-03-30 [3] RSPM (R 4.2.0) #> Matrix 1.5-4 2023-04-04 [3] RSPM (R 4.2.0) #> memoise 2.0.1 2021-11-26 [3] RSPM (R 4.2.0) #> mime 0.12 2021-09-28 [3] RSPM (R 4.2.0) #> miniUI 0.1.1.1 2018-05-18 [3] RSPM (R 4.2.0) #> minqa 1.2.5 2022-10-19 [3] RSPM (R 4.2.0) #> munsell 0.5.0 2018-06-12 [3] RSPM (R 4.2.0) #> n1qn1 6.0.1-11 2022-10-18 [3] RSPM (R 4.2.0) #> nlme 3.1-162 2023-01-31 [3] RSPM (R 4.2.0) #> nlmixr2data * 2.0.7 2022-04-22 [3] RSPM (R 4.2.0) #> P nlmixr2est * 2.1.5 2023-05-06 [?] load_all() #> numDeriv 2016.8-1.1 2019-06-06 [3] RSPM (R 4.2.0) #> optextras 2019-12.4 2019-12-20 [3] RSPM (R 4.2.0) #> pillar 1.9.0 2023-03-22 [3] RSPM (R 4.2.0) #> pkgbuild 1.4.0 2022-11-27 [3] RSPM (R 4.2.0) #> pkgconfig 2.0.3 2019-09-22 [3] RSPM (R 4.2.0) #> pkgload 1.3.2 2022-11-16 [3] RSPM (R 4.2.0) #> PreciseSums 0.6 2023-04-22 [3] RSPM (R 4.2.0) #> prettyunits 1.1.1 2020-01-24 [3] RSPM (R 4.2.0) #> processx 3.8.1 2023-04-18 [3] RSPM (R 4.2.0) #> profvis 0.3.8 2023-05-02 [3] RSPM (R 4.2.0) #> promises 1.2.0.1 2021-02-11 [3] RSPM (R 4.2.0) #> ps 1.7.5 2023-04-18 [3] RSPM (R 4.2.0) #> purrr 1.0.1 2023-01-10 [3] RSPM (R 4.2.0) #> qs 0.25.5 2023-02-22 [3] RSPM (R 4.2.0) #> R.cache 0.16.0 2022-07-21 [3] RSPM (R 4.2.0) #> R.methodsS3 1.8.2 2022-06-13 [3] RSPM (R 4.2.0) #> R.oo 1.25.0 2022-06-12 [3] RSPM (R 4.2.0) #> R.utils 2.12.2 2022-11-11 [3] RSPM (R 4.2.0) #> R6 2.5.1 2021-08-19 [3] RSPM (R 4.2.0) #> RApiSerialize 0.1.2 2022-08-25 [3] RSPM (R 4.2.0) #> Rcpp 1.0.10 2023-01-22 [3] RSPM (R 4.2.0) #> RcppParallel 5.1.7 2023-02-27 [3] RSPM (R 4.2.0) #> remotes 2.4.2 2021-11-30 [3] RSPM (R 4.2.0) #> reprex 2.0.2 2022-08-17 [3] RSPM (R 4.2.0) #> rex 1.2.1 2021-11-26 [3] RSPM (R 4.2.0) #> rlang 1.1.1 2023-04-28 [3] RSPM (R 4.2.0) #> rmarkdown 2.21 2023-03-26 [3] RSPM (R 4.2.0) #> rprojroot 2.0.3 2022-04-02 [3] RSPM (R 4.2.0) #> rstudioapi 0.14 2022-08-22 [3] RSPM (R 4.2.0) #> Rvmmin 2018-4.17.1 2021-04-18 [3] RSPM (R 4.2.0) #> rxode2 * 2.0.13.9000 2023-05-04 [1] local #> rxode2et 2.0.10 2023-03-17 [3] RSPM (R 4.2.0) #> rxode2ll 2.0.11 2023-03-17 [3] RSPM (R 4.2.0) #> rxode2parse 2.0.16.9000 2023-05-04 [1] local #> rxode2random 2.0.11.9000 2023-05-04 [1] local #> scales 1.2.1 2022-08-20 [3] RSPM (R 4.2.0) #> sessioninfo 1.2.2 2021-12-06 [3] RSPM (R 4.2.0) #> shiny 1.7.4 2022-12-15 [3] RSPM (R 4.2.0) #> stringfish 0.15.7 2022-04-13 [3] RSPM (R 4.2.0) #> stringi 1.7.12 2023-01-11 [3] RSPM (R 4.2.0) #> stringr 1.5.0 2022-12-02 [3] RSPM (R 4.2.0) #> styler 1.9.1 2023-03-04 [3] RSPM (R 4.2.0) #> symengine 0.2.2 2022-10-23 [3] RSPM (R 4.2.0) #> sys 3.4.1 2022-10-18 [3] RSPM (R 4.2.0) #> testthat * 3.1.8 2023-05-04 [3] RSPM (R 4.2.0) #> tibble 3.2.1 2023-03-20 [3] RSPM (R 4.2.0) #> tidyselect 1.2.0 2022-10-10 [3] RSPM (R 4.2.0) #> ucminf 1.1-4.1 2022-09-29 [3] RSPM (R 4.2.0) #> units 0.8-2 2023-04-27 [3] RSPM (R 4.2.0) #> urlchecker 1.0.1 2021-11-30 [3] RSPM (R 4.2.0) #> usethis 2.1.6 2022-05-25 [3] RSPM (R 4.2.0) #> utf8 1.2.3 2023-01-31 [3] RSPM (R 4.2.0) #> vctrs 0.6.2 2023-04-19 [3] RSPM (R 4.2.0) #> withr 2.5.0 2022-03-03 [3] RSPM (R 4.2.0) #> xfun 0.39 2023-04-20 [3] RSPM (R 4.2.0) #> xtable 1.8-4 2019-04-21 [3] RSPM (R 4.2.0) #> yaml 2.3.7 2023-01-23 [3] RSPM (R 4.2.0) #> #> [1] /home/matt/R/x86_64-pc-linux-gnu-library/4.3 #> [2] /usr/local/lib/R/site-library #> [3] /usr/lib/R/site-library #> [4] /usr/lib/R/library #> #> P ── Loaded and on-disk path mismatch. #> #> ────────────────────────────────────────────────────────────────────────────── ```

mattfidler commented 3 months ago

This may have been fixed with b8c7292f9f3310bfcada5e331a008820291e0d6c

nlmixr2 / nlmixr2est

Problem with SAEM covariate modeling #348