want to use nlmixr on bootstrapped data

JSC67 commented 6 years ago

Below is my session output. I made a function, funBS(), that generates bootstrapped data from a data frame that has ID as its first column. However, when I try to use the bootstrapped data with nlmixr, I get an error and no object is created. Note that much of this is output from what I just pasted in for Issue #28 but here is the full info. The bootstrap part begins with funBS(). Thank-you for your help.


R version 3.4.3 (2017-11-30) -- "Kite-Eating Tree"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> # Preliminaries
> setwd("C:/Users/James/Documents/pharmacometrics/testing")
> library(nlmixr)
Loading required package: nlme
Loading required package: RxODE
> # rxVersion(echo=TRUE)
> rxClean()
[1] TRUE
> rxVersion(echo=TRUE)
  _____         ____  _____  ______
 |  __ \       / __ \|  __ \|  ____| 0.7-0
 | |__) |__  _| |  | | |  | | |__
 |  _  / \ \/ / |  | | |  | |  __|
 | | \ \  >  <| |__| | |__| | |____
 |_|  \_\/_/\_\\____/|_____/|______|
> # Use some pre-loaded data.
> str(theo_sd)  # like Theoph but with ID instead of Subject, also other NONMEM-like
'data.frame':   144 obs. of  7 variables:
 $ ID  : int  1 1 1 1 1 1 1 1 1 1 ...
 $ TIME: num  0 0 0.25 0.57 1.12 2.02 3.82 5.1 7.03 9.05 ...
 $ DV  : num  0 0.74 2.84 6.57 10.5 9.66 8.58 8.36 7.47 6.89 ...
 $ AMT : num  4.02 0 0 0 0 0 0 0 0 0 ...
 $ EVID: int  101 0 0 0 0 0 0 0 0 0 ...
 $ CMT : int  1 2 2 2 2 2 2 2 2 2 ...
 $ WT  : num  79.6 79.6 79.6 79.6 79.6 79.6 79.6 79.6 79.6 79.6 ...
> # 'uif' for Unified Interface Function, 1 compartment, lin for linCmt()
> uif.1cm.lin <- function() {
+   ini({
+     logtcl <- -3.2      # log typical value Cl (L/hr)
+     logtka <-  0.5      # log typical value Ka (1/hr)
+     logtv  <- -1        # log typical value V  (L)
+     
+     # error model
+     add.err <- 0.1
+     
+     # Initial estimates for IIV variances
+     # Labels work for single parameters.
+     eta.cl ~ 2
+     eta.ka ~ 1    ## BSV Ka
+     eta.v  ~ 1
+   })
+   model({
+     cl <- exp(logtcl + eta.cl)
+     ka <- exp(logtka + eta.ka)
+     v  <- exp(logtv  + eta.v)
+     linCmt() ~ add(add.err)
+   })
+ }
> # NLME fittings
> 
> fit.nlme.1cm.lin.theo_sd     <- nlmixr(uif.1cm.lin, theo_sd, est="nlme")

**Iteration 1
LME step: Loglik: -182.2318, nlminb iterations: 1
reStruct  parameters:
      ID1       ID2       ID3 
1.0022617 0.2783009 2.0758984 
PNLS step: RSS =  63.14922 
 fixed effects: -3.211937  0.4479979  -0.7859318  
 iterations: 7 
Convergence crit. (must all become <= tolerance = 1e-05):
   fixed reStruct 
0.272375 3.267309 

**Iteration 2
LME step: Loglik: -179.291, nlminb iterations: 9
reStruct  parameters:
       ID1        ID2        ID3 
0.96385527 0.08333435 1.63552295 
PNLS step: RSS =  63.28221 
 fixed effects: -3.211535  0.4441505  -0.7863391  
 iterations: 7 
Convergence crit. (must all become <= tolerance = 1e-05):
      fixed    reStruct 
0.008662381 0.172135330 

**Iteration 3
LME step: Loglik: -179.3381, nlminb iterations: 8
reStruct  parameters:
       ID1        ID2        ID3 
0.96129928 0.07085834 1.63885629 
PNLS step: RSS =  63.2196 
 fixed effects: -3.211732  0.4458486  -0.7861743  
 iterations: 7 
Convergence crit. (must all become <= tolerance = 1e-05):
      fixed    reStruct 
0.003808629 0.082919683 

**Iteration 4
LME step: Loglik: -179.3201, nlminb iterations: 7
reStruct  parameters:
       ID1        ID2        ID3 
0.96243507 0.07618717 1.63736945 
PNLS step: RSS =  63.22991 
 fixed effects: -3.211823  0.4451494  -0.7862432  
 iterations: 7 
Convergence crit. (must all become <= tolerance = 1e-05):
      fixed    reStruct 
0.001570583 0.013884126 

**Iteration 5
LME step: Loglik: -179.3277, nlminb iterations: 4
reStruct  parameters:
       ID1        ID2        ID3 
0.96197111 0.07396744 1.63796900 
PNLS step: RSS =  63.24112 
 fixed effects: -3.211722  0.4454457  -0.7862145  
 iterations: 6 
Convergence crit. (must all become <= tolerance = 1e-05):
       fixed     reStruct 
0.0006651255 0.0147544693 

**Iteration 6
LME step: Loglik: -179.3246, nlminb iterations: 1
reStruct  parameters:
       ID1        ID2        ID3 
0.96215433 0.07487426 1.63772430 
PNLS step: RSS =  63.24112 
 fixed effects: -3.211722  0.4454457  -0.7862145  
 iterations: 1 
Convergence crit. (must all become <= tolerance = 1e-05):
       fixed     reStruct 
0.000000e+00 3.910435e-10 
Using sympy via SnakeCharmR
## Calculate ETA-based prediction and error derivatives:
Calculate Jacobian...................done.
Calculate sensitivities.......
done.
## Calculate d(f)/d(eta) 
## ...
## done
## ...
## done
C:/RTools/3.4/mingw_64/bin/gcc  -I"C:/R/R-34~1.3/include" -DNDEBUG          -O2 -Wall  -std=gnu99 -mtune=generic -c rx_ebf0e5096f80d0203de27cbc48be0f26_x64.c -o rx_ebf0e5096f80d0203de27cbc48be0f26_x64.o
C:/RTools/3.4/mingw_64/bin/gcc -shared -s -static-libgcc -o rx_ebf0e5096f80d0203de27cbc48be0f26_x64.dll tmp.def rx_ebf0e5096f80d0203de27cbc48be0f26_x64.o -LC:/R/R-34~1.3/bin/x64 -lRblas -LC:/R/R-34~1.3/bin/x64 -lRlapack -lgfortran -lm -lquadmath -LC:/R/R-34~1.3/bin/x64 -lR
C:/RTools/3.4/mingw_64/bin/gcc  -I"C:/R/R-34~1.3/include" -DNDEBUG          -O2 -Wall  -std=gnu99 -mtune=generic -c rx_40eae078bd4f9e0651450c3b310d5e9a_x64.c -o rx_40eae078bd4f9e0651450c3b310d5e9a_x64.o
C:/RTools/3.4/mingw_64/bin/gcc -shared -s -static-libgcc -o rx_40eae078bd4f9e0651450c3b310d5e9a_x64.dll tmp.def rx_40eae078bd4f9e0651450c3b310d5e9a_x64.o -LC:/R/R-34~1.3/bin/x64 -lRblas -LC:/R/R-34~1.3/bin/x64 -lRlapack -lgfortran -lm -lquadmath -LC:/R/R-34~1.3/bin/x64 -lR
The model-based sensitivities have been calculated.
It will be cached for future runs.
Diagonal form: sqrt
     [,1]
[1,]    1
Calculate symbolic inverse...done
Calculate symbolic determinant of inverse...done
Calculate d(Omega)/d(Est) and d(Omega^-1)/d(Est)...
.0
...done.
Calculating Table Variables...
done
Warning message:
In nlmixrUI.nlme.var(obj) :
  Initial condition for additive error ignored with nlme
nlmixr UI combined dataset and properties
 $ par.hist         : Parameter history (if available)
 $ par.hist.stacked : Parameter history in stacked form for easy plotting (if available)
 $ par.fixed        : Fixed Effect Parameter Table
 $ eta              : Individual Parameter Estimates
 $ seed             : Seed (if applicable)
 $ model.name       : Model name (from R function)
 $ data.name        : Name of R data input
> # Only 1 iteration for DFN.
> fit.nlme.1cm.lin.theo_sd.init <- nlmixr(uif.1cm.lin, theo_sd, est="nlme",
+                                         control = nlmeControl(maxIter = 1,
+                                                               pnlsMaxIter = 1,
+                                                               msMaxIter = 1)
+                                         )

**Iteration 1
LME step: Loglik: -182.2318, nlminb iterations: 1
reStruct  parameters:
      ID1       ID2       ID3 
1.0022617 0.2783009 2.0758984 
PNLS step: RSS =  87.52968 
 fixed effects: -3.230002  0.5346704  -0.7523971  
 iterations: 1 
Convergence crit. (must all become <= tolerance = 1e-05):
       fixed     reStruct 
3.290853e-01 4.201928e-06 
Error in nlme.formula(model = DV ~ (nlmixr::nlmeModList("user_fn"))(logtcl,  : 
  maximum number of iterations (maxIter = 1) reached without convergence
In addition: Warning message:
In nlmixrUI.nlme.var(obj) :
  Initial condition for additive error ignored with nlme
> # No object created.
> # So try to expand the tolerance:
> fit.nlme.lin.theo_sd.init <- nlmixr(uif.1cm.lin, theo_sd, est="nlme",
+                                     control = nlmeControl(maxIter = 1,
+                                                           pnlsMaxIter = 1,
+                                                           msMaxIter = 1),
+                                     tolerance = 1 # deliberately a huge tolerance
+                                     )
Error in (function (model, data = sys.frame(sys.parent()), fixed, random = fixed,  : 
  unused argument (tolerance = 1)
In addition: Warning message:
In nlmixrUI.nlme.var(obj) :
  Initial condition for additive error ignored with nlme
> # No object created.
> # Just curious, will nlmixrUI(fun) work? It's been removed from latest documentation.
> nlmixrUI(uif.1cm.lin)
## 1-compartment model with first-order absorption in terms of Cl
## Initialization:
################################################################################
Fixed Effects ($theta):
logtcl logtka  logtv 
  -3.2    0.5   -1.0 

Omega ($omega):
     [,1] [,2] [,3]
[1,]    2    0    0
[2,]    0    1    0
[3,]    0    0    1

## Model:
################################################################################
    cl <- exp(logtcl + eta.cl)
    ka <- exp(logtka + eta.ka)
    v  <- exp(logtv  + eta.v)
> # Seems to still work, not sure why its documentation was removed from nlmixr.pdf.
> 
> # Can nlmixr work on bootstrapped data?
> funBS <- function(df, B) {
+   # df is a data frame with column 1 called ID.
+   # B is the number of bootstrapped data sets.
+   subjects <- sort(as.integer(unique(df$ID)))
+   
+   # Initialize an empty data frame to hold bootstrapped values.
+   BsIteration.and.Subject <- data.frame(BsIteration=numeric(),
+                                         ID=character(),
+                                         stringsAsFactors=FALSE)
+   temp <- setNames(data.frame(matrix(ncol = 2, nrow = 0), stringsAsFactors=FALSE),
+                    c("BsIteration", "ID"))
+   for (b in 1:B){
+     # Create a vector of bootstrapped subject IDs.
+     BSsubjects <- sample(1:length(subjects), length(subjects), replace=TRUE)
+     for (i in BSsubjects){
+       # The next line relies on ID being column 1. Index b gets recycled.
+       temp <- as.data.frame( c(b, df[df$ID==i,][1]), stringsAsFactors=FALSE)
+       names(temp) <- c("BsIteration", "ID")
+       BsIteration.and.Subject <- rbind(BsIteration.and.Subject, temp)
+     }
+   }
+   # Create the rest of the empty vector (to avoid lots of rbind on lots of data).
+   other <- setNames(data.frame(matrix(nrow=nrow(BsIteration.and.Subject),
+                                       ncol=ncol(df[-1])),
+                                stringsAsFactors=FALSE),
+                     names(df[-1]) )
+   # 'BSDS' for bootstrapped data set
+   BSDS <- cbind(BsIteration.and.Subject, other)
+   
+   # Fill in BSDS.
+   for (i in BSDS$ID){
+     BSDS[,c(-1,-2)] <- df[df$ID==i,2:ncol(df)]
+   }
+   
+   return(list(BSsubjects, BSDS))
+ }
> BSDS002 <- funBS(theo_sd, 2)  # 002 suffix is number of BS samples
> BSDS002[[1]]
 [1] 10 12  8 11  1  7  5 10  5  9  2  7
> BSDS002[[2]]$ID
  [1]  2  2  2  2  2  2  2  2  2  2  2  2 11 11 11 11 11 11 11 11 11 11 11 11 10 10 10
 [28] 10 10 10 10 10 10 10 10 10  2  2  2  2  2  2  2  2  2  2  2  2  9  9  9  9  9  9
 [55]  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9  9  6  6  6  6  6  6  6  6  6
 [82]  6  6  6  9  9  9  9  9  9  9  9  9  9  9  9  1  1  1  1  1  1  1  1  1  1  1  1
[109] 10 10 10 10 10 10 10 10 10 10 10 10  4  4  4  4  4  4  4  4  4  4  4  4  7  7  7
[136]  7  7  7  7  7  7  7  7  7 10 10 10 10 10 10 10 10 10 10 10 10 12 12 12 12 12 12
[163] 12 12 12 12 12 12  8  8  8  8  8  8  8  8  8  8  8  8 11 11 11 11 11 11 11 11 11
[190] 11 11 11  1  1  1  1  1  1  1  1  1  1  1  1  7  7  7  7  7  7  7  7  7  7  7  7
[217]  5  5  5  5  5  5  5  5  5  5  5  5 10 10 10 10 10 10 10 10 10 10 10 10  5  5  5
[244]  5  5  5  5  5  5  5  5  5  9  9  9  9  9  9  9  9  9  9  9  9  2  2  2  2  2  2
[271]  2  2  2  2  2  2  7  7  7  7  7  7  7  7  7  7  7  7
> head(BSDS002[[2]]$ID, 15)
 [1]  2  2  2  2  2  2  2  2  2  2  2  2 11 11 11
> head(BSDS002[[2]], 15)
   BsIteration ID  TIME   DV  AMT EVID CMT   WT
1            1  2  0.00 0.00 4.95  101   1 64.6
2            1  2  0.00 0.15 0.00    0   2 64.6
3            1  2  0.25 0.85 0.00    0   2 64.6
4            1  2  0.50 2.35 0.00    0   2 64.6
5            1  2  1.02 5.02 0.00    0   2 64.6
6            1  2  2.02 6.58 0.00    0   2 64.6
7            1  2  3.48 7.09 0.00    0   2 64.6
8            1  2  5.00 6.66 0.00    0   2 64.6
9            1  2  6.98 5.25 0.00    0   2 64.6
10           1  2  9.00 4.39 0.00    0   2 64.6
11           1  2 12.05 3.53 0.00    0   2 64.6
12           1  2 24.22 1.15 0.00    0   2 64.6
13           1 11  0.00 0.00 4.95  101   1 64.6
14           1 11  0.00 0.15 0.00    0   2 64.6
15           1 11  0.25 0.85 0.00    0   2 64.6
> tail(BSDS002[[2]], 15)
    BsIteration ID  TIME   DV  AMT EVID CMT   WT
274           2  2  9.00 4.39 0.00    0   2 64.6
275           2  2 12.05 3.53 0.00    0   2 64.6
276           2  2 24.22 1.15 0.00    0   2 64.6
277           2  7  0.00 0.00 4.95  101   1 64.6
278           2  7  0.00 0.15 0.00    0   2 64.6
279           2  7  0.25 0.85 0.00    0   2 64.6
280           2  7  0.50 2.35 0.00    0   2 64.6
281           2  7  1.02 5.02 0.00    0   2 64.6
282           2  7  2.02 6.58 0.00    0   2 64.6
283           2  7  3.48 7.09 0.00    0   2 64.6
284           2  7  5.00 6.66 0.00    0   2 64.6
285           2  7  6.98 5.25 0.00    0   2 64.6
286           2  7  9.00 4.39 0.00    0   2 64.6
287           2  7 12.05 3.53 0.00    0   2 64.6
288           2  7 24.22 1.15 0.00    0   2 64.6
> # Bootstrapped data set BSDS002 seems reasonable.
> # nlmixr on bootstrapped data unfortunately fails:
> fit.nlme.1cm.lin.BSDS002.full <- nlmixr(uif.1cm.lin, BSDS002, est="nlme")
Error in apply(data, 1, function(x) { : 
  dim(X) must have a positive length
>

mattfidler commented 6 years ago

I think that the data provided must be a data frame it looks like you are providing something else.

JSC67 commented 6 years ago

Ah, an embarrassing simple mistake. Sorry about that. BSDS is a list whereas its second element BSDS[[2]] is a data frame of bootstrapped data. I was able to get a little bit farther when operating on the data frame, but still had a problem. By the way, in the Convergence criteria, I didn't see any discussion in the documentation on reStruct. What are the columns fixed and reStruct, and is there some documentation on them somewhere? Likewise, I'm confused by the reStruct parameters ID1, ID2, ID3, etc. Perhaps my problems is that my function funBS() makes (I hope!) a single large (although the size is just B = 2 for testing) bootstrapped data set, but perhaps it would be better to make just one BS data set and operate on that, one at a time. I'll try that, but can you please shed some light (or give references) for the output from nlmixr that I asked about? Thanks.

> fit.nlme.1cm.lin.BSDS002.full <- nlmixr(uif.1cm.lin, BSDS002[[2]], est="nlme")

**Iteration 1
LME step: Loglik: -56.6946, nlminb iterations: 1
reStruct  parameters:
        ID1         ID2         ID3 
-0.24626384  0.28144752  0.02927114 
PNLS step: RSS =  26.38875 
 fixed effects: -2.320782  -0.3926246  -0.04520173  
 iterations: 7 
Convergence crit. (must all become <= tolerance = 1e-05):
    fixed  reStruct 
21.123049  1.054445 

**Iteration 2
LME step: Loglik: -134.1951, nlminb iterations: 2
reStruct  parameters:
       ID1        ID2        ID3 
-0.5354660  7.1462985 -0.5373497 
Error in nlme.formula(model = DV ~ (nlmixr::nlmeModList("user_fn"))(logtcl,  : 
  step halving factor reduced below minimum in PNLS step
In addition: Warning messages:
1: In nlmixr_fit(uif, data, est, control = control, calc.resid = calc.resid,  :
  Sorting by ID, TIME; Output fit may not be in the same order as input dataset.
2: In nlmixrUI.nlme.var(obj) :
  Initial condition for additive error ignored with nlme
>

mattfidler commented 6 years ago

Hi James,

This output (the ID1 ID2 ID3) are from nlme; I'm not as familiar with the nlme algoritm as other algoritms. However, I think that restrutcured errors weight the individuals based on different likelihood functions. However, I could be mistaken.

The control = nlmeControl(pnlsTol = .01) is the option you need to adjust to help the convergence of nlme. If you increase the pnlsTol it is more likely to succesfully converge. I also observe that increasing the pnlsTol the ETAs are tending to zero and estimates of between subject variaibility to be biased by tending to approach zero.

I also suggest you downgrade to RxODE 0.6 (since it is still more stable) and upgrade to the latest nlmixr since I think it helps convergence in certain types of problems.

nlmixrdevelopment / nlmixr

want to use nlmixr on bootstrapped data #31