Hi @billdenney,
Thanks for the test. I was thinking that the increased pnlsTol to 0.01 may help (which decreases the model precision on the ETAs, at least in my experience). I think the issue we see in test 68 is a platform difference in how it compares to pnlsTol.
I would also like to eventually do the same suite of models in SAEM, FO, FOCE, FOCEi and using the general UI. FOCEi is very close to being what I consider final, and at that point I think I would be in a good place to expand these tests.
There is another option to try to fix the platform differences: using PreciseSums, which reduces platform-based floating-point rounding errors in multiplication and addition. For me, in some non-testthat tests it makes Linux and Windows results for FOCEi identical, but it adds computational time. nlme seems to still suffer from rounding errors using this approach.
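For a sense of what PreciseSums guards against, here is a tiny, self-contained illustration (my own example, assuming the PreciseSums::fsum() compensated-summation function; it is not part of the test suite):

x <- c(1e20, 1, -1e20)
sum(x)                # the +1 is lost to floating-point rounding; gives 0
PreciseSums::fsum(x)  # compensated summation recovers the exact sum; gives 1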
Model 68 doesn't converge with pnlsTol=0.01 (it does converge in Linux with 0.8 and does not converge in Linux with 0.7). The test below works for test-model68.R on Docker, but please note the "This parameter is notably different between platforms" comment. I've modified all the numbers to work in Docker, but they all differ from the Windows version. Can you please run it on Windows to see how much closer we can get to each other?
library(testthat)
library(nlmixr)
rxPermissive({
context("NLME68: two-compartment oral Michaelis-Menten, single-dose")
test_that("ODE", {
datr <-
read.csv("Oral_2CPTMM.csv",
header = TRUE,
stringsAsFactors = F)
datr$EVID <- ifelse(datr$EVID == 1, 101, datr$EVID)
datr <- datr[datr$EVID != 2,]
ode2MMKA <- "
d/dt(abs) =-KA*abs;
d/dt(centr) = KA*abs+K21*periph-K12*centr-exp(log(VM)+log(centr)-log(V)-log(KM+centr/V));
d/dt(periph) =-K21*periph+K12*centr;
"
mypar9 <- function(lVM, lKM, lV, lCLD, lVT, lKA)
{
VM <- exp(lVM)
KM <- exp(lKM)
V <- exp(lV)
CLD <- exp(lCLD)
VT <- exp(lVT)
KA <- exp(lKA)
K12 <- CLD / V
K21 <- CLD / VT
}
specs9 <-
list(
fixed = lVM + lKM + lV + lCLD + lVT + lKA ~ 1,
random = pdDiag(lVM + lKM + lV + lCLD + lVT + lKA ~ 1),
start = c(
lVM = 7.3,
lKM = 6.1,
lV = 4.3,
lCLD = 1.4,
lVT = 3.75,
lKA = -0.01
)
)
runno <- "N068"
dat <- datr[datr$SD == 1,]
fit <-
nlme_ode(
dat,
model = ode2MMKA,
par_model = specs9,
par_trans = mypar9,
response = "centr",
response.scaler = "V",
verbose = TRUE,
weight = varPower(fixed = c(1)),
control = nlmeControl(pnlsTol = .08, msVerbose = TRUE, msMaxIter=1000)
)
z <- VarCorr(fit)
expect_equal(signif(as.numeric(fit$logLik), 6), -11755.9)
expect_equal(signif(AIC(fit), 6), 23537.9)
expect_equal(signif(BIC(fit), 6), 23612.4)
expect_equal(signif(as.numeric(fit$coefficients$fixed[1]), 3), 7.33)
expect_equal(signif(as.numeric(fit$coefficients$fixed[2]), 3), 6.1)
expect_equal(signif(as.numeric(fit$coefficients$fixed[3]), 3), 4.24)
expect_equal(signif(as.numeric(fit$coefficients$fixed[4]), 3), 1.39)
expect_equal(signif(as.numeric(fit$coefficients$fixed[5]), 3), 3.75)
expect_equal(signif(as.numeric(fit$coefficients$fixed[6]), 3), -0.0271)
expect_equal(signif(as.numeric(z[1, "StdDev"]), 3), 0.189)
expect_equal(signif(as.numeric(z[2, "StdDev"]), 3), 0.331)
expect_equal(signif(as.numeric(z[3, "StdDev"]), 3), 0.343)
# This parameter is notably different between platforms
#expect_equal(signif(as.numeric(z[4, "StdDev"]), 3), 0.000205) # Windows
expect_true(abs(as.numeric(z[4, "StdDev"])) < 100*sqrt(.Machine$double.eps)) # Linux (Docker)
expect_equal(signif(as.numeric(z[5, "StdDev"]), 3), 0.248)
expect_equal(signif(as.numeric(z[6, "StdDev"]), 3), 0.277)
expect_equal(signif(fit$sigma, 3), 0.205)
})
})
It doesn't converge on Windows:
**Iteration 6
LME step: Loglik: -11766.46, nlminb iterations: 14
reStruct parameters:
ID1 ID2 ID3 ID4 ID5 ID6
0.003284074 -0.414098590 -0.496051274 6.909741189 -0.163099121 -0.320263754
Beginning PNLS step: .. completed fit_nlme() step.
Error in nlme.formula(model = DV ~ (nlmixr::nlmeModList("user_fn"))(lVM, :
step halving factor reduced below minimum in PNLS step
I'm trying to increase pnlsTol to 0.9 to see if it runs for me...
@billdenney nlme is funny. I just realized you were decreasing pnlsTol. You are more likely to get convergence by increasing pnlsTol.
@mattfidler, that was my intent. You are more likely to get convergence by increasing pnlsTol, but my guess was that you are more likely to get similar answers cross-platform by decreasing it.
Lemme know how 0.9 works for you-- or perhaps we should accept a few cross-platform differences and have these tests simply be platform-specific. (Not the best, but at least consistent with the expected installation.)
0.9 didn't work either. However, pnlsTol=1.0 works.
> expect_equal(signif(as.numeric(fit$logLik), 6), -11755.9)
Error: signif(as.numeric(fit$logLik), 6) not equal to -11755.9.
1/1 mismatches
[1] -11769 - -11756 == -13.2
> expect_equal(signif(AIC(fit), 6), 23537.9)
Error: signif(AIC(fit), 6) not equal to 23537.9.
1/1 mismatches
[1] 23564 - 23538 == 26.3
> expect_equal(signif(BIC(fit), 6), 23612.4)
Error: signif(BIC(fit), 6) not equal to 23612.4.
1/1 mismatches
[1] 23639 - 23612 == 26.3
>
> expect_equal(signif(as.numeric(fit$coefficients$fixed[1]), 3), 7.33)
Error: signif(as.numeric(fit$coefficients$fixed[1]), 3) not equal to 7.33.
1/1 mismatches
[1] 7.32 - 7.33 == -0.01
> expect_equal(signif(as.numeric(fit$coefficients$fixed[2]), 3), 6.1)
Error: signif(as.numeric(fit$coefficients$fixed[2]), 3) not equal to 6.1.
1/1 mismatches
[1] 6.09 - 6.1 == -0.01
> expect_equal(signif(as.numeric(fit$coefficients$fixed[3]), 3), 4.24)
Error: signif(as.numeric(fit$coefficients$fixed[3]), 3) not equal to 4.24.
1/1 mismatches
[1] 4.25 - 4.24 == 0.01
> expect_equal(signif(as.numeric(fit$coefficients$fixed[4]), 3), 1.39)
Error: signif(as.numeric(fit$coefficients$fixed[4]), 3) not equal to 1.39.
1/1 mismatches
[1] 1.38 - 1.39 == -0.01
> expect_equal(signif(as.numeric(fit$coefficients$fixed[5]), 3), 3.75)
> expect_equal(signif(as.numeric(fit$coefficients$fixed[6]), 3), -0.0271)
Error: signif(as.numeric(fit$coefficients$fixed[6]), 3) not equal to -0.0271.
1/1 mismatches
[1] -0.021 - -0.0271 == 0.0061
>
> expect_equal(signif(as.numeric(z[1, "StdDev"]), 3), 0.189)
Error: signif(as.numeric(z[1, "StdDev"]), 3) not equal to 0.189.
1/1 mismatches
[1] 0.19 - 0.189 == 0.001
> expect_equal(signif(as.numeric(z[2, "StdDev"]), 3), 0.331)
Error: signif(as.numeric(z[2, "StdDev"]), 3) not equal to 0.331.
1/1 mismatches
[1] 0.33 - 0.331 == -0.001
> expect_equal(signif(as.numeric(z[3, "StdDev"]), 3), 0.343)
Error: signif(as.numeric(z[3, "StdDev"]), 3) not equal to 0.343.
1/1 mismatches
[1] 0.336 - 0.343 == -0.007
> # This parameter is notably different between platforms
> #expect_equal(signif(as.numeric(z[4, "StdDev"]), 3), 0.000205) # Windows
> expect_true(abs(as.numeric(z[4, "StdDev"])) < 100*sqrt(.Machine$double.eps)) # Linux (Docker)
Error: abs(as.numeric(z[4, "StdDev"])) < 100 * sqrt(.Machine$double.eps) isn't true.
> expect_equal(signif(as.numeric(z[5, "StdDev"]), 3), 0.248)
Error: signif(as.numeric(z[5, "StdDev"]), 3) not equal to 0.248.
1/1 mismatches
[1] 0.253 - 0.248 == 0.005
> expect_equal(signif(as.numeric(z[6, "StdDev"]), 3), 0.277)
Error: signif(as.numeric(z[6, "StdDev"]), 3) not equal to 0.277.
1/1 mismatches
[1] 0.283 - 0.277 == 0.006
>
> expect_equal(signif(fit$sigma, 3), 0.205)
>
I think nlmixr builds fail on Travis because they use Ubuntu; my guess is that if we fix these tests they will show a successful nlmixr build status. (Unless I added Mac; I forget if I did.)
Your logic about pnlsTol is correct; nlme is a bit too finicky for that, though.
Alrighty. Given this, I think we may want to just allow these tests to differ cross-platform and try to ensure that all the other tests work cross-platform.
Added platform differences for 68
Would you accept a PR to make model estimation in all tests non-verbose (setting verbose=FALSE and msVerbose=FALSE)? Maybe more generally, and a bit harder: a PR that makes verbosity optional (defaulting to FALSE)?
I'm working through the differences in the tests, and it's more difficult to see which are working and which are failing with the fitting trace in the output.
That's fine.
FYI, I'm trying to change the behavior of some of the tests to use the tol argument for expect_equal() to make things a bit more cross-platform. I hope to have a (rather intensive) test PR later this afternoon. It will need Windows testing, and my hope is that we can make it as close as possible to the same tests across all platforms. logLik, AIC, and BIC are the biggest offenders at the moment.
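For example, something along these lines (illustrative values only; testthat passes the tolerance through to all.equal(), so it acts as a relative tolerance):

# Illustrative only: a ~0.1% relative tolerance absorbs platform-dependent
# rounding while still catching genuinely different estimates.
expect_equal(as.numeric(fit$logLik), -11755.9, tolerance = 1e-3)
expect_equal(AIC(fit), 23537.9, tolerance = 1e-3)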
Thank you Bill!
It is taking longer than anticipated. I've run into one issue that I think may need a more general solution:
In test-model54.R:
Error in solve.default(-val) :
Lapack routine dgesv: system is exactly singular: U[4,4] = 0
I'm not sure how to progress with a singular system (I would usually remove a variable, but I don't want to do that with a test). Do you have any ideas?
What about only doing 1 iteration?
Rounding errors in fitting are compounded by the multiple steps it goes through to calculate derivatives. This might be a general way to compare rounding errors and make them <5% for all platforms.
In test-model54.R, this means the following fit:
fit <-
nlme_ode(
dat,
model = ode2MM,
par_model = specs7,
par_trans = mypar7,
response = "centr",
response.scaler = "V",
verbose = verbose_minimization,
weight = varPower(fixed = c(1)),
control = nlmeControl(pnlsTol = .1, msVerbose = verbose_minimization,
returnObject=TRUE, msMaxIter=1L, maxIter=1L,
pnlsMaxIter=1L)
)
Really the fitness of nlme should be determined by R core; nlmixr allows nlme use with RxODE. So a test of one iteration could make all the tests faster as well as more platform independent. Of course, it would also require testing/updating the Windows results.
I still need to get some tests for SAEM, though there are some tests for FOCEi comparing objective functions and outputs between nlmixr and NONMEM.
By the way, with the updates, Windows returns:
> fit
Nonlinear mixed-effects model fit by maximum likelihood
Model: DV ~ (nlmixr::nlmeModList("user_fn"))(lVM, lKM, lV, lCLD, lVT, TIME, ID)
Log-likelihood: -13687.12
Fixed: lVM + lKM + lV + lCLD + lVT ~ 1
lVM lKM lV lCLD lVT
6.979419 5.724384 4.197173 1.461619 4.073274
Random effects:
Formula: list(lVM ~ 1, lKM ~ 1, lV ~ 1, lCLD ~ 1, lVT ~ 1)
Level: ID
Structure: Diagonal
lVM lKM lV lCLD lVT Residual
StdDev: 0.0005680223 0.001065264 0.0008181519 0.0007510751 0.0004518648 0.3046571
Variance function:
Structure: Power of variance covariate
Formula: ~fitted(.)
Parameter estimates:
power
1
Number of Observations: 2280
Number of Groups: 120
>
@mattfidler, I like that idea!
I agree that the accuracy of nlme() is the purview of R-core, but I'm guessing that users won't necessarily see it that way. It may be worth a note in the manual stating that explicitly and noting something like "Increasing pnlsTol or changing other nlmeControl() parameters may decrease the repeatability of results across platforms."
In terms of testing, what do you think if I were to move all the models themselves into helper files (in a location like "tests/testthat/models/") so that we can more easily collaborate on generating the cross-platform tests? I'm thinking something like:
source("models/N062.R")
source("models/N062_ode.R")
to run the model and then we could each run all the models on Linux and Windows and find any discrepancies.
If so, I'll first generate a PR for separating the models (which will break all the tests). Then, we both run the models auto-generating appropriate testthat testing. Finally, we merge our testing and push a cross-platform testing PR. Sound good?
One note on the testing: do you know why VarCorr() returns a character matrix instead of numeric? Is that something that we could update/fix along the way? (I think no, because I think it comes from nlme, but I'd like to check.)
In terms of testing, what do you think if I were to move all the models themselves into helper files (in a location like "tests/testthat/models/") and so that we can more easily collaborate on generating the cross-platform tests?
I think that is OK. I was thinking of adding some UI (aka nlmixr functions) to these as well. For example the model function for model 62 in the UI is:
two.compartment.oral.model <- function(){
ini({ # Where initial conditions/variables are specified
# '<-' or '=' defines population parameters
lCl <- 1.6 #log Cl (L/hr)
lVc <- 4.5 #log Vc (L)
lQ <- 1.5 #log Q (L/hr)
lVp <- 3.9 #log Vp (L)
lKA <- 0.1 #log Ka
# Bounds may be specified by c(lower, est, upper), like NONMEM:
# Residuals errors are assumed to be population parameters
prop.err <- c(0, 0.2, 1)
# Between subject variability estimates are specified by '~'
# Semicolons are optional
eta.Vc ~ 0.1
eta.Cl ~ 0.1
eta.Vp ~ 0.1
eta.Q ~ 0.1
eta.KA ~ 0.1
})
model({ # Where the model is specified
# The model uses the ini-defined variable names
Vc <- exp(lVc + eta.Vc)
Cl <- exp(lCl + eta.Cl)
Vp <- exp(lVp + eta.Vp)
Q <- exp(lQ + eta.Q)
KA <- exp(lKA + eta.KA)
# RxODE-style differential equations are supported
K10 <- Cl/Vc
K12 <- Q/Vc
K21 <- Q/Vp
d/dt(depot) = -KA*depot;
d/dt(centr) = KA*depot+K21*periph-K12*centr-K10*centr;
d/dt(periph) = -K21*periph+K12*centr;
## Concentration is calculated
cp = centr / Vc;
# And is assumed to follow proportional error estimated by prop.err
cp ~ prop(prop.err)
})
}
For a pharmacometrician familiar with NONMEM and R, these should look similar to constructs in both languages. There are similar functions for solved systems:
two.compartment.oral.model.Solved <- function(){
ini({ # Where initial conditions/variables are specified
# '<-' or '=' defines population parameters
lCl <- 1.6 #log Cl (L/hr)
lVc <- 4.5 #log Vc (L)
lQ <- 1.5 #log Q (L/hr)
lVp <- 3.9 #log Vp (L)
lKA <- 1 #log Ka
# Bounds may be specified by c(lower, est, upper), like NONMEM:
# Residuals errors are assumed to be population parameters
prop.err <- c(0, 0.2, 1)
# Between subject variability estimates are specified by '~'
# Semicolons are optional
eta.Vc ~ 1
eta.Cl ~ 1
eta.Vp ~ 1
eta.Q ~ 1
eta.KA ~ 1
})
model({ # Where the model is specified
# The model uses the ini-defined variable names
Vc <- exp(lVc + eta.Vc)
Cl <- exp(lCl + eta.Cl)
Vp <- exp(lVp + eta.Vp)
Q <- exp(lQ + eta.Q)
KA <- exp(lKA + eta.KA)
# Solved equations:
linCmt() ~ prop(prop.err)
})
}
These are the functions we are now advocating. The tests were written before this interface between all the methods existed. Once the UI work is completed, the final model is run by any of the supported UI methods:
fit <- nlmixr(two.compartment.oral.model, data, "nlme",nlmeControl(), tableControl())
fit <- nlmixr(two.compartment.oral.model, data, "saem", saemControl(), tableControl())
fit <- nlmixr(two.compartment.oral.model, data, "focei", foceiControl(), tableControl())
fit <- nlmixr(two.compartment.oral.model, data, "foce", foceiControl(), tableControl())
fit <- nlmixr(two.compartment.oral.model, data, "fo", foceiControl(), tableControl())
fit <- nlmixr(two.compartment.oral.model, data, "foi", foceiControl(), tableControl())
Currently these will never make it into CRAN testing because nlmixr currently uses Python for the symbolics.
One note on the testing: Do you know why VarCorr() returns a character matrix instead of numeric? Is that something that we could update/fix along the way? (I think no because I think it comes from nlme, but I'd like to check.)
I think it's because of the printing.
If you want to fix it you can, though. The default print of nlme was too much for me (it output the data as well as the model), so the models from nlmixr's nlme have the following class:
> class(fit)
[1] "nlmixrNlme" "nlme" "lme"
Since VarCorr is a method, it could be corrected.
Most nlme methods work with nlmixr output objects but may return different things. The VarCorr method does not work with nlmixr objects in general, though it could too.
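A minimal sketch of one way it could be done (illustrative; not necessarily how it is implemented in nlmixr):

# Sketch: a VarCorr method for the nlmixrNlme class that returns numbers.
# It reuses nlme's VarCorr.lme() via NextMethod() and coerces the character
# matrix to numeric (blank correlation cells become NA).
VarCorr.nlmixrNlme <- function(x, sigma = 1, ...) {
  z <- NextMethod()
  zn <- suppressWarnings(apply(z, 2, as.numeric))
  dimnames(zn) <- dimnames(z)
  zn
}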
Added VarCorr as a numeric.
What do you think of this as an option for how to test all the models:
https://github.com/billdenney/nlmixr/tree/simplify-model-tests
Specifically, all the modeling code will be moved to the tests/testthat/models directory, including the expected values. As we move through the models, we may need to also move the tolerances to the models/ directory.
With this method you lose context(), but the same information is maintained with info=. We can also add any platform specificity by putting it into the assignment of the expected_values list rather than having any platform specificity in the testing code itself.
There is a bit of a header in each of the modeling .R files to allow them to run without being in the test environment. That may be excessive, but my hope is that it simplifies things for people trying to muck around later.
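Roughly, the shape I have in mind is something like this (an illustrative sketch; the exact names and structure in the branch may differ, and the numbers are the model 68 values from above):

# In the model file: record the expected values for this platform.
expected_values <- list(
  logLik = -11755.9,
  fixed  = c(lVM = 7.33, lKM = 6.1, lV = 4.24, lCLD = 1.39, lVT = 3.75, lKA = -0.0271)
)
# In the shared test code: compare against them, tagging failures with the model.
expect_equal(as.numeric(fit$logLik), expected_values$logLik,
             tolerance = 1e-3, info = paste("logLik for model", runno))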
You can add context() simply by calling it in the command that loops over the models, so I don't think it needs to be removed, but I don't mind if it is removed either.
I think it does simplify things. The header can be simplified a bit further by making a common include, but that is also up to you. I will compare with Windows, and perhaps someone else can compare with Mac.
As I'm working my way through the tests, I hit a snag on model 2 when I try to run with a single iteration (your suggested nlmeControl() parameters above). I think that we're going to have to try to minimize regardless, and I'm going to just let it try hard to minimize and set returnObject=TRUE to try to overcome pnlsTol issues.
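In other words, control settings along these lines (illustrative values, not necessarily what will end up in the branch):

# Illustrative only: let nlme iterate as usual, but return the last fit even if
# the convergence criteria are not met, so pnlsTol hiccups don't abort the run.
ctl <- nlmeControl(pnlsTol = 0.1, msMaxIter = 100, maxIter = 100,
                   returnObject = TRUE)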
Ok. The more I use nlme, the more I prefer other methods.
I tend to find it reasonable for very simple models, and unreasonable for anything else. As there are a lot of tests, I'm enlisting some help in the conversion. (So, I'll probably be a bit quiet for a few days on this-- work is still occurring.)
Progress is being made!
I'm working through the testing, and it turns out that I should be able to test it on both Windows and Linux myself (I don't have access to a current Mac).
I think that we may also be able to fit some of the tests within the 10-minute CRAN limit if we choose them based on execution speed. We could run the fastest 22 model tests in 5 minutes (according to timing on my system). The values below are model execution times in seconds, by model number, sorted from the fastest to the slowest model, with some generous limits for minimization (https://github.com/billdenney/nlmixr/blob/77e78782df2f8fe674999394091327fb66d4c892/tests/testthat/models/helper-prep_fit.R#L8-L15).
N001 N004_ode N002 N003 N004 N012 N013 N025 N023 N026 N015 N024 N049 N033 N035 N047 N001_ode N063 N002_ode N003_ode N062 N014_ode N048 N034 N020_ode N012.ode N010.ode N009_ode N042_ode N022_ode N047_ode N061 N021_ode N023_ode
5 6 6 6 6 8 8 8 8 9 10 10 16 16 17 19 20 21 23 24 26 27 34 34 73 87 99 103 109 115 116 119 122 125
N025_ode N049_ode N013_ode N034_ode N026_ode N011.ode N024_ode N029_ode N068_ode N054_ode N040_ode N030_ode N015_ode N035_ode N060_ode N032_ode N031_ode N032 N033_ode N062_ode N070_ode N014 N055_ode N056_ode N063_ode N069_ode N061_ode N046 N048_ode N060 N046_ode N041_ode
125 142 143 146 155 158 158 161 176 179 183 192 194 205 235 245 263 297 300 402 408 435 542 574 744 761 1747 2200 2716 2736 10013 13408
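Picking that subset could be automated with something like this (a sketch; timings stands for the named vector of seconds printed above):

# Sketch: keep the fastest models whose cumulative run time fits a 5-minute budget.
timings <- sort(timings)
cran_models <- names(timings)[cumsum(timings) <= 5 * 60]
cran_models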
That is great news. Perhaps for the CRAN-only test we can select one. That way we can run the same model in SAEM and FOCEi and nlme with nlmixr's UI (there are 2 different ways to run it in that case).
I can get help or borrow a Mac to help with the Mac minimization. I will also get a virtual machine set up for the Solaris install. However, I just remembered CRAN doesn't have Python, so FOCEi and SAEM will not be tested on CRAN for now.
I got the following surprising error when running the tests; it looked like knitr was trying to run for some reason. Do you know why this may happen? (My R session was using 26 GB of RAM and swapping out constantly, so that will have to be solved as well.)
models//model_N054_ode.R at 2018-11-26 06:23:43
Error in system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE) :
cannot popen '/usr/bin/which 'pdflatex' 2>/dev/null', probable reason 'Cannot allocate memory'
In addition: There were 41 warnings (use warnings() to see them)
26 Nov 2018 22:06:26 [rsession-rstudio] ERROR r error 4 (R code execution error) [errormsg=Error in system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE) :
cannot popen '/usr/bin/which 'pdflatex' 2>/dev/null', probable reason 'Cannot allocate memory'
]; OCCURRED AT: rstudio::core::Error rstudio::r::exec::{anonymous}::evaluateExpressionsUnsafe(SEXP, SEXP, SEXPREC**, rstudio::r::sexp::Protect*, rstudio::r::exec::{anonymous}::EvalType) /home/ubuntu/rstudio/src/cpp/r/RExec.cpp:159; LOGGED FROM: rstudio::core::FilePath rstudio::session::module_context::findProgram(const string&) /home/ubuntu/rstudio/src/cpp/session/SessionModuleContext.cpp:960
26 Nov 2018 22:06:26 [rsession-rstudio] ERROR system error 12 (Cannot allocate memory); OCCURRED AT: rstudio::core::Error rstudio::core::system::ChildProcess::run() /home/ubuntu/rstudio/src/cpp/core/system/PosixChildProcess.cpp:478; LOGGED FROM: bool rstudio::session::modules::svn::isSvnInstalled() /home/ubuntu/rstudio/src/cpp/session/modules/SessionSVN.cpp:444
I'm not sure; are you running this in parallel?
Perhaps I can run on my system to help.
How do you run the tests? Apparently in RStudio on Ubuntu, but using testthat? Sourcing a file? I would like to repeat exactly how you are calling the tests.
I just added a commit to RxODE to make sure that memory is freed before calling another solve; this could be the issue, though if so, the memory freeing in C isn't working quite right when using nlme.
I'm running the tests with default settings in Ubuntu RStudio, yes. Some but not all of the tests appear to be running in parallel (when I look at top, sometimes 1 core is fully used, other times, all 8 cores are fully used).
At the moment, I'm manually running the tests by stepping through the model files line-by-line.
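Roughly one model at a time, something like this (an illustrative sketch assuming the models/ layout; not the exact commands):

# Illustrative only: run a single model script and inspect the fit it creates,
# rather than running the whole testthat suite at once.
library(testthat)
library(nlmixr)
source("models/model_N054_ode.R")  # assumed to leave a 'fit' object behind
print(fit)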
As I'm running models//model_N054_ode.R after installing the current RxODE and nlmixr GitHub versions just now (merging nlmixr with my current branch), I'm still having what appear to be memory issues. It may be less of a leak than before (I'm not sure), but memory is increasing by about 1 MB every 3 seconds when it's using a single processor.
Does single-core use mean that the issue is while working in nlme and not in the parallel parts of nlmixr/RxODE?
You could be storing too much information.
Note that nlme stores the original data, the fit data and some of the parameter data, so this could be what is causing the memory issues. Try looking at object.size(fit) to see what that gives you.
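For example (plain base R, nothing nlmixr-specific):

# Report the in-memory size of the fit object in megabytes.
print(object.size(fit), units = "Mb")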
Also, I looked at nlme. By default it is single-threaded. However, the nlme piece can use mclapply, which duplicates the memory to speed up processing.
One last note, due to a CRAN request, we will likely be submitting this again to CRAN by this week/next week to keep it on CRAN. I don't think you need to rush to get this in by then, but don't be surprised if we submit this before the test suite is done. I would have rather waited for this to be completed but when CRAN threatens....
The issue occurs even when I just run model N054, so I don't think that storage is the primary issue. (The fit object does get moderately large, in the ~100 MB range (that's what I got in the Windows run), but not huge.) Model N054 has been ticking away for about an hour now without making apparent progress. If you have a suggestion for how to look inside to see what may be the issue, I'd like to try that, because I don't think that it's going to complete.
And: Don't get delisted from CRAN because you're waiting on this. I think that this will be ready in ~2 weeks (assuming that we can figure out this apparent memory leak quickly), and I'd rather get the tests right than get them on CRAN. If you're ready for a CRAN re-submit and this is the only item, go for CRAN now.
I may be able to get a subset of the tests ready ASAP and we could wait on the rest. For example, I could make all the tests that run in <10 minutes work right now, and assign the others to be skipped for now. Would you like that for the CRAN release (if possible)?
I don't think it is necessary to rush.
I will look into model 054 after I get everything CRAN-worthy in RxODE/nlmixr. Right now all the nlme tests are skipped on CRAN because of platform differences.
Not only would we have to subset the tests but also make sure they run fast enough and/or skip in the other environments. It's unclear how fast the CRAN environments are; I have seen wildly different numbers.
Currently RxODE tests on Mac/Linux/Windows via Travis and Appveyor, but the Mac builds are always a bit finicky on Travis. Mac could also be dependent on how R is installed; I think Travis uses the R Mavericks build.
Happy New Year, Bill!
We resubmitted RxODE, and the current version of RxODE is passing on Travis/Appveyor again; if you have time, perhaps we can revisit this.
The memory leak may have been corrected by an option I set in RxODE, though I'm unsure at this point.
Happy New Year!
The memory leak was a challenge. I'll try to work on it in the next week or two. I will update to the current RxODE and nlmixr on GitHub master branches. Are there other dependent package changes that I should update?
I will check the memory leak before you begin. I thought the memory leak was in sympy, but reading the thread again it may be elsewhere.
I ran model 54 without any issues. It matched your test results. I also added the UI-style model 54 and ran nlme, saem, fo, foi, foce and focei for it, and they also all ran mostly successfully (the FOCE had a questionable sandwich/R covariance matrix and used the S matrix). The UI ran a bit better in this model than the manually specified nlme, though I can't remember the differences in how the model is actually run.
When running testthat, the way you guys have it set up is to test at the end. What is the reason for this? I prefer to look at the results as it is running.
Also, @kiranmaiganji, since you contributed to the testing, I would also like to add you as a contributor, if that is OK with you. If you have an ORCID, let me know.
Great! I hope to have some time to run tests next week.
When running testthat, the way you guys have it set up is to test at the end. What is the reason for this? I prefer to look at the results as it is running.
I'm not sure that I understand the statement. The reason for the slightly unusual setup is to help with cross-platform testing. With this setup, we can run the tests and save the results easily so that we can compare results across platforms.
Ok
I meant that you use
for (model in models) {
  # run models
}
for (model in models) {
  # check model results
}
Since you are trying to save results, I changed it to
for (model in models) {
  # run models
  # check new models
}
Hi @billdenney
I have a clarification question about your code. You said to use:
#Generate this with dput(generate_expected_values(fit[[runno]]))
Does that mean going through each example one by one and adding the output to each file, or adding the output all at once in a script?
I changed some of the ODE tolerances so that nlme can have better performance.
That is meant for running one-by-one to add the output to each file. If you run them all, you can run it in a loop to generate all the expected outputs, but they're not explicitly saved anywhere.
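If you did want to capture them all in one pass, a loop along these lines would do it (hypothetical file naming; generate_expected_values() is the helper from the test branch, and fit is assumed to be the list of fits keyed by run number):

# Hypothetical: write the dput() of each fit's expected values next to its model.
for (runno in names(fit)) {
  dput(generate_expected_values(fit[[runno]]),
       file = file.path("models", paste0("expected_", runno, ".R")))
}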
The tests currently don't all succeed on platforms other than Windows. It would be helpful if they ran successfully to very similar results (e.g. with a tolerance <5% and ideally <1%) across platforms. This discussion was started in RichardHooijmaijers/nlmixr.docker#2, but the cross-platform nature applies more to this repository (and, "Hi, @mattfidler" to make sure you see the connection here).
Initial testing succeeded with 196 tests and failed 11 (with an odd error that appears related to testthat and not nlmixr, which makes me uncertain that those numbers tell the whole story). Your concerns are correct; most values are <=3% off.
The only result that looked truly concerning that I saw offhand was from test-model68.R. (FYI, this was with my docker image, not the stock image. Results could be slightly different in the stock docker image because of slightly different versions of the underlying libraries.)
My initial thought is that increasing the required precision for pnlsTol (e.g. to 0.01) may help.