Hi @billdenney,
Thanks for the test. I was thinking that the increased pnlsTol to 0.01 may help (which decreases the model precision on the ETAs, at least in my experience). I think the issue we see in test 68 is a platform difference in how it compares to pnlsTol.
I would also like to eventually do the same suite of models in SAEM, FO, FOCE, FOCEi and using the general UI. FOCEi is very close to being what I consider final, and at that point I think I would be in a good place to expand these tests.
There is another option to try to fix the platform differences: using PreciseSums, which reduces platform-based floating-point rounding errors in multiplication and addition. For me, in some non-testthat tests it makes Linux and Windows results for FOCEi identical, but it adds computational time. nlme seems to still suffer from rounding errors using this approach.
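For a sense of what PreciseSums guards against, here is a tiny, self-contained illustration (my own example, assuming the PreciseSums::fsum() compensated-summation function; it is not part of the test suite):

x <- c(1e20, 1, -1e20)
sum(x)                # the +1 is lost to floating-point rounding; gives 0
PreciseSums::fsum(x)  # compensated summation recovers the exact sum; gives 1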
Model 68 doesn't converge with pnlsTol=0.01 (it does converge in Linux with 0.8 and does not converge in Linux with 0.7). The test below works for test-model68.R on Docker, but please note the "This parameter is notably different between platforms" comment. I've modified all the numbers to work in Docker, but they all differ from the Windows version. Can you please run it on Windows to see how much closer we can get to each other?
library(testthat)
library(nlmixr)
rxPermissive({
context("NLME68: two-compartment oral Michaelis-Menten, single-dose")
test_that("ODE", {
datr <-
read.csv("Oral_2CPTMM.csv",
header = TRUE,
stringsAsFactors = F)
datr$EVID <- ifelse(datr$EVID == 1, 101, datr$EVID)
datr <- datr[datr$EVID != 2,]
ode2MMKA <- "
d/dt(abs) =-KA*abs;
d/dt(centr) = KA*abs+K21*periph-K12*centr-exp(log(VM)+log(centr)-log(V)-log(KM+centr/V));
d/dt(periph) =-K21*periph+K12*centr;
"
mypar9 <- function(lVM, lKM, lV, lCLD, lVT, lKA)
{
VM <- exp(lVM)
KM <- exp(lKM)
V <- exp(lV)
CLD <- exp(lCLD)
VT <- exp(lVT)
KA <- exp(lKA)
K12 <- CLD / V
K21 <- CLD / VT
}
specs9 <-
list(
fixed = lVM + lKM + lV + lCLD + lVT + lKA ~ 1,
random = pdDiag(lVM + lKM + lV + lCLD + lVT + lKA ~ 1),
start = c(
lVM = 7.3,
lKM = 6.1,
lV = 4.3,
lCLD = 1.4,
lVT = 3.75,
lKA = -0.01
)
)
runno <- "N068"
dat <- datr[datr$SD == 1,]
fit <-
nlme_ode(
dat,
model = ode2MMKA,
par_model = specs9,
par_trans = mypar9,
response = "centr",
response.scaler = "V",
verbose = TRUE,
weight = varPower(fixed = c(1)),
control = nlmeControl(pnlsTol = .08, msVerbose = TRUE, msMaxIter=1000)
)
z <- VarCorr(fit)
expect_equal(signif(as.numeric(fit$logLik), 6), -11755.9)
expect_equal(signif(AIC(fit), 6), 23537.9)
expect_equal(signif(BIC(fit), 6), 23612.4)
expect_equal(signif(as.numeric(fit$coefficients$fixed[1]), 3), 7.33)
expect_equal(signif(as.numeric(fit$coefficients$fixed[2]), 3), 6.1)
expect_equal(signif(as.numeric(fit$coefficients$fixed[3]), 3), 4.24)
expect_equal(signif(as.numeric(fit$coefficients$fixed[4]), 3), 1.39)
expect_equal(signif(as.numeric(fit$coefficients$fixed[5]), 3), 3.75)
expect_equal(signif(as.numeric(fit$coefficients$fixed[6]), 3), -0.0271)
expect_equal(signif(as.numeric(z[1, "StdDev"]), 3), 0.189)
expect_equal(signif(as.numeric(z[2, "StdDev"]), 3), 0.331)
expect_equal(signif(as.numeric(z[3, "StdDev"]), 3), 0.343)
# This parameter is notably different between platforms
#expect_equal(signif(as.numeric(z[4, "StdDev"]), 3), 0.000205) # Windows
expect_true(abs(as.numeric(z[4, "StdDev"])) < 100*sqrt(.Machine$double.eps)) # Linux (Docker)
expect_equal(signif(as.numeric(z[5, "StdDev"]), 3), 0.248)
expect_equal(signif(as.numeric(z[6, "StdDev"]), 3), 0.277)
expect_equal(signif(fit$sigma, 3), 0.205)
})
})
It doesn't converge on Windows:
**Iteration 6
LME step: Loglik: -11766.46, nlminb iterations: 14
reStruct parameters:
ID1 ID2 ID3 ID4 ID5 ID6
0.003284074 -0.414098590 -0.496051274 6.909741189 -0.163099121 -0.320263754
Beginning PNLS step: .. completed fit_nlme() step.
Error in nlme.formula(model = DV ~ (nlmixr::nlmeModList("user_fn"))(lVM, :
step halving factor reduced below minimum in PNLS step
I'm trying to increase pnlsTol to 0.9 to see if it runs for me...
@billdenney nlme is funny. I just realized you were decreasing pnlsTol. You are more likely to get convergence by increasing pnlsTol.
@mattfidler, that was my intent. You are more likely to get convergence by increasing pnlsTol, but my guess was that you are more likely to get similar answers cross-platform by decreasing it.
Lemme know how 0.9 works for you-- or perhaps we should accept a few cross-platform differences and have these tests simply be platform-specific. (Not the best, but at least consistent with the expected installation.)
0.9 didn't work either. However, pnlsTol=1.0 works.
> expect_equal(signif(as.numeric(fit$logLik), 6), -11755.9)
Error: signif(as.numeric(fit$logLik), 6) not equal to -11755.9.
1/1 mismatches
[1] -11769 - -11756 == -13.2
> expect_equal(signif(AIC(fit), 6), 23537.9)
Error: signif(AIC(fit), 6) not equal to 23537.9.
1/1 mismatches
[1] 23564 - 23538 == 26.3
> expect_equal(signif(BIC(fit), 6), 23612.4)
Error: signif(BIC(fit), 6) not equal to 23612.4.
1/1 mismatches
[1] 23639 - 23612 == 26.3
>
> expect_equal(signif(as.numeric(fit$coefficients$fixed[1]), 3), 7.33)
Error: signif(as.numeric(fit$coefficients$fixed[1]), 3) not equal to 7.33.
1/1 mismatches
[1] 7.32 - 7.33 == -0.01
> expect_equal(signif(as.numeric(fit$coefficients$fixed[2]), 3), 6.1)
Error: signif(as.numeric(fit$coefficients$fixed[2]), 3) not equal to 6.1.
1/1 mismatches
[1] 6.09 - 6.1 == -0.01
> expect_equal(signif(as.numeric(fit$coefficients$fixed[3]), 3), 4.24)
Error: signif(as.numeric(fit$coefficients$fixed[3]), 3) not equal to 4.24.
1/1 mismatches
[1] 4.25 - 4.24 == 0.01
> expect_equal(signif(as.numeric(fit$coefficients$fixed[4]), 3), 1.39)
Error: signif(as.numeric(fit$coefficients$fixed[4]), 3) not equal to 1.39.
1/1 mismatches
[1] 1.38 - 1.39 == -0.01
> expect_equal(signif(as.numeric(fit$coefficients$fixed[5]), 3), 3.75)
> expect_equal(signif(as.numeric(fit$coefficients$fixed[6]), 3), -0.0271)
Error: signif(as.numeric(fit$coefficients$fixed[6]), 3) not equal to -0.0271.
1/1 mismatches
[1] -0.021 - -0.0271 == 0.0061
>
> expect_equal(signif(as.numeric(z[1, "StdDev"]), 3), 0.189)
Error: signif(as.numeric(z[1, "StdDev"]), 3) not equal to 0.189.
1/1 mismatches
[1] 0.19 - 0.189 == 0.001
> expect_equal(signif(as.numeric(z[2, "StdDev"]), 3), 0.331)
Error: signif(as.numeric(z[2, "StdDev"]), 3) not equal to 0.331.
1/1 mismatches
[1] 0.33 - 0.331 == -0.001
> expect_equal(signif(as.numeric(z[3, "StdDev"]), 3), 0.343)
Error: signif(as.numeric(z[3, "StdDev"]), 3) not equal to 0.343.
1/1 mismatches
[1] 0.336 - 0.343 == -0.007
> # This parameter is notably different between platforms
> #expect_equal(signif(as.numeric(z[4, "StdDev"]), 3), 0.000205) # Windows
> expect_true(abs(as.numeric(z[4, "StdDev"])) < 100*sqrt(.Machine$double.eps)) # Linux (Docker)
Error: abs(as.numeric(z[4, "StdDev"])) < 100 * sqrt(.Machine$double.eps) isn't true.
> expect_equal(signif(as.numeric(z[5, "StdDev"]), 3), 0.248)
Error: signif(as.numeric(z[5, "StdDev"]), 3) not equal to 0.248.
1/1 mismatches
[1] 0.253 - 0.248 == 0.005
> expect_equal(signif(as.numeric(z[6, "StdDev"]), 3), 0.277)
Error: signif(as.numeric(z[6, "StdDev"]), 3) not equal to 0.277.
1/1 mismatches
[1] 0.283 - 0.277 == 0.006
>
> expect_equal(signif(fit$sigma, 3), 0.205)
>
I think nlmixr builds fail on Travis because they use Ubuntu; my guess is that if we fix these tests they will show a successful nlmixr build status. (Unless I added Mac; I forget if I did.)
Your logic about pnlsTol is correct; nlme is a bit too finicky for that, though.
Alrighty. Given this, I think we may want to just allow these tests to differ cross-platform and try to ensure that all the other tests work cross-platform.
Added platform differences for 68
Would you accept a PR to make model estimation in all tests non-verbose (setting verbose=FALSE and msVerbose=FALSE)? Maybe more generally, and a bit harder: a PR that makes verbosity optional (defaulting to FALSE)?
I'm working through the differences in the tests, and it's more difficult to see which are working and which are failing with the fitting trace in the output.
That's fine.
FYI, I'm trying to change the behavior of some of the tests to use the tol argument for expect_equal() to make things a bit more cross-platform. I hope to have a (rather intensive) test PR later this afternoon. It will need Windows testing, and my hope is that we can make it as close as possible to the same tests across all platforms. logLik, AIC, and BIC are the biggest offenders at the moment.
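For example, something along these lines (illustrative values only; testthat passes the tolerance through to all.equal(), so it acts as a relative tolerance):

# Illustrative only: a ~0.1% relative tolerance absorbs platform-dependent
# rounding while still catching genuinely different estimates.
expect_equal(as.numeric(fit$logLik), -11755.9, tolerance = 1e-3)
expect_equal(AIC(fit), 23537.9, tolerance = 1e-3)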
Thank you Bill!
It is taking longer than anticipated. I've run into one issue that I think may need a more general solution:
In test-model54.R:
Error in solve.default(-val) :
Lapack routine dgesv: system is exactly singular: U[4,4] = 0
I'm not sure how to progress with a singular system (I would usually remove a variable, but I don't want to do that with a test). Do you have any ideas?
What about only doing 1 iteration?
Rounding errors in fitting are compounded by the multiple steps it goes through to calculate derivatives. This might be a general way to compare rounding errors and make them <5% for all platforms.
In test-model54.R, this means the following fit:
fit <-
nlme_ode(
dat,
model = ode2MM,
par_model = specs7,
par_trans = mypar7,
response = "centr",
response.scaler = "V",
verbose = verbose_minimization,
weight = varPower(fixed = c(1)),
control = nlmeControl(pnlsTol = .1, msVerbose = verbose_minimization,
returnObject=TRUE, msMaxIter=1L, maxIter=1L,
pnlsMaxIter=1L)
)
Really the fitness of nlme should be determined by R core; nlmixr allows nlme use with RxODE. So a test of one iteration could make all the tests faster as well as more platform independent. Of course, it would also require testing/updating the Windows results.
I still need to get some tests for SAEM, though there are some tests for FOCEi comparing objective functions and outputs between nlmixr and NONMEM.
By the way, with the updates, Windows returns:
> fit
Nonlinear mixed-effects model fit by maximum likelihood
Model: DV ~ (nlmixr::nlmeModList("user_fn"))(lVM, lKM, lV, lCLD, lVT, TIME, ID)
Log-likelihood: -13687.12
Fixed: lVM + lKM + lV + lCLD + lVT ~ 1
lVM lKM lV lCLD lVT
6.979419 5.724384 4.197173 1.461619 4.073274
Random effects:
Formula: list(lVM ~ 1, lKM ~ 1, lV ~ 1, lCLD ~ 1, lVT ~ 1)
Level: ID
Structure: Diagonal
lVM lKM lV lCLD lVT Residual
StdDev: 0.0005680223 0.001065264 0.0008181519 0.0007510751 0.0004518648 0.3046571
Variance function:
Structure: Power of variance covariate
Formula: ~fitted(.)
Parameter estimates:
power
1
Number of Observations: 2280
Number of Groups: 120
>
@mattfidler, I like that idea!
I agree that the accuracy of nlme() is the purview of R-core, but I'm guessing that users won't necessarily see it that way. It may be worth a note in the manual stating that explicitly and noting something like "Increasing pnlsTol or changing other nlmeControl() parameters may decrease the repeatability of results across platforms."
In terms of testing, what do you think if I were to move all the models themselves into helper files (in a location like "tests/testthat/models/") so that we can more easily collaborate on generating the cross-platform tests? I'm thinking something like:
source("models/N062.R")
source("models/N062_ode.R")
to run the model and then we could each run all the models on Linux and Windows and find any discrepancies.
If so, I'll first generate a PR for separating the models (which will break all the tests). Then, we both run the models auto-generating appropriate testthat testing. Finally, we merge our testing and push a cross-platform testing PR. Sound good?
One note on the testing: do you know why VarCorr() returns a character matrix instead of numeric? Is that something that we could update/fix along the way? (I think no, because I think it comes from nlme, but I'd like to check.)
In terms of testing, what do you think if I were to move all the models themselves into helper files (in a location like "tests/testthat/models/") and so that we can more easily collaborate on generating the cross-platform tests?
I think that is OK. I was thinking of adding some UI (aka nlmixr functions) to these as well. For example the model function for model 62 in the UI is:
two.compartment.oral.model <- function(){
ini({ # Where initial conditions/variables are specified
# '<-' or '=' defines population parameters
lCl <- 1.6 #log Cl (L/hr)
lVc <- 4.5 #log Vc (L)
lQ <- 1.5 #log Q (L/hr)
lVp <- 3.9 #log Vp (L)
lKA <- 0.1 #log Ka
# Bounds may be specified by c(lower, est, upper), like NONMEM:
# Residuals errors are assumed to be population parameters
prop.err <- c(0, 0.2, 1)
# Between subject variability estimates are specified by '~'
# Semicolons are optional
eta.Vc ~ 0.1
eta.Cl ~ 0.1
eta.Vp ~ 0.1
eta.Q ~ 0.1
eta.KA ~ 0.1
})
model({ # Where the model is specified
# The model uses the ini-defined variable names
Vc <- exp(lVc + eta.Vc)
Cl <- exp(lCl + eta.Cl)
Vp <- exp(lVp + eta.Vp)
Q <- exp(lQ + eta.Q)
KA <- exp(lKA + eta.KA)
# RxODE-style differential equations are supported
K10 <- Cl/Vc
K12 <- Q/Vc
K21 <- Q/Vp
d/dt(depot) = -KA*depot;
d/dt(centr) = KA*depot+K21*periph-K12*centr-K10*centr;
d/dt(periph) = -K21*periph+K12*centr;
## Concentration is calculated
cp = centr / Vc;
# And is assumed to follow proportional error estimated by prop.err
cp ~ prop(prop.err)
})
}
For a pharmacometrician familiar with NONMEM and R, these should look similar to constructs in both languages. There are similar functions for solved systems:
two.compartment.oral.model.Solved <- function(){
ini({ # Where initial conditions/variables are specified
# '<-' or '=' defines population parameters
lCl <- 1.6 #log Cl (L/hr)
lVc <- 4.5 #log Vc (L)
lQ <- 1.5 #log Q (L/hr)
lVp <- 3.9 #log Vp (L)
lKA <- 1 #log Ka
# Bounds may be specified by c(lower, est, upper), like NONMEM:
# Residuals errors are assumed to be population parameters
prop.err <- c(0, 0.2, 1)
# Between subject variability estimates are specified by '~'
# Semicolons are optional
eta.Vc ~ 1
eta.Cl ~ 1
eta.Vp ~ 1
eta.Q ~ 1
eta.KA ~ 1
})
model({ # Where the model is specified
# The model uses the ini-defined variable names
Vc <- exp(lVc + eta.Vc)
Cl <- exp(lCl + eta.Cl)
Vp <- exp(lVp + eta.Vp)
Q <- exp(lQ + eta.Q)
KA <- exp(lKA + eta.KA)
# Solved equations:
linCmt() ~ prop(prop.err)
})
}
These are the functions we are now advocating. The tests were written before this interface between all the methods existed. Once the UI work is completed, the final model is run by any of the supported UI methods:
fit <- nlmixr(two.compartment.oral.model, data, "nlme",nlmeControl(), tableControl())
fit <- nlmixr(two.compartment.oral.model, data, "saem", saemControl(), tableControl())
fit <- nlmixr(two.compartment.oral.model, data, "focei", foceiControl(), tableControl())
fit <- nlmixr(two.compartment.oral.model, data, "foce", foceiControl(), tableControl())
fit <- nlmixr(two.compartment.oral.model, data, "fo", foceiControl(), tableControl())
fit <- nlmixr(two.compartment.oral.model, data, "foi", foceiControl(), tableControl())
Currently these will never make it into CRAN testing because nlmixr currently uses Python for the symbolics.
One note on the testing: Do you know why VarCorr() returns a character matrix instead of numeric? Is that something that we could update/fix along the way? (I think no because I think it comes from nlme, but I'd like to check.)
I think it's because of the printing.
If you want to fix it you can, though. The default print of nlme was too much for me (it output the data as well as the model), so the models from nlmixr's nlme have the following class:
> class(fit)
[1] "nlmixrNlme" "nlme" "lme"
Since VarCorr is a method, it could be corrected.
Most nlme methods work with nlmixr output objects but may return different things. The VarCorr method does not work with nlmixr objects in general, though it could too.
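A minimal sketch of one way it could be done (illustrative; not necessarily how it is implemented in nlmixr):

# Sketch: a VarCorr method for the nlmixrNlme class that returns numbers.
# It reuses nlme's VarCorr.lme() via NextMethod() and coerces the character
# matrix to numeric (blank correlation cells become NA).
VarCorr.nlmixrNlme <- function(x, sigma = 1, ...) {
  z <- NextMethod()
  zn <- suppressWarnings(apply(z, 2, as.numeric))
  dimnames(zn) <- dimnames(z)
  zn
}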
Added VarCorr as a numeric.
What do you think of this as an option for how to test all the models:
https://github.com/billdenney/nlmixr/tree/simplify-model-tests
Specifically, all the modeling code will be moved to the tests/testthat/models directory, including the expected values. As we move through the models, we may need to also move the tolerances to the models/ directory.
With this method you lose context(), but the same information is maintained with info=. We can also add any platform specificity by putting it into the assignment of the expected_values list rather than having any platform specificity in the testing code itself.
There is a bit of a header in each of the modeling .R files to allow them to run without being in the test environment. That may be excessive, but my hope is that it simplifies things for people trying to muck around later.
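Roughly, the shape I have in mind is something like this (an illustrative sketch; the exact names and structure in the branch may differ, and the numbers are the model 68 values from above):

# In the model file: record the expected values for this platform.
expected_values <- list(
  logLik = -11755.9,
  fixed  = c(lVM = 7.33, lKM = 6.1, lV = 4.24, lCLD = 1.39, lVT = 3.75, lKA = -0.0271)
)
# In the shared test code: compare against them, tagging failures with the model.
expect_equal(as.numeric(fit$logLik), expected_values$logLik,
             tolerance = 1e-3, info = paste("logLik for model", runno))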
You can add context() simply by calling it in the command that loops over the models, so I don't think it needs to be removed, but I don't mind if it is removed either.
I think it does simplify things. The header can be simplified a bit further by making a common include, but that is also up to you. I will compare with Windows, and perhaps someone else can compare with Mac.
As I'm working my way through the tests, I hit a snag on model 2 when I try to run with a single iteration (your suggested nlmeControl() parameters above). I think that we're going to have to try to minimize regardless, and I'm going to just let it try hard to minimize and set returnObject=TRUE to try to overcome pnlsTol issues.
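In other words, control settings along these lines (illustrative values, not necessarily what will end up in the branch):

# Illustrative only: let nlme iterate as usual, but return the last fit even if
# the convergence criteria are not met, so pnlsTol hiccups don't abort the run.
ctl <- nlmeControl(pnlsTol = 0.1, msMaxIter = 100, maxIter = 100,
                   returnObject = TRUE)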
Ok. The more I use nlme, the more I prefer other methods.
I tend to find it reasonable for very simple models, and unreasonable for anything else. As there are a lot of tests, I'm enlisting some help in the conversion. (So, I'll probably be a bit quiet for a few days on this-- work is still occurring.)
Progress is being made!
I'm working through the testing, and it turns out that I should be able to test it on both Windows and Linux myself (I don't have access to a current Mac).
I think that we may also be able to fit some of the tests within the 10-minute CRAN limit if we choose them based on execution speed. We could run the fastest 22 model tests in 5 minutes (according to timing on my system). The values below are model execution times in seconds, by model number, sorted from the fastest to the slowest model, with some generous limits for minimization (https://github.com/billdenney/nlmixr/blob/77e78782df2f8fe674999394091327fb66d4c892/tests/testthat/models/helper-prep_fit.R#L8-L15).
N001 N004_ode N002 N003 N004 N012 N013 N025 N023 N026 N015 N024 N049 N033 N035 N047 N001_ode N063 N002_ode N003_ode N062 N014_ode N048 N034 N020_ode N012.ode N010.ode N009_ode N042_ode N022_ode N047_ode N061 N021_ode N023_ode
5 6 6 6 6 8 8 8 8 9 10 10 16 16 17 19 20 21 23 24 26 27 34 34 73 87 99 103 109 115 116 119 122 125
N025_ode N049_ode N013_ode N034_ode N026_ode N011.ode N024_ode N029_ode N068_ode N054_ode N040_ode N030_ode N015_ode N035_ode N060_ode N032_ode N031_ode N032 N033_ode N062_ode N070_ode N014 N055_ode N056_ode N063_ode N069_ode N061_ode N046 N048_ode N060 N046_ode N041_ode
125 142 143 146 155 158 158 161 176 179 183 192 194 205 235 245 263 297 300 402 408 435 542 574 744 761 1747 2200 2716 2736 10013 13408
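Picking that subset could be automated with something like this (a sketch; timings stands for the named vector of seconds printed above):

# Sketch: keep the fastest models whose cumulative run time fits a 5-minute budget.
timings <- sort(timings)
cran_models <- names(timings)[cumsum(timings) <= 5 * 60]
cran_models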
That is great news. Perhaps for the CRAN-only test we can select one. That way we can run the same model in SAEM and FOCEi and nlme with nlmixr's UI (there are 2 different ways to run it in that case).
I can get help or borrow a Mac to help with the Mac minimization. I will also get a virtual machine set up for the Solaris install. However, I just remembered CRAN doesn't have Python, so FOCEi and SAEM will not be tested on CRAN for now.
I got the following surprising error when running the tests; it looked like knitr was trying to run for some reason. Do you know why this may happen? (My R session was using 26 GB of RAM and swapping out constantly, so that will have to be solved as well.)
models//model_N054_ode.R at 2018-11-26 06:23:43
Error in system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE) :
cannot popen '/usr/bin/which 'pdflatex' 2>/dev/null', probable reason 'Cannot allocate memory'
In addition: There were 41 warnings (use warnings() to see them)
26 Nov 2018 22:06:26 [rsession-rstudio] ERROR r error 4 (R code execution error) [errormsg=Error in system(paste(which, shQuote(names[i])), intern = TRUE, ignore.stderr = TRUE) :
cannot popen '/usr/bin/which 'pdflatex' 2>/dev/null', probable reason 'Cannot allocate memory'
]; OCCURRED AT: rstudio::core::Error rstudio::r::exec::{anonymous}::evaluateExpressionsUnsafe(SEXP, SEXP, SEXPREC**, rstudio::r::sexp::Protect*, rstudio::r::exec::{anonymous}::EvalType) /home/ubuntu/rstudio/src/cpp/r/RExec.cpp:159; LOGGED FROM: rstudio::core::FilePath rstudio::session::module_context::findProgram(const string&) /home/ubuntu/rstudio/src/cpp/session/SessionModuleContext.cpp:960
26 Nov 2018 22:06:26 [rsession-rstudio] ERROR system error 12 (Cannot allocate memory); OCCURRED AT: rstudio::core::Error rstudio::core::system::ChildProcess::run() /home/ubuntu/rstudio/src/cpp/core/system/PosixChildProcess.cpp:478; LOGGED FROM: bool rstudio::session::modules::svn::isSvnInstalled() /home/ubuntu/rstudio/src/cpp/session/modules/SessionSVN.cpp:444
I'm not sure; are you running this in parallel?
Perhaps I can run on my system to help.
How do you run the tests? Apparently in RStudio on Ubuntu, but using testthat? Sourcing a file? I would like to repeat exactly how you are calling the tests.
I just added a commit to RxODE to make sure that memory is freed before calling another solve; this could be the issue, though if so, the memory freeing in C isn't working quite right when using nlme.
I'm running the tests with default settings in Ubuntu RStudio, yes. Some but not all of the tests appear to be running in parallel (when I look at top, sometimes 1 core is fully used, other times, all 8 cores are fully used).
At the moment, I'm manually running the tests by stepping through the model files line-by-line.
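Roughly one model at a time, something like this (an illustrative sketch assuming the models/ layout; not the exact commands):

# Illustrative only: run a single model script and inspect the fit it creates,
# rather than running the whole testthat suite at once.
library(testthat)
library(nlmixr)
source("models/model_N054_ode.R")  # assumed to leave a 'fit' object behind
print(fit)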
As I'm running models//model_N054_ode.R after installing the current RxODE and nlmixr GitHub versions just now (merging nlmixr with my current branch), I'm still having what appear to be memory issues. It may be less of a leak than before (I'm not sure), but memory is increasing by about 1 MB every 3 seconds when it's using a single processor.
Does single-core use mean that the issue is while working in nlme and not in the parallel parts of nlmixr/RxODE?
You could be storing too much information.
Note that nlme stores the original data, the fit data and some of the parameter data, so this could be what is causing the memory issues. Try looking at object.size(fit) to see what that gives you.
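For example (plain base R, nothing nlmixr-specific):

# Report the in-memory size of the fit object in megabytes.
print(object.size(fit), units = "Mb")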
Also, I looked at nlme. By default it is single-threaded. However, the nlme piece can use mclapply, which duplicates the memory to speed up processing.
One last note, due to a CRAN request, we will likely be submitting this again to CRAN by this week/next week to keep it on CRAN. I don't think you need to rush to get this in by then, but don't be surprised if we submit this before the test suite is done. I would have rather waited for this to be completed but when CRAN threatens....
The issue occurs even when I just run model N054, so I don't think that storage is the primary issue. (The fit object does get moderately large, in the ~100 MB range (that's what I got in the Windows run), but not huge.) Model N054 has been ticking away for about an hour now without making apparent progress. If you have a suggestion for how to look inside to see what may be the issue, I'd like to try that, because I don't think that it's going to complete.
And: Don't get delisted from CRAN because you're waiting on this. I think that this will be ready in ~2 weeks (assuming that we can figure out this apparent memory leak quickly), and I'd rather get the tests right than get them on CRAN. If you're ready for a CRAN re-submit and this is the only item, go for CRAN now.
I may be able to get a subset of the tests ready ASAP and we could wait on the rest. For example, I could make all the tests that run in <10 minutes work right now, and assign the others to be skipped for now. Would you like that for the CRAN release (if possible)?
I don't think it is necessary to rush.
I will look into model 054 after I get everything CRAN-worthy in RxODE/nlmixr. Right now all the nlme tests are skipped on CRAN because of platform differences.
Not only would we have to subset the tests but also make sure they run fast enough and/or skip in the other environments. It's unclear how fast the CRAN environments are; I have seen wildly different numbers.
Currently RxODE tests on Mac/Linux/Windows via Travis and Appveyor, but the Mac builds are always a bit finicky on Travis. Mac could also be dependent on how R is installed; I think Travis uses the R Mavericks build.
Happy New Year, Bill!
We resubmitted RxODE, and the current version of RxODE is passing on Travis/Appveyor again; if you have time, perhaps we can revisit this.
The memory leak may have been corrected by an option I set in RxODE, though I'm unsure at this point.
Happy New Year!
The memory leak was a challenge. I'll try to work on it in the next week or two. I will update to the current RxODE and nlmixr on GitHub master branches. Are there other dependent package changes that I should update?
I will check the memory leak before you begin. I thought the memory leak was in sympy, but reading the thread again it may be elsewhere.
I ran model 54 without any issues. It matched your test results. I also added the UI-style model 54 and ran nlme, saem, fo, foi, foce and focei for it, and they also all ran mostly successfully (the FOCE had a questionable sandwich/R covariance matrix and used the S matrix). The UI ran a bit better in this model than the manually specified nlme, though I can't remember the differences in how the model is actually run.
When running testthat, the way you guys have it set up is to test at the end. What is the reason for this? I prefer to look at the results as it is running.
Also, @kiranmaiganji, since you contributed to the testing, I would also like to add you as a contributor, if that is OK with you. If you have an ORCID, let me know.
Great! I hope to have some time to run tests next week.
When running testthat, the way you guys have it set up is to test at the end. What is the reason for this? I prefer to look at the results as it is running.
I'm not sure that I understand the statement. The reason for the slightly unusual setup is to help with cross-platform testing. With this setup, we can run the tests and save the results easily so that we can compare results across platforms.
Ok
I meant that you use
for (model in models) {
  # run models
}
for (model in models) {
  # check model results
}
Since you are trying to save results, I changed it to
for (model in models) {
  # run models
  # check new models
}
Hi @billdenney
I have a clarification question about your code. You said to use:
#Generate this with dput(generate_expected_values(fit[[runno]]))
Does that mean going through each example one by one and adding the output to each file, or adding the output all at once in a script?
I changed some of the ODE tolerances so that nlme can have better performance.
That is meant for running one-by-one to add the output to each file. If you run them all, you can run it in a loop to generate all the expected outputs, but they're not explicitly saved anywhere.
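If you did want to capture them all in one pass, a loop along these lines would do it (hypothetical file naming; generate_expected_values() is the helper from the test branch, and fit is assumed to be the list of fits keyed by run number):

# Hypothetical: write the dput() of each fit's expected values next to its model.
for (runno in names(fit)) {
  dput(generate_expected_values(fit[[runno]]),
       file = file.path("models", paste0("expected_", runno, ".R")))
}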
The tests currently don't all succeed on platforms other than Windows. It would be helpful if they ran successfully to very similar results (e.g. with a tolerance <5% and ideally <1%) across platforms. This discussion was started in RichardHooijmaijers/nlmixr.docker#2, but the cross-platform nature applies more to this repository (and, "Hi, @mattfidler" to make sure you see the connection here).
Initial testing succeeded with 196 tests and failed 11 (with an odd error that appears related to testthat and not nlmixr, which makes me uncertain that those numbers tell the whole story). Your concerns are correct; most values are <=3% off.
The only result that looked truly concerning that I saw offhand was from test-model68.R. (FYI, this was with my docker image, not the stock image. Results could be slightly different in the stock docker image because of slightly different versions of the underlying libraries.)
My initial thought is that increasing the required precision for pnlsTol (e.g. to 0.01) may help.