bounds.R Code Simplification

billdenney commented 4 years ago

I'm trying to track down an issue in dynmodel, and that took me to look at bounds.R. The code in there is complex, and I think that it could be much simpler using some concepts that I put into my formulops package. (I don't think that the package itself will help much there, but I learned a lot about formula parsing while writing it. And, there are a few helpful functions that can help extract parts of items well.)

That said, there is a lot of complexity in bounds.R, and I recognize that it is not a typical jumping-in point.

Would you like for me to take a shot at simplifying bounds.R?

Also, if simplified, I think it would make it simpler to implement what is decided in #314 (assuming that it ends up going into the ini() block).

mattfidler commented 4 years ago

It is up to you. If you do make sure these tests run correctly:

https://github.com/nlmixrdevelopment/nlmixr/blob/master/tests/testthat/test-bounds.R

mattfidler commented 4 years ago

Also if you can, fix

https://github.com/nlmixrdevelopment/nlmixr/issues/253

mattfidler commented 4 years ago

If it is easy drop zeros that shouldn't be estimated https://github.com/nlmixrdevelopment/nlmixr/issues/44

That is

a + b + c ~ c(0.1,
              0  , 0.1
              0  , 0  , 0.1)

Should be equivalent to:

a~0.1
b~0.1
c~0.1

And

a + b+  c ~ c(0.1,
              0.1, 0.1,
              0,   0, 0.1)

should be equivalent to

a + b ~ c(0.1,
          0.1, 0.1)

c ~ 0.1

mattfidler commented 4 years ago

Of course comments should be still translated to label

mattfidler commented 4 years ago

And if you want to give a new feature to bounds, you could consider:

a + b ~ cor(...)

To specify the correlation matrix/Sd values and then translate them to the covariance matrix

mattfidler commented 4 years ago

That along with #314 is the complete wish list

mattfidler commented 4 years ago

One last nice to have is to move this to lotri. That way it can be used elsewhere. Some of this parsing is already done there anyway

billdenney commented 4 years ago

All of these seem doable...

First question:

What is the reason for returning the evaluation of the function in the parent frame if it does not have a srcref attribute here (line 18)?

https://github.com/nlmixrdevelopment/nlmixr/blob/78883d0192a544ac171c17d700f3956e56f78eb7/R/bounds.R#L15-L19

I understand that the srcref is used so that the comments can be extracted, but I would have assumed that if srcref is not available, that body(fun) would be the way to get the desired result rather than evaluating the function.

My best guess at the moment is that if you pass in a function without a srcref attribute, it must return a data.frame with all the expected columns. Is that correct? The only time that I think that would likely happen is when running it from a package or other location where the srcref attribute is removed.

If all that is correct, what would you think about the following modification of how bound extraction works:

If "srcref" is available, check for comments that are used as labels. (As is currently done)
- New and important: Soft-deprecate comments so that if comments are used, a deprecation warning comes up to say that they should use label("my label") instead of # my label. Give them the code to use to replace their current code as part of the message. (Giving them the code would make it a long message, but I think that would be helpful to get the user to move to the new functionality more quickly.)
  - There would be no rush to fully-deprecate the comment as a label feature. The code used to do that would be very much off to the side and not hurting the generality of bounds generation.
If "srcref" is not available, execute the function. (As is currently done)
- New: Check if it returns a data.frame, and check that the data.frame either has the correct columns or is only missing columns that are not required. If missing columns that are not required, add the required columns with the default value (e.g. "" for the parameter name). Return the full data.frame with all columns confirmed to be present.
- New and possibly a breaking change (but I don't think that it would break anything that works as intended, if I understand the code): If it does not return a data.frame, try to use body(fun) to get the desired initial conditions, etc. where comments would not be available as labels (since "srcref" is not available, that would be impossible, anyway).

mattfidler commented 4 years ago

What is the reason for returning the evaluation of the function in the parent frame if it does not have a srcref attribute here (line 18)?

This was to allow a nlmixr function to parse when running the code:

library(nlmixr)
one.cmt <- function() {
  ini({
    ## You may label each parameter with a comment
    tka <- 0.45 # Log Ka
    tcl <- 1 # Log Cl
    ## This works with interactive models
    ## You may also label the preceding line with label("label text")
    tv <- 3.45; label("log V")
    ## the label("Label name") works with all models
    eta.ka ~ 0.6
    eta.cl ~ 0.3
    eta.v ~ 0.1
    add.sd <- 0.7
  })
  model({
    ka <- exp(tka + eta.ka)
    cl <- exp(tcl + eta.cl)
    v <- exp(tv + eta.v)
    linCmt() ~ add(add.sd)
  })
}

print(one.cmt())
#> Warning in nlmixrUI(parent.frame(1)): Some information (like parameter labels)
#> is lost by evaluating a nlmixr function
#> __ RxODE-based 1-compartment model with first-order absorption _________________ 
#> -- Initialization: ------------------------------------------------------------- 
#> Fixed Effects ($theta): 
#>  tka  tcl   tv 
#> 0.45 1.00 3.45 
#> 
#> Omega ($omega): 
#>        eta.ka eta.cl eta.v
#> eta.ka    0.6    0.0   0.0
#> eta.cl    0.0    0.3   0.0
#> eta.v     0.0    0.0   0.1
#> -- mu-referencing ($muRefTable): ----------------------------------------------- 
#> +-------------------+
#> ¦ tka     ¦ eta.ka  ¦
#> +---------+---------¦
#> ¦ tcl     ¦ eta.cl  ¦
#> +---------+---------¦
#> ¦ tv      ¦ eta.v   ¦
#> +-------------------+
#> -- Model: ---------------------------------------------------------------------- 
#>     ka <- exp(tka + eta.ka)
#>     cl <- exp(tcl + eta.cl)
#>     v <- exp(tv + eta.v)
#>     linCmt() ~ add(add.sd) 
#> ________________________________________________________________________________

^{Created on 2020-06-25 by the reprex package (v0.3.0)}

ini is a function in nlmixr as well

Soft-deprecate comments so that if comments are used, a deprecation warning

I agree. Comments will not be present in the following situations:

Running the ini({}) function directly
Running the whole script in non-interactive mode (ie Rscript or any one of its variants)
Also when in a package (though that is less likely)

This is something that should be soft-depreciated. When I added it the R version I used seemed to add comments to the batch runs, but now that is not the case.

I still think comments are easier to remember than label("") but I can't change how R handles them.

Check if it returns a data.frame, and check that the data.frame either has the correct columns or is only missing columns that are not required

This is OK, but would be a breaking change (which I will describe below)

If it does not return a data.frame, try to use body(fun) to get the desired initial conditions, etc.

This is also OK, but would be a breaking change

Why is a breaking change?

That is found here:

https://github.com/nlmixrdevelopment/nlmixr/blob/78883d0192a544ac171c17d700f3956e56f78eb7/R/ui.R#L148-L158

When nlmixr runs the ini function it assigns the bounds to the hidden variable .ini. This is used so that when nlmixr runs the model function it picks up the bounds and then creates the "compiled" model.

If you do the second step, it shouldn't be a breaking change.

mattfidler commented 4 years ago

I have been trying to use the CRAN style guide for warnings/stop

https://cran.r-project.org/doc/manuals/r-release/R-exts.html#Diagnostic-messages

mattfidler commented 4 years ago

I just noticed you can use ngettext to be able to use sprint and other variants (like the warning messages)

billdenney commented 4 years ago

What is the reason for returning the evaluation of the function in the parent frame if it does not have a srcref attribute here (line 18)?

This was to allow a nlmixr function to parse when running the code:

OK, that seems useful, but it also seems like it makes it difficult to tell if you're wanting to run the code now or if you are wanting to do the parsing now. If I'm reading the code correctly, it looks like line 18 would only apply when running an ini function and if so, could we simplify the beginning of nlmixrBounds() so that the ini() case is handled by a different function call? I think that would make other aspects of switching over to not looking for comments better.

Additionally, I think that attempting parsing by regexp will likely end up with trouble were someone to have code that looks like this:

one.cmt <- function() {
  ini({
    a <- 1; label("# of people")
    d <- 2
   })
   model({
     b~a + add(d)
   })
}

billdenney commented 4 years ago

Let me make my last comment clearer. I think that ideally, nlmixrBounds() would take as input a function, call, or similar and perform parsing on it to get the bounds (and associated information). That would make the function simpler, and I think that it could operate equivalently because the scenario where you run the function could be handled by having the ini() function do something like run nlmixrBounds(quote(substitute(x))) (or something like that-- I never get quote(), substitute(), deparse(), etc. right without a bit of playing around).

Does that sound reasonable to you?

mattfidler commented 4 years ago

So, simply have nlmixrBounds() handle the function deparsing -> conversion to correct bounds, right?

That is fine with me.

billdenney commented 4 years ago

Right.

billdenney commented 4 years ago

What do you think about a limitation:

If someone is using label() to assign labels to parameters, they cannot also use comments?

The benefit of that limitation is that there are inherent bugs in label detection via comments when both can be present. One specific example is:

function() {
  1; label("This has a # comment symbol and that breaks things")
}

If we detect label use, then we can choose to ignore the comments (maybe with a message to the user now) and that would prevent the above from being a bug.

mattfidler commented 4 years ago

That is OK with me, but I don't think you should exclude comments. I think commented models are the best :1st_place_medal:

billdenney commented 4 years ago

The goal is to allow comments without breaking things. What I'm thinking is that the following may occur:

function() {
  a <- 1 # I have a label
}

Becomes an initial condition of 1 with a parameter name of a and a label of "I have a label".

But, how should things be handled if someone does something like:

function() {
  a <- 1 # I have a label
  label("I have two labels!")
}

Currently, it would get parsed something like:

function() {
  a <- 1
  label("I have a label")
  label("I have two labels!")
}

And then, what do you do with the two labels?

I'm suggesting that if they use label() anywhere within their specification, all comments would be ignored from a parsing perspective. They would be allowed in the code, but they would have no effect on simulation or summary.

mattfidler commented 4 years ago

Currently only the last label is kept you can see that here:

library(nlmixr)
one.cmt <- function() {
  ini({
    ## You may label each parameter with a comment
    tka <- 0.45 # Log Ka
    label("label3")
    label("label4")
    tcl <- 1 # Log Cl
    ## This works with interactive models
    ## You may also label the preceding line with label("label text")
    tv <- 3.45; label("log V")
    ## the label("Label name") works with all models
    eta.ka ~ 0.6
    eta.cl ~ 0.3
    eta.v ~ 0.1
    add.sd <- 0.7
  })
  model({
    ka <- exp(tka + eta.ka)
    cl <- exp(tcl + eta.cl)
    v <- exp(tv + eta.v)
    linCmt() ~ add(add.sd)
  })
}
m <- nlmixr(one.cmt)
print(as.data.frame(m$ini))
#>   ntheta neta1 neta2   name lower  est upper   fix  err  label condition
#> 1      1    NA    NA    tka  -Inf 0.45   Inf FALSE <NA> label4      <NA>
#> 2      2    NA    NA    tcl  -Inf 1.00   Inf FALSE <NA> Log Cl      <NA>
#> 3      3    NA    NA     tv  -Inf 3.45   Inf FALSE <NA>  log V      <NA>
#> 4     NA     1     1 eta.ka  -Inf 0.60   Inf FALSE <NA>   <NA>        ID
#> 5     NA     2     2 eta.cl  -Inf 0.30   Inf FALSE <NA>   <NA>        ID
#> 6     NA     3     3  eta.v  -Inf 0.10   Inf FALSE <NA>   <NA>        ID
#> 7      4    NA    NA add.sd     0 0.70   Inf FALSE  add   <NA>  linCmt()

^{Created on 2020-06-25 by the reprex package (v0.3.0)}

mattfidler commented 4 years ago

I'm unsure if that is the "right" behaviour.

mattfidler commented 4 years ago

The benefit of that limitation is that there are inherent bugs in label detection via comments when both can be present. One specific example is:

That is fixed by a simple regular expression fix (above)

library(nlmixr)

one.cmt <- function() {
  ini({
    ## You may label each parameter with a comment
    tka <- 0.45; label("la # bel3")
    tcl <- 1 # Log Cl
    ## This works with interactive models
    ## You may also label the preceding line with label("label text")
    tv <- 3.45; label("log V")
    ## the label("Label name") works with all models
    eta.ka ~ 0.6
    eta.cl ~ 0.3
    eta.v ~ 0.1
    add.sd <- 0.7
  })
  model({
    ka <- exp(tka + eta.ka)
    cl <- exp(tcl + eta.cl)
    v <- exp(tv + eta.v)
    linCmt() ~ add(add.sd)
  })
}

m <- nlmixr(one.cmt)
print(as.data.frame(m$ini))
#>   ntheta neta1 neta2   name lower  est upper   fix  err     label condition
#> 1      1    NA    NA    tka  -Inf 0.45   Inf FALSE <NA> la # bel3      <NA>
#> 2      2    NA    NA    tcl  -Inf 1.00   Inf FALSE <NA>    Log Cl      <NA>
#> 3      3    NA    NA     tv  -Inf 3.45   Inf FALSE <NA>     log V      <NA>
#> 4     NA     1     1 eta.ka  -Inf 0.60   Inf FALSE <NA>      <NA>        ID
#> 5     NA     2     2 eta.cl  -Inf 0.30   Inf FALSE <NA>      <NA>        ID
#> 6     NA     3     3  eta.v  -Inf 0.10   Inf FALSE <NA>      <NA>        ID
#> 7      4    NA    NA add.sd     0 0.70   Inf FALSE  add      <NA>  linCmt()

^{Created on 2020-06-25 by the reprex package (v0.3.0)}

billdenney commented 4 years ago

I'm thinking of making it a bit more strict: You can only set an attribute (label(), condition(), and any future ones that come up) once, but you can do it in any order. In general, it seems like the user is doing something where they don't understand what they are doing if they add labels more than once (or conditions more than once).

So, I think it is most helpful to alert the user that there is something wrong with the initial condition. (A label is ignored, etc.)

mattfidler commented 4 years ago

I agree with you on the label() and backTransform() statement.

I do not think we should provide a condition() statement though. I think we should stick to the lotri() conditions (|) here. My reasoning is:

| is a typical way to specify nesting levels in R
| isn't stripped like comments
| is consistent with lotri()

It would be even nicer if we could extract lower upper and nu from these as well. This way nested models could be simulated from a nlmixr model specification instead of a just a completed nlmixr model.

This is syntax is in the nesting vignette for RxODE:

https://github.com/nlmixrdevelopment/RxODE/blob/pruneBranch/vignettes/RxODE-nesting.Rmd

Just my point of view.

mattfidler commented 4 years ago

Note due to some "fixes" the nesting vignette is not working anymore. I will see where it broke down and fix it.

billdenney commented 4 years ago

I'm heavily rewriting bounds.R right now to cover a lot more flexibility with simpler code. Rather than fix bounds.R right now, I think that enumeration of the ways that fixed should be possible to use would be more helpful, and we can build tests around those.

I'll not add a condition(). And, I'm trying to make what I'm working make use of lotri(), but I'm hitting some bumps there. As I get to a first draft of the PR, we can look at it.

mattfidler commented 4 years ago

The tests for all the various forms of fixed are already there in:

https://github.com/nlmixrdevelopment/nlmixr/blob/foceiDur/tests/testthat/test-bounds.R

billdenney commented 4 years ago

That test set is what I'm working toward. I was just thinking that if fixed wasn't working in a vignette that there may be something that isn't in the test set (either that should be tested to work or should be tested as a known error).

mattfidler commented 4 years ago

Fixing parameters in bounds is different than the ability to fix parameters in an estimation routine. As far as I know all the fixing that is supposed to work, works.

However, on nice to have, would be to have lower triangular zeros be flagged as possible places to fix:

        ~ c(40,
            0.1, 20,
            0, 0.1, 30)

With some sort of possible warning that the zero will be fixed (but I think this may be estimation specific)

mattfidler commented 4 years ago

There is a function in RxODE that checks for banded matrices and this may already be fixed. Perhaps a warning should be generated there (since that is where the focei matrix handling actually is...)

mattfidler commented 4 years ago

The only unhandled case I can think of otherwise is if you add cor to a matrix; Then:

cor(fixed(...)) 
fixed(cor(...))
cor(..,fixed(.),...)

shold all work.

mattfidler commented 4 years ago

Perhaps cor should be handled in lori instead (fixing could too).

billdenney commented 4 years ago

What I'm working on should work with cor(), and any other call, too, I think. What I'm working on should work in almost any scenario where you may want to evaluate code. Generally, what I'm working on should allow any call to any R function while working with a few special functions (like cor()) differently.

I think that if the fixing of a zero would be estimation-specific, then I wouldn't want to do it during the ini block processing. It would be easy to add a warning like:

"Consider fixing zeros in omega matrices: " (and then the names of all omega zeros that are not fixed)

and, I think that warning would make sense.

mattfidler commented 4 years ago

That sounds good to me.

I'm unsure you can fix anything in an omega matrix except lower triangular zeros. So if this helps, perhaps we can remove some of the cases.

On Fri, Jun 26, 2020 at 9:59 AM Bill Denney notifications@github.com wrote:

What I'm working on should work with cor(), and any other call, too, I think. What I'm working on should work in almost any scenario where you may want to evaluate code. Generally, what I'm working on should allow any call to any R function while working with a few special functions (like cor()) differently.

I think that if the fixing of a zero would be estimation-specific, then I wouldn't want to do it during the ini block processing. It would be easy to add a warning like:

"Consider fixing zeros in omega matrices: " (and then the names of all omega zeros that are not fixed)

and, I think that warning would make sense.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/nlmixrdevelopment/nlmixr/issues/322#issuecomment-650225805, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD5VWWXLLRLF4ZFBK35T5LRYSZVPANCNFSM4OHYFZSA .

billdenney commented 4 years ago

I'm only going to remove cases VERY carefully. I'm adding a lot of smaller test cases for supporting functions that I'm adding to help nlmixrBounds.

billdenney commented 4 years ago

I'm trying to ensure that I know how to correctly detect err parameters. It looks like they are detected by having the following properties:

In the ini block, they are assigned just like a theta.
- Can they be fixed?
You have to look into the model block to find them.
- As far as I can tell, there is no way to tell that something is an error initial from just the ini block. Is that correct?
In the model block, they are the only item that uses formula notation for the sampling.
Within that formula notation, they must consist only of output parameter, tilde, plus-separated error model types. Is that correct?
- Is there a canonical list of possible error models in the code (if not, is there one in the documetnation)? When I searched for "dnorm", I found several character vectors that had different sets of them (presumably because different estimation engines support different types of error models).
- If there is no canonical list in the code, then I would suggest that it would be worth adding a canonical list in the code that would be updated when new error models became available so that all methods could refer back to that list.
- One way to potentially make the list would be to have it as a boolean matrix where the columns were named after the error models and the rows after estimation engines. Then, that could be used both as a documentation help and to ensure consistency between parts of the code. (That could also be overkill.)

billdenney commented 4 years ago

I found what I think is the canonical list of error distributions here:

https://github.com/nlmixrdevelopment/nlmixr/blob/cf5c05d95e152b538cd5ab2a73b1a9acffe77e35/R/ui.R#L787

mattfidler commented 4 years ago

I'm trying to ensure that I know how to correctly detect err parameters. It looks like they are detected by having the following properties:

Errors are detected by a combination of the ini and the model. These are out of scope for the bounds.

Like Uppsula, in focei I use the standard deviation modeling of parameters and fix the resiudals to 1. Therefore these are THETAs in the same sense.

Can they be fixed?

Yes. They can be fixed in focei but no other routine currently

In the model block, they are the only item that uses formula notation for the sampling.

Currently this is correct.

I found what I think is the canonical list of error distributions here:

Actually it is here:

https://github.com/nlmixrdevelopment/nlmixr/blob/cf5c05d95e152b538cd5ab2a73b1a9acffe77e35/R/ui.R#L756-L785

It doesn't mean each one is supported in each routine. It is a first step in getting the nlmixr function -> gnlmm and gnlmm2

mattfidler commented 4 years ago

tbs boxCox tbsYj yeoJohnson are also supported in focei

bern and one other is supported in saem

billdenney commented 4 years ago

Alright, then I have a working rewrite! (with the exception of the err test :( ) I'm going to leave the failing err test in the test file, but from your comment above, the test should probably be removed or reworked in some way.

mattfidler commented 4 years ago

I believe the ini data frame is modified in the model block to signify the error

mattfidler commented 4 years ago

Thanks Bill.

billdenney commented 4 years ago

I think that all of the features discussed in here have either been completed or moved to separate issues. Please review to confirm that it is all covered, and if you agree, please close this one.

nlmixrdevelopment / nlmixr

bounds.R Code Simplification #322

Why is a breaking change?