metrumresearchgroup / mrgsolve

Simulate from ODE-based population PK/PD and QSP models in R

https://mrgsolve.org

GNU General Public License v2.0

132 stars 36 forks source link

User can provide ETAs on the data set #1037

Closed kylebaron closed 1 year ago

kylebaron commented 1 year ago

Summary

rather than simulating new ETAs, the user adds ETA1, ETA2, ... ETAn as columns on the input data set

Details

pass etasrc = "omega" to simulate from omega (current default, previous behavior)
pass etasrc = "data" to scan data set columns for ETAn; whatever is found is used to form the nid x neta matrix of ETAs; if an ETAn isn't found, we'll fill that with 0
pass etasrc = "data.all" to function just like etasrc = "data" but generate an error if we don't find all ETAn
When the ETA number is greater than 9, start looking for ETn first; if we don't find that, then look for ETAn
It is an error to have both ETn and ETAn in the data set for n > 9
It is an error to pass "data" or "data.all" and a data set with no ETAn present

See the attached pdf for detail and behavior. etasrc.pdf

FelicienLL commented 1 year ago

Hi Kyle, very nice feature, I can already imagine many situation this would be helpful. I have one question, do you think it could be possible to set the model with default values of ETA(1) ETA(2) that would not be equal to zero? I'm asking because in mapbayr, you know that we run the trick where each ETA must be duplicated (ETA(1) for stochastic simulations, and ETA1 in $PARAM for deterministic predictions). It is working well but I'd love that we don't have to do this trick and if we could use a "regular" mrgsolve model. With your proposal implementation (ETA_1 read from the dataset), it would now be possible to do some Bayesian estimation of ETA from a regular mrgsolve model (internally, I would just add ETA_1 ETA_2 variables in the dataset instead of updating ETA1 ETA2 in $PARAM as it is currently implemented). But once the estimation is done, the user can currently call use_posterior() in order to return an updated model with the estimation of ETA1, ETA2 in $PARAM (e.g. ETA1 = 0.123, ETA2 = -0.456), so that the user can simulate alternative dosing regimens for the patient, from a model object only, and no need to remember to pass individual values of ETA1, ETA2 in a data set. So, to sum it up, do you think it would be possible to attach a vector of ETA values to the model? I would be happy to discuss more details if you are interested in. Many thanks in advance Félicien

kylebaron commented 1 year ago

Hi @FelicienLL - Yes ... I was definitely thinking of you and mapbayr when working up this feature. We have a different use case for needing option of either ETA (fixed or newly simulated) in the problem and want to come up with a way to handle all of those.

Some questions:

I was thinking of allowing alternative workflow of letting the user attach a vector or matrix to draw ETA from
Should we require all ETAs to be present and just take them in order? or identify ETAs from the names?
Should ETA vector or matrix be "stored" in the model object? or have to be passed through mrgsim()?

Other questions coming; thanks for your input and discussion on this.

Kyle

FelicienLL commented 1 year ago

Something else currently implemented in mapbayr thanks to the ETA1 + ETA(1) trick is the possibility to let mrgsolve simulate from (a gaussian approximation of) the posterior distribution: MVN(μ,Ω) with μ the vector of estimates of ETA and Ω a variance-covariance matrix obtained from the hessian (like NONMEM provides in the .phi file). Indeed, what I do in mapbayr is that I update ETA1,ETA2 in $PARAM with μ and $OMEGA with Ω so that mrgsolve simulates ETA1 + ETA(1) with a distribution μ + MVN(0,Ω), which is, correct me if I'm wrong, equal to MVN(μ,Ω).

So, instead of implementing a vector of default ETA in the model as I initially suggested, maybe it would be more relevant to give the possibility to change the mean of the multivariate normal distribution ETA are drawn from? This single vector would be defaulted to c(MU_1 = 0, MU_2 = 0) but could be updated with other values, either at the estimation steps or for "posterior" stochastic simulations.

In addition with your proposal of passing ETA_1,ETA_2 from the data, we would deal with any possibilities:

Fixed individual parameters at the model level: mapbayr::use_posterior() would return a model with c(MU_1 = 0.12, MU_2 = -0.34). This way, no need for the user to manually pass ETA_1 = 0.12 in the data or to mrgsim. $OMEGA would be set to zero, unless the user wants to run stochastic simulations from the var-cov derived from the hessian, we would just have to update $OMEGA properly.
Simulate multiple IDs from a pre-existing matrix of ETA (newly simulated or extracted from NONMEM's .phi): pass them as ETA_1,ETA_2 column in data_set/idata_set.

EDIT: This proposal feature would also ease the implementation of adaptive-MAP described by Tingjie Guo.

kylebaron commented 1 year ago

Hi @FelicienLL -

Just to follow up on this: really cool how you're tweaking ETA and OMEGA to do these simulations. I think there could be a way to carry this information around in the model object, but likely require some specialization of the model. Let's see how this current proposal works out; I'm hopeful it will still work for you to get the ETA1 + ETA(1) requirement out from mapbayr. I don't know what the performance will be like having to update the data set every iteration; but willing to see about alternatives (like idata or an object in $ENV that is used to pass the information through).

Kyle

FelicienLL commented 1 year ago

Hi, this looks very good. I noted that you enabled the possibility to pass variables called ET11, ET12 instead of ETA11, ETA12, which would be helpful for ETAs coming from NONMEM's $TABLE.

library(mrgsolve)
#> 
#> Attachement du package : 'mrgsolve'
#> L'objet suivant est masqué depuis 'package:stats':
#> 
#>     filter
code <- "
$OMEGA 1 1 1 1 1 1 1 1 1 1 1
$CAPTURE ETA(11); 
"
mod <- mcode("mod", code, end = -1)
#> Building mod ...
#> done.

data1 <- as_data_set(ev(amt = 1, cmt = 0, ETA11 = 99))
mrgsim_df(mod, data1, etasrc = "data")
#>   ID time ETA_11
#> 1  1    0     99

data2 <- as_data_set(ev(amt = 1, cmt = 0, ET11 = 9999))
mrgsim_df(mod, data2, etasrc = "data")
#>   ID time ETA_11
#> 1  1    0   9999

data2bis <- as_data_set(ev(amt = 1, cmt = 0, ET3 = 9999))
try(mrgsim_df(mod, data2bis, etasrc = "data"))
#> Error : at least one ETA must be provided when etasrc is data.

There are however no warnings if both syntax are present (ET11 has the priority over ETA11), is this something we should be concerned about? I acknowledge the case would probably be rare.

data21 <- as_data_set(ev(amt = 1, cmt = 0, ET11 = 9999, ETA11 = 99))
mrgsim_df(mod, data21, etasrc = "data")
#>   ID time ETA_11
#> 1  1    0   9999

data12 <- as_data_set(ev(amt = 1, cmt = 0, ETA11 = 99, ET11 = 9999))
mrgsim_df(mod, data12, etasrc = "data")
#>   ID time ETA_11
#> 1  1    0   9999

And I was a bit confused that the syntax ETA_11 was not allowed since it is the name used internally, and returned by mrgsim() itself. I could imagine the case of a user who want to make reproducible simulations by running mrgsim() first, capturing the ETA_n, and re-use them into a new dataset in order to be reproducible. Would you say it is something to be considered?

data3 <- as_data_set(ev(amt = 1, cmt = 0, ETA_11 = 99))
try(mrgsim_df(mod, data3, etasrc = "data"))
#> Error : at least one ETA must be provided when etasrc is data.

Thanks Félicien

kylebaron commented 1 year ago

Hi @FelicienLL -

Thanks for reviewing this; your feedback is really important to me.

I added a check for ambiguous names (like passing both ETA11 and ET11); it is an error do do this. What do you think?
I want the default handling to accept what NONMEM tables out and agree it's a little inconsistent to have this not matching the default output from mrgsolve. I'm going to leave it as is for now; the names can get converted pretty easily and we could eventually provide a function ton convert them or even an ETAS(1:last) type notation that would use ETAn type naming. Maybe there's a block option @eta-sep that would let the user pick if or what separator would go there.

library(mrgsolve)
#> 
#> Attaching package: 'mrgsolve'
#> The following object is masked from 'package:stats':
#> 
#>     filter

code <- '
$OMEGA
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1

$CAPTURE ETA(1), ETA(9), ETA(10), ETA(16), ETA(20), ETA(21)
'

mod <- mcode("code", code)
#> Building code ...
#> done.

data <- expand.ev(amt = 100, ETA1 = 2, ETA5 = 5, ETA9 = 9, ET10 = 10, ET16 = 16, 
                  ETA20 = 20, ET21 = 21, ET20 = 202, cmt = 0)

mrgsim(mod, data, etasrc = "data")
#> Error:
#> ! Ambiguous ETA names when `etasrc` is "data":
#> ✖ Found both ETA20 and ET20 in the data.

#> Backtrace:
#>     ▆
#>  1. ├─mrgsolve::mrgsim(mod, data, etasrc = "data")
#>  2. │ └─mrgsolve::mrgsim_d(x, data = data, ...)
#>  3. │   ├─base::do.call(...)
#>  4. │   └─mrgsolve (local) `<fn>`(...)
#>  5. └─mrgsolve (local) `<fn>`(`<chr>`, 20L, "data")
#>  6.   └─rlang::abort(...)

^{Created on 2023-02-16 with reprex v2.0.2}

FelicienLL commented 1 year ago

Hi Kyle, thanks for your feedback too.

Looks good to me. Errors well if multiple ambiguities.
Also agree that compatibility with NONMEM should be a priority. My suggestion for implementing ETA_i compatibility would be, when etasrc = "data", to first look for values named either ETA_i, ETAi , or ETi , then check ambiguous names, and rename these into ETA_i internally. This way, mrgsolve still uses ETA_i internally, still outputs ETA_i, accepts data with ETA_i, and also accepts data with ETAi and ETi as aliases for ETA_i. And to be extremist, we could also imagine compatibility with @labels if they were defined in $OMEGA (looking for ECL in the data etc...).

Some more things came to my mind but I don't know if they are relevant at all, so feel free to ignore:

For each ID, only the first row is looked at. Should mrgsolve check unique values in ETAi and error if several different values are found for a single individual?
From what I experienced, ETAi will not be looked for if it comes from ev() or idata_set(). Should it be stated in the documentation? Should this be implemented as a feature? Or nothing (would probably be fine enough, I don't know).