Summary

Background: I've been wondering about something like this or a while so it would be easier to share models that were dependent on NONMEM outputs. Then Richard asked about this in #1177 and we got motivated to really try to push this out.

Convert an mrgsolve model object to a "transport" file format. I was originally aiming to do this in json format, but tried yaml too and yaml seemed (1) much easier to read (not sure if there is anything else I can do to improve rendering of the json) and (2) more reliable in rendering the object. So, I'm providing methods for either format.

The main feature here is that it frees you from requiring NONMEM output when reading in the model file. But you can also see almost everything in the model object, including, for example, parameters or other settings that have been updated. And you could read in this transport format, programmatically make changes to a parameter or setting, then write back. You can now and unlikely you'll ever be able to re-write actual model code (not what this is intended for).

Write: mwrite_yaml() and mwrite_json()
Read: mread_yaml() and mread_json(); these convert the yaml (or json) to mrgsolve model code, compile, and load
Also providing functions to just convert yaml (or json) to mrgsolve model code (yaml_to_cpp() and json_to_cpp())

EDIT: removed all json functionality on code review; may revisit at another time if there is some need that can be met with json and not yaml.

The transport format tracks

source - the transport code is rejected if this is not mrgsolve::mwrite
mrgsolve version number (not currently used)
transport formate version (not currently used)

Digits

Writers for both yaml and json accept a digits argument; this is defaulting to 8 right now.

PKMODEL

We do some minor gymnastics with the PKMODEL block. This is because you can declare compartments in $PKMODEL and we need to keep PKMODEL in the model spec, but need to remove the compartment declaration since we are taking compartments from the model init().

SET

Also some work to coordinate the $SET block. There are a bunch of slots in the model object that we'd like to carry into the transport file and then bring back when we load the model from transport. When calling mread_yaml(), we do that update from a block of settings in the transport file (under update). But I also wanted to be able to write all that into the file. So, on write, we parse out $SET and track that in the transport file. When requested, we can append to that model code with these update items.

CAPTURE

Late decision to explicitly handle capture items in the list / yaml. This means $CAPTURE isn't going to appear in the general code item, but rather in a capture item, along the lines of $PARAM and $CMT. As far as I can tell, this means that no annotations will appear in the model after it passes through yaml / list.

Example

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(mrgsolve)
#> 
#> Attaching package: 'mrgsolve'
#> The following object is masked from 'package:stats':
#> 
#>     filter

Load a regular mrgsolve model

mod <- modlib("1005")
#> Loading required namespace: xml2
#> Building 1005 ... done.

Write

We can write it out in yaml format

y <- mwrite_yaml(mod, file = "1005.yaml")

The yaml is easier to read

cat(readLines("1005.yaml"), sep = "\n")
#> source: mrgsolve::mwrite
#> mrgsolve: 1.4.1.9000
#> format: yaml
#> version: 1.0
#> model: '1005'
#> prob:
#> - 1005 phase1 2 CMT like 1004 but diff. initial on V3
#> - Run
#> - file.show(system.file("nonmem", "1005", "1005.ctl", package = "mrgsolve"))
#> - for equivalent NONMEM control stream.
#> param:
#>   SEX: 0.0
#>   WT: 70.0
#>   THETA1: 9.50788555
#>   THETA2: 22.79098949
#>   THETA3: 0.07143366
#>   THETA4: 3.47450589
#>   THETA5: 113.27671224
#>   THETA6: 1.02435433
#>   THETA7: 1.19211818
#> init:
#>   GUT: 0.0
#>   CENT: 0.0
#>   PERIPH: 0.0
#> omega:
#>   data:
#>     '...':
#>     - 0.21387884
#>     - 0.1207702
#>     - 0.09451047
#>     - -0.01162777
#>     - -0.03720637
#>     - 0.04656315
#>   labels:
#>     '1':
#>     - '.'
#>     - '.'
#>     - '.'
#>   names: '...'
#> sigma:
#>   data:
#>     '...':
#>     - 0.04917071
#>     - 0.0
#>     - 0.20176885
#>   labels:
#>     '1':
#>     - '.'
#>     - '.'
#>   names: '...'
#> env: []
#> plugin: base
#> capture:
#> - CL
#> - Q
#> - V2
#> - V3
#> - KA
#> - ETA_1
#> - ETA_2
#> - ETA_3
#> - IPRED
#> update:
#>   start: 0.0
#>   end: 24.0
#>   delta: 1.0
#>   add: []
#>   atol: 1.0e-08
#>   rtol: 1.0e-08
#>   ss_atol: 1.0e-08
#>   ss_rtol: 1.0e-08
#>   maxsteps: 20000.0
#>   hmax: 0.0
#>   hmin: 0.0
#>   mxhnil: 2.0
#>   ixpr: 0.0
#>   mindt: 2.22044605e-15
#>   digits: -1.0
#>   tscale: 1.0
#>   outvars:
#>   - GUT
#>   - CENT
#>   - PERIPH
#>   - CL
#>   - Q
#>   - V2
#>   - V3
#>   - KA
#>   - ETA_1
#>   - ETA_2
#>   - ETA_3
#>   - IPRED
#> set: []
#> code:
#> - $PKMODEL
#> - depot = TRUE
#> - ncmt = 2
#> - ' '
#> - $PK
#> - double CL = THETA(1)*exp(ETA(1)) * pow(THETA(6),SEX) * pow(WT/70.0,THETA(7));
#> - double V2 = THETA(2)*exp(ETA(2));
#> - double KA = THETA(3)*exp(ETA(3));
#> - double Q  = THETA(4);
#> - double V3 = THETA(5);
#> - double S2 = V2;
#> - ' '
#> - $ERROR
#> - double F = CENT/S2;
#> - double Y = F*(1+EPS(1)) + EPS(2);
#> - double IPRED = F;
#> - ' '
#> - $CAPTURE
#> - CL Q V2 V3 KA ETA(1) ETA(2) ETA(3) IPRED
#> - ' '

Read

We can read the model back from yaml format

mod <- mread_yaml("1005.yaml")
#> Building 1005_yaml_mod ... done.

mod
#> 
#> 
#> --------------  source: 1005.yaml.mod  --------------
#> 
#>   project: /private/var/fol.../T/RtmpnYp7PE
#>   shared object: 1005.yaml.mod-so-aef1758e6b06 
#> 
#>   time:          start: 0 end: 24 delta: 1
#>                  add: <none>
#> 
#>   compartments:  GUT CENT PERIPH [3]
#>   parameters:    SEX WT THETA1 THETA2 THETA3 THETA4
#>                  THETA5 THETA6 THETA7 [9]
#>   captures:      CL Q V2 V3 KA ETA_1 ETA_2 ETA_3 IPRED [9]
#>   omega:         3x3 
#>   sigma:         2x2 
#> 
#>   solver:        atol: 1e-08 rtol: 1e-08 maxsteps: 20k
#> ------------------------------------------------------

mrgsim(mod, ev(amt = 100, ID = 1:10)) %>% plot()

We can also convert from the yaml file to a mrgsolve cpp file

x <- yaml_to_cpp("1005.yaml", project = getwd())

cat(readLines("1005.yaml.mod"), sep = "\n")
#> $PROB
#> 1005 phase1 2 CMT like 1004 but diff. initial on V3
#> Run
#> file.show(system.file("nonmem", "1005", "1005.ctl", package = "mrgsolve"))
#> for equivalent NONMEM control stream.
#> 
#> $PARAM
#> SEX = 0
#> WT = 70
#> THETA1 = 9.50788555
#> THETA2 = 22.79098949
#> THETA3 = 0.07143366
#> THETA4 = 3.47450589
#> THETA5 = 113.27671224
#> THETA6 = 1.02435433
#> THETA7 = 1.19211818
#> 
#> $INIT
#> GUT = 0
#> CENT = 0
#> PERIPH = 0
#> 
#> $OMEGA
#> @block
#> @labels . . .
#> 0.21387884
#> 0.1207702
#> 0.09451047
#> -0.01162777
#> -0.03720637
#> 0.04656315
#> 
#> $SIGMA
#> @block
#> @labels . .
#> 0.04917071
#> 0
#> 0.20176885
#> 
#> $PKMODEL
#> depot = TRUE
#> ncmt = 2
#>  
#> $PK
#> double CL = THETA(1)*exp(ETA(1)) * pow(THETA(6),SEX) * pow(WT/70.0,THETA(7));
#> double V2 = THETA(2)*exp(ETA(2));
#> double KA = THETA(3)*exp(ETA(3));
#> double Q  = THETA(4);
#> double V3 = THETA(5);
#> double S2 = V2;
#>  
#> $ERROR
#> double F = CENT/S2;
#> double Y = F*(1+EPS(1)) + EPS(2);
#> double IPRED = F;
#>  
#> $CAPTURE
#> CL Q V2 V3 KA ETA(1) ETA(2) ETA(3) IPRED
#>  
#> $SET
#> start = 0
#> end = 24
#> delta = 1
#> add = numeric(0)
#> atol = 1e-08
#> rtol = 1e-08
#> ss_atol = 1e-08
#> ss_rtol = 1e-08
#> maxsteps = 20000
#> hmax = 0
#> hmin = 0
#> mxhnil = 2
#> ixpr = 0
#> mindt = 2.22044605e-15
#> digits = -1
#> tscale = 1
#> outvars = c("GUT", "CENT", "PERIPH", "CL", "Q", "V2", "V3", "KA", "ETA_1", "ETA_2", "ETA_3", "IPRED")

mread("1005.yaml.mod")
#> Building 1005_yaml_mod ... done.
#> 
#> 
#> --------------  source: 1005.yaml.mod  --------------
#> 
#>   project: /private/var/fol...0-godly-vixen
#>   shared object: 1005.yaml.mod-so-aef16fbe40b6 
#> 
#>   time:          start: 0 end: 24 delta: 1
#>                  add: <none>
#> 
#>   compartments:  GUT CENT PERIPH [3]
#>   parameters:    SEX WT THETA1 THETA2 THETA3 THETA4
#>                  THETA5 THETA6 THETA7 [9]
#>   captures:      CL Q V2 V3 KA ETA_1 ETA_2 ETA_3 IPRED [9]
#>   omega:         3x3 
#>   sigma:         2x2 
#> 
#>   solver:        atol: 1e-08 rtol: 1e-08 maxsteps: 20k
#> ------------------------------------------------------

^{Created on 2024-07-11 with reprex v2.0.2}

This looks very neat. Thanks for the detailed motivation and overview.

I've digested this only at a high level and won't get to a full review today, but I'm curious about this part:

I was originally aiming to do this in json format, but tried yaml too and yaml seemed (1) much easier to read (not sure if there is anything else I can do to improve rendering of the json) and (2) more reliable in rendering the object. So, I'm providing methods for either format.

For the "more reliable" rendering, what cases in particular tripped up the json variant?
Why offer both variants instead of just yaml?

Are there scenarios where the json variant is preferable to the yaml variant? My thinking would be that offering the json variant isn't worth the extra code and documentation unless it provides some value over the yaml variant (beyond subjective format preference). And given that your 2 indicates that it has known limitations, it seems likely that providing the json variant does more harm than good.

Thanks, @kyleam.

For the "more reliable" rendering, what cases in particular tripped up the json variant?

Maybe "reliable" isn't the right word. It was subtle differences in how R objects would get rendered in json or reconstituted when going back to R. I was surprised at how many different ways stuff could get rendered ... a little bit art. But now that seems very naive. Specifically, I think there might have been some unexpected way it handled empty objects (like lists or vectors). Maybe something with the attributes. It was all pretty minor and I was able to write workarounds to handle it and I think I've addressed, so I'm not totally sure how much of the clean up code is only devoted to getting json to work.

Why offer both variants instead of just yaml?

Good question; I don't have a great answer. I was going into this thinking json and it seem like the format all the cool kids are using. I wouldn't be against going with just yaml, but maybe a little uncertain based on what I wrote above about reliability that I'll notice something funny about the rendering down the road and wish I would have gone with the "other" one. I did try to minimize the amount of code duplication (doing all the work outside of actually reading and writing and rendering with common functions), but yeah there are more functions to call and maintain.

I'd be happy to go with just yaml for this to minimize future maintenance and review if that's what you'd advise; it should be minor effort and I could do that prior to reviewing all the code to reduce the code review burden.

KTB

Maybe "reliable" isn't the right word. [...]

Okay, thanks.

I wouldn't be against going with just yaml, but maybe a little uncertain based on what I wrote above about reliability that I'll notice something funny about the rendering down the road and wish I would have gone with the "other" one.

I see.

I did try to minimize the amount of code duplication (doing all the work outside of actually reading and writing and rendering with common functions), but yeah there are more functions to call and maintain.

True, the shared functionality is nicely separated. (I had considered mentioning that in my original comment.)

I suppose I could see that as a reason to not worry about the "wish I would have gone with the other one" thing. The underlying structure can stay the same even if you drop the json stuff, so it's easy enough to add back if needed.

I'd be happy to go with just yaml for this to minimize future maintenance and review if that's what you'd advise

No, my question wasn't considering code review. I've no problems reviewing it as is, and, as you mention, the meat is elsewhere.

If I had to choose, I'd lean toward not adding another public-facing piece unless there is a concrete reason (in particular something that makes it not redundant). But I don't have any strong objections to exposing both the json and yaml functionality, and I'll happily leave that choice to you.

@kyleam - as best as I can tell, the json stuff is gone. I updated one error message and a test. And cleaned out the documentation too.

Update to handle $CAPTURE:

Currently, I'm injecting a table of model annotations into the $PROBLEM block to save that information b/c it's getting clobbered as we recast $PARAM and $CMT blocks.

$CAPTURE can also be annotated, so this will cause a problem - model annotations will keep getting appended to the text in $PROBLEM if we keep (re)saving to yaml and read in again; it will keep getting duplicated (at least the $CAPTURE stuff will).

Rather than checking if model annotations are already inserted into $PROBLEM, I'm going to explicitly handle $CAPTURE information in the list / yaml structure. These will get written to yaml and then back into the mrgsolve control stream, getting sanitized of the annotations along the way.

Now, all potentially annotated blocks are getting clobbered and there will be no more annotations the second time we read the model (except for the annotations that we already cached in $PROBLEM).

So we can say this model transport process

Saves annotations in $PROBLEM
Drops all annotations in the code (so stuff from 1. won't be duplicated)
Has consistent handling for $PARAM, $CMT, and $CAPTURE ... they show up explicitly in the yaml / list, not in the chunk of code that is taken as-is (not parsed or explicitly handled in the list / yaml).

After some fumbling around with tests, I think this is ready to go again.

A reprex looking at how capture is captured in the yaml file

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(mrgsolve)
#> 
#> Attaching package: 'mrgsolve'
#> The following object is masked from 'package:stats':
#> 
#>     filter

Load a regular mrgsolve model

mod <- modlib("1005")
#> Loading required namespace: xml2
#> Building 1005 ... done.
mwrite_yaml(mod, "1005.yaml")
x <- readLines("1005.yaml")
cat(x[24:35], sep = "\n")
#>   PERIPH: 0.0
#> capture:
#> - CL
#> - Q
#> - V2
#> - V3
#> - KA
#> - ETA_1 = ETA(1)
#> - ETA_2 = ETA(2)
#> - ETA_3 = ETA(3)
#> - IPRED
#> omega:

^{Created on 2024-07-19 with reprex v2.1.1}

This is a bug; I will fix in a PR against this branch for (hopefully) more controlled review.

library(mrgsolve)
#> 
#> Attaching package: 'mrgsolve'
#> The following object is masked from 'package:stats':
#> 
#>     filter
code <- '
$OMEGA 1 2 3
$OMEGA 3 4 5
'
mod <- mcode("foo", code)
#> Building foo ...
#> done.
a <- mwrite_yaml(mod, file = "bar.yaml")
mod <- mread_yaml(file = "bar.yaml")
#> Error in yaml::yaml.load(text): Duplicate map key: '...'