Closed kylebaron closed 4 months ago
This looks very neat. Thanks for the detailed motivation and overview.
I've digested this only at a high level and won't get to a full review today, but I'm curious about this part:
I was originally aiming to do this in json format, but tried yaml too and yaml seemed (1) much easier to read (not sure if there is anything else I can do to improve rendering of the json) and (2) more reliable in rendering the object. So, I'm providing methods for either format.
For the "more reliable" rendering, what cases in particular tripped up the json variant?
Why offer both variants instead of just yaml?
Are there scenarios where the json variant is preferable to the yaml variant? My thinking would be that offering the json variant isn't worth the extra code and documentation unless it provides some value over the yaml variant (beyond subjective format preference). And given that your point (2) indicates that json has known limitations, it seems likely that providing the json variant does more harm than good.
Thanks, @kyleam.
For the "more reliable" rendering, what cases in particular tripped up the json variant?
Maybe "reliable" isn't the right word. It was subtle differences in how R objects would get rendered in json or reconstituted when going back to R. I was surprised at how many different ways stuff could get rendered ... a little bit of an art. But now that seems very naive. Specifically, I think there might have been some unexpected way it handled empty objects (like lists or vectors). Maybe something with the attributes. It was all pretty minor; I was able to write workarounds to handle it and I think I've addressed them, so I'm not totally sure how much of the cleanup code is devoted only to getting json to work.
Why offer both variants instead of just yaml?
Good question; I don't have a great answer. I went into this thinking json, and it seems like the format all the cool kids are using. I wouldn't be against going with just yaml, but, given what I wrote above about reliability, I'm a little worried that I'll notice something funny about the rendering down the road and wish I had gone with the "other" one. I did try to minimize the amount of code duplication (doing all the work outside of the actual reading, writing, and rendering with common functions), but yeah, there are more functions to call and maintain.
I'd be happy to go with just yaml for this to minimize future maintenance and review if that's what you'd advise; it should be minor effort and I could do that prior to reviewing all the code to reduce the code review burden.
KTB
Maybe "reliable" isn't the right word. [...]
Okay, thanks.
I wouldn't be against going with just yaml, but maybe a little uncertain based on what I wrote above about reliability that I'll notice something funny about the rendering down the road and wish I would have gone with the "other" one.
I see.
I did try to minimize the amount of code duplication (doing all the work outside of actually reading and writing and rendering with common functions), but yeah there are more functions to call and maintain.
True, the shared functionality is nicely separated. (I had considered mentioning that in my original comment.)
I suppose I could see that as a reason to not worry about the "wish I would have gone with the other one" thing. The underlying structure can stay the same even if you drop the json stuff, so it's easy enough to add back if needed.
I'd be happy to go with just yaml for this to minimize future maintenance and review if that's what you'd advise
No, my question wasn't considering code review. I've no problems reviewing it as is, and, as you mention, the meat is elsewhere.
If I had to choose, I'd lean toward not adding another public-facing piece unless there is a concrete reason (in particular something that makes it not redundant). But I don't have any strong objections to exposing both the json and yaml functionality, and I'll happily leave that choice to you.
@kyleam - as best I can tell, the json stuff is gone. I updated one error message and a test, and cleaned out the documentation too.
Update to handle $CAPTURE:

Currently, I'm injecting a table of model annotations into the $PROBLEM block to save that information because it's getting clobbered as we recast the $PARAM and $CMT blocks. $CAPTURE can also be annotated, so this will cause a problem: model annotations will keep getting appended to the text in $PROBLEM if we keep (re)saving to yaml and reading it in again; the annotations will keep getting duplicated (at least the $CAPTURE ones will).

Rather than checking whether model annotations are already inserted into $PROBLEM, I'm going to explicitly handle $CAPTURE information in the list / yaml structure. These will get written to yaml and then back into the mrgsolve control stream, getting sanitized of the annotations along the way.

Now, all potentially annotated blocks are getting clobbered and there will be no more annotations the second time we read the model (except for the annotations that we already cached in $PROBLEM).

So we can say this model transport process handles $PROBLEM, $PARAM, $CMT, and $CAPTURE explicitly: they show up in the yaml / list, not in the chunk of code that is taken as-is (not parsed or explicitly handled in the list / yaml). After some fumbling around with tests, I think this is ready to go again.
A reprex looking at how capture is captured in the yaml file
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(mrgsolve)
#>
#> Attaching package: 'mrgsolve'
#> The following object is masked from 'package:stats':
#>
#> filter
Load a regular mrgsolve model
mod <- modlib("1005")
#> Loading required namespace: xml2
#> Building 1005 ... done.
mwrite_yaml(mod, "1005.yaml")
x <- readLines("1005.yaml")
cat(x[24:35], sep = "\n")
#> PERIPH: 0.0
#> capture:
#> - CL
#> - Q
#> - V2
#> - V3
#> - KA
#> - ETA_1 = ETA(1)
#> - ETA_2 = ETA(2)
#> - ETA_3 = ETA(3)
#> - IPRED
#> omega:
Created on 2024-07-19 with reprex v2.1.1
This is a bug; I will fix it in a PR against this branch for (hopefully) a more controlled review.
library(mrgsolve)
#>
#> Attaching package: 'mrgsolve'
#> The following object is masked from 'package:stats':
#>
#> filter
code <- '
$OMEGA 1 2 3
$OMEGA 3 4 5
'
mod <- mcode("foo", code)
#> Building foo ...
#> done.
a <- mwrite_yaml(mod, file = "bar.yaml")
mod <- mread_yaml(file = "bar.yaml")
#> Error in yaml::yaml.load(text): Duplicate map key: '...'
Created on 2024-07-19 with reprex v2.1.1
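For reference, the failure mode can be reproduced with the yaml package directly. This is a sketch (the `omega` key name is illustrative, standing in for however the two $OMEGA blocks got rendered): yaml rejects duplicate map keys on load, matching the error above.

```r
library(yaml)

# Two blocks serialized under the same map key -- yaml refuses to
# parse this back (same "Duplicate map key" error as in the reprex)
txt <- "
omega: [1.0, 2.0, 3.0]
omega: [3.0, 4.0, 5.0]
"
yaml::yaml.load(txt)
# Error in yaml::yaml.load(txt): Duplicate map key: 'omega'
```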
Summary
Background: I've been wondering about something like this for a while, so it would be easier to share models that depend on NONMEM outputs. Then Richard asked about this in #1177 and we got motivated to really push this out.
Convert an mrgsolve model object to a "transport" file format. I was originally aiming to do this in json format, but tried yaml too and yaml seemed (1) much easier to read (not sure if there is anything else I can do to improve rendering of the json) and (2) more reliable in rendering the object. So, I'm providing methods for either format.
The main feature here is that it frees you from requiring NONMEM output when reading in the model file. But you can also see almost everything in the model object, including, for example, parameters or other settings that have been updated. And you could read in this transport format, programmatically make changes to a parameter or setting, then write it back. You cannot now, and likely never will be able to, re-write actual model code (that's not what this is intended for).
Functions:
- mwrite_yaml() and mwrite_json(): write the model object to a yaml (or json) transport file
- mread_yaml() and mread_json(): convert the yaml (or json) to mrgsolve model code, compile, and load
- yaml_to_cpp() and json_to_cpp(): convert the transport file to an mrgsolve cpp model file

EDIT: removed all json functionality on code review; may revisit at another time if there is some need that can be met with json and not yaml.
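A minimal round trip with the yaml variant might look like the sketch below. The function names come from this PR; the exact arguments to yaml_to_cpp() are an assumption on my part (check the package docs).

```r
library(mrgsolve)

# Load a model from the internal model library
mod <- modlib("1005")

# Write the transport file
mwrite_yaml(mod, "1005.yaml")

# Read it back: converts the yaml to mrgsolve model code,
# compiles, and loads -- no NONMEM output required
mod2 <- mread_yaml(file = "1005.yaml")

# Convert the transport file to an mrgsolve cpp model file
# (assumed signature)
yaml_to_cpp("1005.yaml", file = "1005.cpp")
```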
The transport format tracks:
- source: the transport code is rejected if this is not mrgsolve::mwrite
- the mrgsolve version number (not currently used)
- a transport format version (not currently used)

Digits

Writers for both yaml and json accept a digits argument; this defaults to 8 right now.
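To make the bookkeeping concrete, here is a hypothetical sketch of the top of a transport file. The field names follow the items described above; the exact names, order, and layout are assumptions and may differ from what mwrite_yaml() actually emits.

```yaml
# Hypothetical transport-file header (illustrative only)
source: mrgsolve::mwrite   # rejected on read if not this value
mrgsolve: 1.4.1            # package version (not currently used)
version: 1                 # transport format version (not currently used)
param:
  CL: 1.2345678            # numerics rounded per the digits argument (default 8)
```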
PKMODEL

We do some minor gymnastics with the $PKMODEL block. This is because you can declare compartments in $PKMODEL, and we need to keep $PKMODEL in the model spec but remove the compartment declaration, since we are taking compartments from the model init().

SET

Also some work to coordinate the $SET block. There are a bunch of slots in the model object that we'd like to carry into the transport file and then bring back when we load the model from transport. When calling mread_yaml(), we do that update from a block of settings in the transport file (under update). But I also wanted to be able to write all of that into the file. So, on write, we parse out $SET and track it in the transport file. When requested, we can append these update items to the model code.

CAPTURE

Late decision to explicitly handle capture items in the list / yaml. This means $CAPTURE isn't going to appear in the general code item, but rather in a capture item, along the lines of $PARAM and $CMT. As far as I can tell, this means that no annotations will appear in the model after it passes through yaml / list.

Example
Load a regular mrgsolve model
Write
We can write it out in yaml format. The yaml is easier to read.
Read
We can read the model back from yaml format
We can also convert from the yaml file to a mrgsolve cpp file
Created on 2024-07-11 with reprex v2.0.2