Open barrettk opened 1 year ago
@seth127
modelfit_dir
's
NM_run
's
execute
(i.e. bootstrap
, scm
, etc.)meta.yaml
was wrong. It said I had 5 threads, when A) I never specified any (should be 1), and B) I only had 4 cores for my workflow. However, its possible this is the default value, and it wasn't actually used since it wasn't parallelized, which brings me to the next point...nonmem
chunk of the bbi json may not be good enoughmodel_summary()
and other functions to run, or do we want PsN generated RMD
's to still run? Are there other files that aren't required, but the modeler would want to keep?Good start here. I'll leave all the bbi_config.json
stuff alone for now, though we definitely need to get back to it. Going to concentrate on the interface and how/where we get the files that we're copying.
Assuming there is only one mod file in the directory won't work, because people often have many different runs in the same dir. Consider this interface instead:
convert_psn(
.modelfit_dir, # no default, this will be the path to a modelfit_dir
.bbr_dir = dirname(.modelfit_dir), # default to the same dir that modelfit_dir is in
.psn_model_file = NULL
)
To find the original control stream, if .psn_model_file = NULL
(default) look for the run name in {.modelfit_dir}/command.txt
and look for the file in dirname(.modelfit_dir)
. If we don't find that file, error and ask them to pass a path to .psn_model_file
.
If we do find it, copy it over (but don't rename it). We need to look into the $DATA
issue, but consider if we want to look for the file at $DATA
and warn if we can't find it. (more on this below)
NM_run
dir (like you're doing)NM_run/psn.*
(i.e. don't try to copy stuff from top-level renamed files)bbi
assumes the path in $DATA
is relative to a directory one level deeper than the control stream file, because this is where it actually executes the model. However, for .mod
files it does not assumes this and will actually add an ../
to the path in the control stream that gets copied through to the output/execution directory. This was based on a PsN-related assumption. Details in code:
copyFileToDestination()
function call that passes strings.ToLower(l.Extension) == "mod"
to modifyPath
arg. (In other words: if extension is .mod
, then modifyPath = TRUE
)modify_path
control flow that calls AddPathLevelToData()
.AddPathLevelToData()
code that adds an extra ../
.mod
control stream, we don't change the$DATA
path in the original .mod
file, but we do need to add ../
to the $DATA
path in the .mod
that is copied into the output/execution dir.
bbi
would've done if it ran this model.bbr_dir
is not dirname(.modelfit_dir)
(aka the proposed default), then we should probably modify the data path to be relative to the new location of the control stream. This might take some fs::path_rel()
gymnastics.{path we parsed}
but no file is there. Please adjust $DATA
in {path to new control stream}
."We also need to but the (modified) data path into data_path:
in bbi_config.json
. One reason for this is that get_data_path()
will look for it there (and this called by things like nm_join()
and friends).
I know I said we'd talk more about bbi_config.json
later, but while we're here, let's make data_md5
and model_md5
something like "PRERUN"
. We're going to want those to be an unequivocal string that we can use to short-circuit other helpers that try to use them.
We also need to but the (modified) data path into
data_path:
inbbi_config.json
. One reason for this is thatget_data_path()
will look for it there (and this called by things likenm_join()
and friends).I know I said we'd talk more about
bbi_config.json
later, but while we're here, let's makedata_md5
andmodel_md5
something like"PRERUN"
. We're going to want those to be an unequivocal string that we can use to short-circuit other helpers that try to use them.
Just FYI this is already done in convert_psn_data_path
. I've tested that already and that seems to concistently deliver the right path, depending where the new bbr dir was set to. The "prerun" aspect hasn't though. Might need some more context for that
Using new method:
psn_dir <- system.file(file.path("model", "nonmem", "psn"), package = "bbr", mustWork = TRUE)
psn_dir <- fs::path_rel(psn_dir, getwd())
psn_type <- "mod_table" # mod_table, mod, or ctl
.modelfit_dir <- file.path(psn_dir, psn_type, "modelfit_dir1")
.bbr_dir <- file.path(psn_dir, psn_type, "bbr_test")
.psn_model_file <- file.path(psn_dir, psn_type, "run105_table.mod")
mod <- convert_psn(.modelfit_dir = .modelfit_dir,
.bbr_dir = .bbr_dir,
.overwrite = TRUE)
mod <- convert_psn(.modelfit_dir = .modelfit_dir,
.psn_model_file = .psn_model_file,
.bbr_dir = .bbr_dir,
.overwrite = TRUE)
Just realized I cant take the max of a folder (like I was doing for NM_run
directories) because they use literal sorting rather than numeric:
psn_fit_dirs <- psn_fit_dirs[grepl("modelfit", psn_fit_dirs)] %>% sort()
> psn_fit_dirs
[1] "/data/pirana_examples/modelfit_dir1" "/data/pirana_examples/modelfit_dir10" "/data/pirana_examples/modelfit_dir11"
[4] "/data/pirana_examples/modelfit_dir12" "/data/pirana_examples/modelfit_dir13" "/data/pirana_examples/modelfit_dir14"
[7] "/data/pirana_examples/modelfit_dir2" "/data/pirana_examples/modelfit_dir3" "/data/pirana_examples/modelfit_dir4"
[10] "/data/pirana_examples/modelfit_dir5" "/data/pirana_examples/modelfit_dir6" "/data/pirana_examples/modelfit_dir7"
[13] "/data/pirana_examples/modelfit_dir8" "/data/pirana_examples/modelfit_dir9"
> max(psn_fit_dirs)
[1] "/data/pirana_examples/modelfit_dir9"
if(length(.psn_run_dir) > 1){
.psn_run_dir <- max(.psn_run_dir))]
}
if(length(.psn_run_dir) > 1){
.psn_run_dir <- .psn_run_dir[which.max(readr::parse_number(basename(.psn_run_dir)))]
}
@seth127 it seems the pirana examples dont have a meta.yaml
file, which was introduced in PsN 4.8.0
Given that we're on 5.2.6, we must have executed these models a long while back. Id really prefer not to remove the meta.yaml
dependency given all the submission info it captures, but open to suggestions. Will see what other options we have.
I found additional text files that may contain what we need, but will have to write a helper function to parse the text file or yaml. Not sure if I should check for the presence of a yaml file, or check the installed PsN version (psn -version
) to decide what to look for. As an aside, the text file doesnt contain all the same information, so not 100% sure if this will work
The main difference im seeing is that the yaml has the location of the model file used during the execution call, while the text file does not:
model_files:
- /data/Projects/package_dev/bbr/inst/model/nonmem/psn/mod_table/run105_table.mod
This is unfortunate, as I haven't found a way to map back to the model used without the yaml file. It seems like ill have to use the logic:
.psn_model_file
is NULL
we can grab the model file used (unformatted/messed up model file).psn_model_file
is NULL
, we have to use the model file in the NM_run
folder, which is formatted to have less white space and is less readable@kyleam have you seen anything like the most recent drone failure in this PR? Seems like it failed during cloning with a bizarre message. Not the first time i've seen this.
@barrettk Hmm, that is a bizarre failure. From drone:
+ git fetch origin +refs/heads/main:
From https://github.com/metrumresearchgroup/bbr
* branch main -> FETCH_HEAD
* [new branch] main -> origin/main
+ git checkout main
Branch 'main' set up to track remote branch 'main' from 'origin'.
Switched to a new branch 'main'
+ git fetch origin refs/pull/568/head:
From https://github.com/metrumresearchgroup/bbr
* branch refs/pull/568/head -> FETCH_HEAD
+ git merge d4f5b5ef1d498904dc19886141d065ce88634bb4
merge: d4f5b5ef1d498904dc19886141d065ce88634bb4 - not something we can merge
Nothing there looks strange, aside from the failure message of course, and the commit is something it can merge:
$ git checkout origin/main
Previous HEAD position was d4f5b5ef Bug fix: now properly finds the latest NM_run folder
HEAD is now at 89f9f930 Merge pull request #567 from metrumresearchgroup/release/1.5.0
$ git fetch origin refs/pull/568/head:
From github.com:metrumresearchgroup/bbr
* branch refs/pull/568/head -> FETCH_HEAD
$ git rev-parse FETCH_HEAD
d4f5b5ef1d498904dc19886141d065ce88634bb4
$ git merge FETCH_HEAD
Updating 89f9f930..d4f5b5ef
Fast-forward
NAMESPACE | 1 +
R/convert-psn.R | 314 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
man/convert_psn.Rd | 50 ++++++++++++++
man/convert_psn_data_path.Rd | 17 +++++
man/copy_psn_run_files.Rd | 27 ++++++++
man/copy_psn_tables.Rd | 24 +++++++
man/create_psn_json.Rd | 17 +++++
7 files changed, 450 insertions(+)
create mode 100644 R/convert-psn.R
create mode 100644 man/convert_psn.Rd
create mode 100644 man/convert_psn_data_path.Rd
create mode 100644 man/copy_psn_run_files.Rd
create mode 100644 man/copy_psn_tables.Rd
create mode 100644 man/create_psn_json.Rd
$ git log --oneline origin/main..d4f5b5ef1d498904dc19886141d065ce88634bb4
d4f5b5ef (HEAD, origin/feat/PsN_output_helper, refs/pull/origin/568) Bug fix: now properly finds the latest NM_run folder
6668585b refactored conversion to use better arguments
aedeeba5 updated .mod_path variable definition for several helper functions
abfee5c8 added clarification comment
d3af42a0 incomplete comment
9160b12d PsN converter is now functional (with a lot of assumptions)
e1a71fc5 Psn output conversion (WIP)
Does it persist if you restart?
Does it persist if you restart?
Restart my R session? I dont have any issues locally. I bet my next commit won't run into the drone issue too (just a hunch), but I dont see what restarting would do. Are you suggesting restarting before my next commit?
Restart my R session?
The drone build, via the button on the right:
FYI @seth127
I discovered we will likely need additional handling for model and data paths in the event the model was executed on a different system, and then just copied over. This causes the meta.yaml
and/or version_and_option_info.txt
file to reference file paths that do not exist on the system.
I added a bunch of handling/guessing for model paths, but data paths have not been handled yet. The pirana examples wont work without allowing users to specify a datapath as well (or we could just error out).
At the moment, if no model file is specified, we will do one of two things:
modelfit_dir
), and check to see if the model file is there. If not, error out and prompt the user to specify a path. We could grab the psn.mod
, but that has the aforementioned formatting issues.To handle datapath discrepancies however, we only have the option of either A) erroring out if we cant find it, or B) allowing the user to specify a datapath (would only ever be used for models run on a different machine). In all honesty I dont like either solution, but would lean towards option B. If we chose option A), all the model_path handling I just implemented would pointless, since we'd error out as soon as we recognized some file paths didnt exist.
This is all looking good @barrettk. Overall, I think we should make a few more tweaks and then reach out to some users to test is out and give feedback. Discussion on those below:
I added a bunch of handling/guessing for model paths...
This sounds good. I think we want to make best effort to find the original model file, but if we can't then we just need to error and tell the user to pass it in via . psn_model_file
. I don't think we want to grab the badly formatted one in the run dir. If they really want to use that, then they can pass that path manually.
data paths have not been handled yet... we have the option of either A) erroring out if we cant find it, or B) allowing the user to specify a datapath
I think we should basically do a two-step thing:
$DATA
from that and then search for the data file from there. That path will be relative to the location of the model file that we just found. If we can't find it, let's give a warning saying "expected to find data at {path we parsed} but no file is there. Please adjust $DATA in {path to new control stream}."
I don't think we actually need to error (though we'll want to get user feedback on this)One more consideration, copied from above, when dealing with the data path:
- If
.bbr_dir
is notdirname(.modelfit_dir)
(aka the proposed default), then we should probably modify the data path to be relative to the new location of the control stream. This might take somefs::path_rel()
gymnastics.
Basically we want to be sure that, if we found the data file, the new copied .mod
file still has the right path to it.
.mod
file in execution dirOne last note... I had written this above:
When copying the .mod control stream, we don't change the
$DATA
path in the original.mod
file, but we do need to add../
to the$DATA
path in the.mod
that is copied into the output/execution dir. This is to imitate what bbi would've done if it ran this model.
but now I'm thinking... do we even need to copy in the .mod
file to the execution directory? Do we use that for anything, once execution is finished? I'm thinking we might want to try not copying it and see if anything breaks. It just seems... unnecessary and confusing. (To be clear, I know we need this, I'm talking about getting rid of this.)
One last note... I had written this above:
When copying the .mod control stream, we don't change the
$DATA
path in the original.mod
file, but we do need to add../
to the$DATA
path in the.mod
that is copied into the output/execution dir. This is to imitate what bbi would've done if it ran this model.but now I'm thinking... do we even need to copy in the
.mod
file to the execution directory? Do we use that for anything, once execution is finished? I'm thinking we might want to try not copying it and see if anything breaks. It just seems... unnecessary and confusing. (To be clear, I know we need this, I'm talking about getting rid of this.)
To my knowledge, no we dont need it (will confirm), but I mostly did that because previous bbr runs have multiple copies of the model, and I figured we would want the execution to look as similar as possible. Will confirm the functional aspect, but what are your thoughts on keeping it just for consistency's sake?
what are your thoughts on keeping it just for consistency's sake?
At this point I'm thinking we should not copy it, unless we need it, for the following reasons:
bbr
, but in terms of consistency, I think it's probably beneficial if it's obvious that this actually wasn't run with bbr
, and so some of the reproducibility promises aren't the same. In that sense, some inconsistency might be semi-desirable, because it just makes that more obvious to future users.bbi
executes. Similar to previous comment: missing it just shows that it wasn't run with bbi
.
- This copied control stream is really just an artifact of how
bbi
executes. Similar to previous comment: missing it just shows that it wasn't run withbbi
.
I definitely agree with it being desirable that there are obvious differences. Main reason Im nervous of downstream effects is because we had talked about how .mod
vs .ctl
extensions have slightly different file paths, where one points to the run directory and the other points to the original model. Ill have to look further into that to make sure the constructed data paths are still correct (since they're relative paths), as that would matter if the user chose to re-run the model with bbr (though not sure how likely that is)
Testing this and it's looking good. Some small changes.
.data_path
in the bbi_config.json
should be relative or else get_data_path()
breaks. This line should be data_path = fs::path_rel(.data_path, .bbr_run_dir),
.tab
to the table files. Copy them as is..bbi_dir
a required argumentOnce that's done, let's
1.5.0.8002
This looks good. As stated previously, we're not going to merge this now because we want to get some user feedback (and write some formal tests, etc.).
Keeping this PR open for now, while we collect user feedback on this function...
Example
Can be tested using the code below. However at this point in time, the new PsN models/data have not been committed to this branch. In the end we should be able to use
processx
to execute a PsN call of an existingbbr
model, though I think we will likely want to add additional models to ensure our testing is comprehensive (there's a lot more variability in PsN runs)If anyone wants to test this with existing bbr models, copy a model to a new location, navigate to the
ctl
file in your terminal, and then runexecute <mod_name>
. After which, you should be able to identify which folders correspond to the example below.