metrumresearchgroup / bbr

R interface for model and project management
https://metrumresearchgroup.github.io/bbr/
Other
21 stars 2 forks source link

No error or warning when trying to overwrite run when `.wait = FALSE` #691

Open kylebaron opened 1 month ago

kylebaron commented 1 month ago

Is this expected / documented? The simulation run was already executed. When I try to resubmit with .wait = FALSE, I don't get a warning or error back. If I set .wait = TRUE, then I get the error.

>   submit_model(sim_mod, .wait = FALSE, .mode = "local")
Running:
  /data/home/kyleb/project.mrg/current/bin/bbi nonmem run local /data/home/kyleb/project.mrg/current/models/pk/106-sim.ctl --threads=4 --parallel
In /data/home/kyleb/project.mrg/current/models/pk
Not waiting for process to finish.
>   submit_model(sim_mod, .wait = TRUE, .mode = "local")
Error in check_status_code(p$get_exit_status(), output, .cmd_args) :
The target output directory already exists. Please pass `.bbi_args=list(overwrite=TRUE)` to your `submit_model()` call. You can also set `overwrite: true` in the model .yaml file or the `bbi.yaml` file.
kyleam commented 1 month ago

I think it's expected in the sense that when .wait = FALSE we don't wait for the bbi subprocess to exit, and bbi is the one that errors when --overwrite isn't passed but the model directory exists. I don't know if it's spelled out anywhere in the docs that you won't see this and other bbi errors when .wait = FALSE (the same applies to most bbi errors when .mode = "sge" even when .wait = TRUE).

The main options I can think of:

My initial preference would be the second option. Thoughts?

kylebaron commented 1 month ago

That makes sense why it's happening. In my case, I'm running simulations with .mode = "local" and not waiting. This will be what we're doing most of the time for the vpc simulations. I fired off the run, but nothing happened and I wasn't sure why. It took a bunch of this before I realized the pattern. As I user, I'd prefer some pointer about what's going, especially since .overwrite = FALSE by default. But maybe that's more involved since that bbi.yaml file would have to be consulted first.

seth127 commented 1 month ago

I'll loop back to this later, but on first look, I think this is the preferable user behavior:

. make bbr check whether the output directory exists and overwrite isn't specified in any way (including through bbi.yaml default) as a special/common case and give a more useful error

I'm not seeing an obvious reason we couldn't implement that, and it certainly makes sense (from a user perspective) to check the .overwrite case before actually "submitting".

seth127 commented 1 month ago

I'll also add that this doesn't seem like a bad idea either

adjust bbi_exec() to wait a short bit (say 10 ms) even when .wait = FALSE to see if there are any quick errors from bbi, and then do its standard check_status_code routine to propagate the error

even unrelated to this specific issue.

kyleam commented 1 month ago

I'll also add that this doesn't seem like a bad idea either

If we go that way, I don't see much reason in practice to go through the complication of repeating bbi's logic for the check in bbr. That gets you the error in most cases (including sge execution because bbi does the existence check upfront), with a small risk of missing it in cases where the bbi failure takes longer than the chosen time interval.

seth127 commented 1 month ago

Ah, that's interesting.

including sge execution because bbi does the existence check upfront

this ^ was the point I was missing there. This might be the right move, although I worry a little about race conditions.

kyleam commented 1 month ago

although I worry a little about race conditions.

Right, there certainly is a race condition, and my "in practice" is doing a lot of heavy lifting. When I said "small risk of missing it", I might be guessing wrong about 1) the likelihood of bbi not failing within the chosen window and 2) the potential fallout from not signaling an error (e.g., some future script that expects and handles the error).

Anyway, at this point, I don't have a strong objection to the explicit check on bbr's end (option 3), and I'm not really convinced that the small wait is a good idea (option 4) (though my guess is that in practice it would be good enough and it has the advantage of not needing logic on bbr's side and catching any other early errors too).

For option 3, I don't like the idea of repeating bbi logic, but I understand the desire to get this error in all cases. It's a change in behavior that I imagine could break some existing scripts (e.g., something that expects the actual submit_model(..., .wait = FALSE) call to succeed but then uses the returned process to wait around and check that status), but I doubt submit_model is actually being used this way.

One last comment: bbi's "output exists?" check is more complicated than just "does the directory exist?". I do not think we should try to repeat any of that on bbr's end and should just stick to "error if output directory exists and overwrite wouldn't be in effect".

seth127 commented 1 month ago

That all makes sense. One somewhat simple option could be to do a "lightweight" version of your option 3 where we check for the output directory, but only compare it an overwrite = TRUE that bbr already knows about. This would be:

I think the only case this misses is when the user has changed the default in the model directory's bbi.yaml file. It's a somewhat odd and inconsistent behavior for this to not be caught by this check, but it doesn't bother me too much. Especially given that this is probably the least likely case, in practice. And because we're not making it any worse for the user, and are definitely making the other cases better.

Any thoughts on that?

kyleam commented 1 month ago

I think you're probably right that users set overwrite: true in bbi.yaml much less often than setting it via other methods, but for those that do use bbi.yaml...

And because we're not making it any worse for the user, and are definitely making the other cases better.

... I think they'd find it worse. They were setting overwrite: true in bbi.yaml and bbr::submit_model used to do as they wanted. Now it would abort instead of overwriting.

This guard would probably be added somewhere in submit_nonmem_model():

https://github.com/metrumresearchgroup/bbr/blob/88deecd4fb78b927cabf6931997e4ae157b18323/R/submit-model.R#L118-L142

We already have the bbi.yaml path at that point, so if we don't see overwrite in effect due to the .overwrite argument or in the merged .bbi_args, I think we should at least make some effort (more on that below) to match the overwrite setting bbi would see. For some quick testing I did

diff --git a/R/submit-model.R b/R/submit-model.R
index a9c04256..ae855311 100644
--- a/R/submit-model.R
+++ b/R/submit-model.R
@@ -127,13 +127,19 @@ submit_nonmem_model <- function(.mod,

   # define working directory
   model_dir <- get_model_working_directory(.mod)
-
-  .path_exists <- file_exists(.config_path %||% file.path(model_dir, "bbi.yaml"))
-  if(!.path_exists){
+  cpath <- .config_path %||% file.path(model_dir, "bbi.yaml")
+  if(!file_exists(cpath)){
     stop(paste("No bbi configuration was found in the execution directory.",
                "Please run `bbi_init()` with the appropriate directory to continue."))
   }

+  # TODO: Explain cases where there can be a mismatch.
+  bbi_will_overwrite <- isTRUE(.bbi_args[["overwrite"]]) ||
+    isTRUE(yaml::read_yaml(cpath)[["overwrite"]])
+  if (!bbi_will_overwrite && file_exists(get_output_dir(.mod, .check_exists = FALSE))) {
+    stop("boo")
+  }
+
   if (!is.null(.config_path)) {
     cmd_args <- c(
       cmd_args,

I think that does an okay job of matching whether bbi considers overwrite to be in effect. There are some cases it wouldn't match for due to some differences in yaml parsing (e.g., I think any non-zero integer in bbi.yaml would be treated as true by bbi). Also bbi is okay with an uppercase Overwrite. (Then there are the extra details of what bbi checks aside from just existence, but again I don't think we should follow it there.) But it's probably good enough.

Also, as I was playing around with that, I was a bit surprised that we can't override an overwrite: true from bbi.yaml by passing .overwrite = FALSE (or the equivalent via bbi_args or the model yaml). That makes sense, given that .overwrite = FALSE maps to "no argument" (rather than say --no-overwrite), so there is no way to distinguish between unspecified and .overwrite = FALSE.

seth127 commented 1 month ago

This is still a work-in-progress, but my first shot at implementing this is in https://github.com/metrumresearchgroup/bbr/pull/692/commits/742e75f364a7a4b250126df744bc9d61bf016f65

Incidentally, I think it also addresses this point from @kyleam :

I was a bit surprised that we can't override an overwrite: true from bbi.yaml by passing .overwrite = FALSE (or the equivalent via bbi_args or the model yaml)

Because we check the args first and then would error in the case where they are explicitly FALSE and a directory exists. This would essentially stop things before it even got to bbi to trigger the "no argument" issue. (To be clear, I don't think solving that case is must have for this implementation, but if I'm not confused, I think we covered it anyway.)

kyleam commented 1 month ago

Incidentally, I think it also addresses this point from @kyleam :

Yeah, it looks like your PR takes any non-NULL .bbi_args overwrite value precedence (versus just given TRUE precedence) so it does stop in that case. That may be the way to go, though I'm not sure since it now makes the handling inconsistent with all other logical values in .bbi_args.

seth127 commented 1 month ago

That may be the way to go, though I'm not sure since it now makes the handling inconsistent with all other logical values in .bbi_args.

I guess I don't have a strong opinion on this. Per your original point, I think most users would find it "surprising" that "there is no way to distinguish between unspecified and .overwrite = FALSE." I also see this as somewhat of a special case, in that we have an explicit bbr argument for .overwrite too, and for that we do distinguish between unspecified and FALSE (or at least, that was our intention).

That said, if you think about this and feel strongly that we should keep it consistent with other .bbi_args, I'm fine with making that change.

kyleam commented 1 month ago

also see this as somewhat of a special case, in that we have an explicit bbr argument for .overwrite too [...]

Yeah.

That said, if you think about this and feel strongly that we should keep it consistent with other .bbi_args, I'm fine with making that change.

I'd say go with your change. The inconsistency in handling compared to other bbi flags is making me pause (as well as the desire to make the minimal number of behavioral changes), but I bet the change leads to less confusion than more overall.

barrettk commented 1 month ago

This was addressed for the submit_model methods in https://github.com/metrumresearchgroup/bbr/pull/692, however submit_models() (plural) is still an open question; mainly how we want to handle .bbi_args()

I suggest we keep similar handling, and perhaps check that the line isTRUE(yaml::read_yaml(cpath)[["overwrite"]]) is TRUE for all models in .mods (see ref).

seth127 commented 1 month ago

Thanks for posting that detail @barrettk . Looking at it quickly, the main issue that complicates things is that submit_models() allows different models to be carrying around different bbi_args and then reconciles the unique combinations or args into distinct bbi calls (code here, called here).

Without giving it much more thought, I'm not sure how we want to handle that. I think ideally we would decide on this and implement before the next release (which will have the #692 changes in it). That said, if remains as is for now, I'm still fine releasing those #692 changes, unless someone else objects.