Bootstrap batch submission sometimes chokes when called before setup is done

seth127 commented 1 month ago

We need to look into this further, but I've noticed some strange behavior when I run this code back-to-back in the console:

MODEL_DIR <- here("model", "pk")
orig_mod <- read_model(here(MODEL_DIR,"106"))
boot_run <- new_bootstrap_run(orig_mod)
boot_run <- setup_bootstrap_run(boot_run, n = 200, .overwrite = TRUE)
submit_model(boot_run)

out_path <- file.path(get_output_dir(boot_run), "OUTPUT") 
if (fs::file_exists(out_path)) {
  file.edit(out_path) # preview in RStudio
}

In most cases, this works as expected. However, when I run the submit_model() call _while setup_bootstrap_run() is still executing_, the batch submission sometimes seems to choke.

I've seen the OUTPUT file be empty, and I've also seen something like the following be written into it:

Error in base::as.environment("tools:callr") : 
  no item called "tools:callr" on the search list
In addition: Warning message:
In gzfile(file, "rb") :
  cannot open compressed file '/tmp/Rtmp7ZqGas/callr-fun-25ad501cf198', probable reason 'No such file or directory'

In all of these cases, none of the models actually get submitted.

I'm assuming there is some race condition happening here. It may also be an issue with setup_bootstrap_run() exiting before it's actually done with its work, though I don't quite understand how that could happen.

I'll also note that I've never seen this happen when I wait for each line to finish before running the next line. Anecdotal evidence at best, but it may be worth considering.

barrettk commented 3 weeks ago

I've done some testing here and haven't been able to reproduce this yet. I wanted to add, though, that the bootstrap specification file gets created after the models are made and the datasets are saved out:

Relevant `setup_bootstrap_run` snippet

```r
setup_bootstrap_run <- function(
    .boot_run,
    n = 200,
    strat_cols = NULL,
    seed = 1234,
    .overwrite = FALSE
){
  # ... rest of `setup_bootstrap_run` call...

    # Create model object per boot run
    if(!is.null(seed)) withr::local_seed(seed)
    boot_models <- purrr::map(mod_paths, make_boot_run, boot_args)
    make_boot_spec(boot_models, boot_args)

    # Garbage collect - may help after handling many (potentially large) datasets
    #  - It can be useful to call gc() after a large object has been removed, as
    #    this may prompt R to return memory to the operating system.
    gc()
  }else{
    rlang::abort(
      c(
        glue("Bootstrap run has already been set up at `{boot_dir}`"),
        "pass `.overwrite = TRUE` to overwrite"
      )
    )
  }
  return(invisible(.boot_run))
}
```

Additionally, the submit_model() call will fail if the spec file hasn't been made yet. If this ever happens again, I would kill the process and zip up the current state of the bootstrap model for closer inspection, because I'm struggling to see where this issue could be introduced. I haven't ruled out gc() being a contributor, though I don't see how it would mess up the state of things either.
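
Since submission depends on the spec file existing, one possible guard is to poll for that file before calling submit_model(). The following is only a minimal sketch: `wait_for_file()` is a hypothetical helper (not part of bbr), and `spec_path` in the usage comment is a placeholder that the caller would have to fill in, since the spec file's name and location aren't stated in this thread.

```r
# Hypothetical helper (not a bbr function): block until `path` exists,
# or stop with an error once `timeout` seconds have elapsed.
wait_for_file <- function(path, timeout = 60, interval = 0.5) {
  deadline <- Sys.time() + timeout
  while (!file.exists(path)) {
    if (Sys.time() >= deadline) {
      stop("Timed out after ", timeout, "s waiting for: ", path, call. = FALSE)
    }
    Sys.sleep(interval)  # avoid busy-waiting between checks
  }
  invisible(path)
}

# Usage sketch (assumes `spec_path` points at the bootstrap spec file):
# wait_for_file(spec_path, timeout = 120)
# submit_model(boot_run)
```

This wouldn't explain the underlying problem, but it would at least turn an empty OUTPUT file into an explicit error if the spec file never appears.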

seth127 commented 3 days ago

Thanks for looking into this. I'll let you know if I ever see this behavior again. Good idea about zipping up the model directory for later inspection.
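
For reference, capturing the state of a run directory could look something like the sketch below. `snapshot_dir()` is a hypothetical helper, not a bbr function. It uses `utils::tar()` rather than `utils::zip()` on the assumption that an external zip program may not be available, so the result is a `.tar.gz` rather than a zip file.

```r
# Hypothetical helper (not a bbr function): archive a directory into a
# timestamped .tar.gz under `dest_dir` and return the archive path.
snapshot_dir <- function(dir, dest_dir = tempdir()) {
  dir <- normalizePath(dir, mustWork = TRUE)
  dest_dir <- normalizePath(dest_dir, mustWork = TRUE)
  stamp <- format(Sys.time(), "%Y%m%d-%H%M%S")
  archive <- file.path(dest_dir, paste0(basename(dir), "-", stamp, ".tar.gz"))

  # Archive relative to the parent so the tarball holds `<dirname>/...`
  # instead of the full absolute path.
  old_wd <- setwd(dirname(dir))
  on.exit(setwd(old_wd), add = TRUE)
  utils::tar(archive, files = basename(dir), compression = "gzip")

  archive
}

# Usage sketch, run right after killing the stuck process:
# snapshot_dir(get_output_dir(boot_run), dest_dir = "~")
```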

metrumresearchgroup / bbr

Bootstrap batch submission sometimes chokes when called before setup is done #697