metrumresearchgroup / bbr

R interface for model and project management
https://metrumresearchgroup.github.io/bbr/

Support bootstrap runs #671

Closed barrettk closed 3 weeks ago

barrettk commented 3 months ago

New Bootstrap functionality

The functions below do not capture all new functions or supported S3 methods, but rather highlight key functions that will be helpful to users and/or future developers.

New exported functions:

closes https://github.com/metrumresearchgroup/bbr/issues/682

barrettk commented 2 months ago

See message in https://github.com/metrumresearchgroup/bbr/pull/671/commits/4e4d6afa0ab3ca91c6c2e86ab1ee1f09edaf499a for context, but just wanted to add some more thoughts:

The order would look like this:

mod_boot <- copy_model_as_bootstrap(
    MOD1, n = 10, .overwrite = TRUE
)
submit_model(mod_boot)
wait_for_nonmem(mod_boot)

# summarize model files for quick loading and consolidate if desired
summarize_bootstrap(mod_boot, cleanup = TRUE)

# summarize model parameters
model_summary(mod_boot)

As you can see, this solution isn’t the most elegant. The function name summarize_bootstrap would probably be confusing, though I can’t think of a better name at the moment.

barrettk commented 2 months ago

FYI I'm refactoring the wait_for_nonmem function a bit. I'm making the underlying check_nonmem_finished use S3 (it was set up to, but only had one method), which negates the need for wait_for_nonmem to. Given that bbr.bayes plans to use this (but hasn't yet from what I can tell; I only saw it mentioned), I opted to make wait_for_nonmem just have a default method, which supports bbi models, a list of models, and now bootstrap models. The code reads a lot better with these (uncommitted) changes in my opinion as well.

These changes should not impact bbr.bayes, but just mentioning it ahead of time.

cc @seth127 @kyleam

kyleam commented 2 months ago

Thanks for the heads up @barrettk. I read your comment a few times and am not really following it, but I'll leave some comments/clarifications in case they're helpful.

> FYI I'm refactoring the wait_for_nonmem function a bit. I'm making the underlying check_nonmem_finished use S3 (was set up to, but only had one method), which negates the need for wait_for_nonmem to.

I'm confused what "use S3" means given, as you mention, it's already an S3 method. It was converted to an S3 method with 84ee34b9 (check_nonmem_finished: convert to S3 method, 2024-02-14) so that bbr.bayes could implement a custom method.

> Given that bbr.bayes plans to use this (but hasn't yet from what I can tell - only mention I saw), I opted to make wait_for_nonmem just have a default class, which supports bbi models, a list of models, and now bootstrap models.

I'm not aware of any plans to implement a custom wait_for_nonmem method for bbr.bayes. As I mention in this comment, adding the custom check_nonmem_finished() method was all that was needed to make wait_for_nonmem() compatible with nmbayes models.

> The code reads a lot better with these (uncommitted) changes in my opinion as well.

I'll try to take a look once you push them, which will probably clear up my confusion above.

> These changes should not impact bbr.bayes, but just mentioning it ahead of time.

I think so, as long as bbr.bayes can implement its own check_nonmem_finished method (and it sounds like that's staying as is).

(No need to spend time clearing up my confusion above, as long as you think bbr.bayes will be unaffected.)

barrettk commented 2 months ago

@kyleam I don't want to commit until I add a few more things and make sure the current tests still pass, but FWIW here are the current new functions below (just wanted to make clear that not a whole lot changed). As you can see (though it may be annoying to manually diff against the original), the main check_nonmem_finished.bbi_nonmem_model method is unchanged. The list method is now part of check_nonmem_finished instead of wait_for_nonmem (along with a new bootstrap method):

refactored functions

```r
#' Check if `NONMEM` run is complete
#'
#' Checks if `NONMEM` run is done by looking for `"Stop Time"` in `.lst` file
#'
#' @param .mod a `bbi_nonmem_model` object, or list of `bbi_nonmem_model` objects.
#' @param ... Arguments passed to methods.
#'
#' @importFrom readr read_lines
#' @importFrom stringr str_detect
#'
#' @seealso wait_for_nonmem
#' @return Returns `TRUE` if the model appears to be finished running and
#' `FALSE` otherwise.
#' @export
check_nonmem_finished <- function(.mod, ...) {
  UseMethod("check_nonmem_finished")
}

#' @describeIn check_nonmem_finished takes a `bbi_nonmem_model` object.
#' @export
check_nonmem_finished.bbi_nonmem_model <- function(.mod, ...) {
  if (!fs::dir_exists(get_output_dir(.mod, .check_exists = FALSE))) {
    return(TRUE) # if missing then this failed right away, likely for some bbi reason
  }

  mod_path <- build_path_from_model(.mod, ".lst")

  # look for model to be finished and then test output
  model_finished <- if(file.exists(mod_path)){
    read_lines(mod_path) %>% str_detect("Stop Time") %>% any()
  }else{
    FALSE
  }
  return(isTRUE(model_finished))
}

#' @describeIn check_nonmem_finished takes a `bbi_nmboot_model` object.
#' @export
check_nonmem_finished.bbi_nmboot_model <- function(.mod, ...) {
  boot_spec <- get_boot_model_files(.mod)
  boot_models <- boot_spec$bbi_model
  models_finished <- map_lgl(boot_models, ~check_nonmem_finished(.x))
  return(models_finished)
}

#' @describeIn check_nonmem_finished takes a `list` of `bbi_nonmem_model` objects.
#' @export
check_nonmem_finished.list <- function(.mod, ...) {
  assert_list(.mod)
  check_model_object_list(.mod, .mod_types = NM_MOD_CLASS)
  models_finished <- map_lgl(.mod, ~check_nonmem_finished(.x))
  return(models_finished)
}

#' Wait for `NONMEM` models to finish
#'
#' Calling `wait_for_nonmem()` will freeze the user's console until the model(s)
#' have finished running.
#'
#' @param .mod a `bbi_nonmem_model`, `bbi_nmboot_model`, or list of
#' `bbi_nonmem_model` objects. Other packages (e.g., `bbr.bayes`) may add
#' additional methods.
#' @param .time_limit integer for maximum number of seconds in total to wait
#' before continuing (will exit after this time even if the run does not appear
#' to have finished).
#' @param .interval integer for number of seconds to wait between each check.
#'
#'
#' @seealso check_nonmem_finished
#'
#' @importFrom purrr map_lgl
#' @importFrom checkmate assert_list
#'
#' @export
wait_for_nonmem <- function(.mod, .time_limit = 300, .interval = 5) {
  UseMethod("wait_for_nonmem")
}

#' @rdname wait_for_nonmem
#' @export
wait_for_nonmem.default <- function(.mod, .time_limit = 300, .interval = 5) {
  if(inherits(.mod, "list") && !inherits(.mod, "bbi_model")){
    check_model_object_list(.mod, .mod_types = NM_MOD_CLASS)
  }else{
    check_model_object(.mod, .mod_types = c(NM_MOD_CLASS, NMBOOT_MOD_CLASS))
  }

  verbose_msg(glue("Waiting for {length(.mod)} model(s) to finish..."))
  Sys.sleep(1) # wait for lst file to be created
  expiration <- Sys.time() + .time_limit
  n_interval <- 0
  while ((expiration - Sys.time()) > 0) {
    res <- check_nonmem_finished(.mod)
    if (all(res)) {
      break
    }else{
      n_interval = n_interval + 1
      # print message every 10 intervals
      if(n_interval %% 10 == 0){
        verbose_msg(glue("Waiting for {length(res[!res])} model(s) to finish..."))
      }
    }
    Sys.sleep(.interval)
  }

  if(expiration < Sys.time() && !all(check_nonmem_finished(.mod))){
    res <- check_nonmem_finished(.mod)
    warning(glue("Expiration was reached, but {length(res[!res])} model(s) haven't finished"),
            call. = FALSE, immediate. = TRUE)
  }else{
    verbose_msg(glue("\n{length(.mod)} model(s) have finished"))
  }
}
```
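For readers who want the shape of this refactor without the bbr internals, here is a minimal, self-contained sketch of the same idea: one status generic with a method per object type, and a single waiting function that relies only on that generic. Everything here (`is_finished`, `toy_model`, `toy_boot`, `wait_until_finished`) is a hypothetical stand-in, not bbr code.

```r
# Hypothetical stand-in for check_nonmem_finished(): a status generic
is_finished <- function(x, ...) UseMethod("is_finished")

# Single toy "model": finished when its (made-up) `done` field is TRUE
is_finished.toy_model <- function(x, ...) isTRUE(x$done)

# Plain list of models: one logical per element, dispatching on each element
is_finished.list <- function(x, ...) vapply(x, is_finished, logical(1))

# Toy "bootstrap run": delegates to the list method for its child models
is_finished.toy_boot <- function(x, ...) is_finished(x$models)

new_toy_model <- function(done) structure(list(done = done), class = "toy_model")
new_toy_boot <- function(models) structure(list(models = models), class = "toy_boot")

# One waiting function serves every class that has an is_finished() method
wait_until_finished <- function(x, time_limit = 2, interval = 0.1) {
  expiration <- Sys.time() + time_limit
  while (Sys.time() < expiration) {
    if (all(is_finished(x))) return(invisible(TRUE))
    Sys.sleep(interval)
  }
  res <- is_finished(x)
  warning(sum(!res), " model(s) haven't finished", call. = FALSE)
  invisible(FALSE)
}

mods <- list(new_toy_model(TRUE), new_toy_model(FALSE))
boot <- new_toy_boot(list(new_toy_model(TRUE), new_toy_model(TRUE)))

is_finished(mods)  # one logical per model
is_finished(boot)  # one logical per child model of the toy bootstrap run
wait_until_finished(boot)
```

With this layout, another package only needs to register its own `is_finished` method for the waiting function to support its objects.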

As an aside, I actually found and addressed a minor bug here. In wait_for_nonmem, we should have had all() as part of this line: if(expiration < Sys.time() && !all(check_nonmem_finished(.mod))) (previously purrr mapped over .mod, which was assumed to be a list of model objects (ref)). It likely wasn't caught because the expiration didn't time out in testing.
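To illustrate why the `all()` matters: a status check over several models returns one logical per model, while `if` and `&&` expect a single `TRUE`/`FALSE` (recent versions of R raise an error when given a longer vector). A small sketch with made-up status values:

```r
# Hypothetical per-model status, as a vectorized check would return it
res <- c(TRUE, FALSE, TRUE)
timed_out <- TRUE  # pretend the time limit has passed

# Collapse the vector to a single logical before using it in a condition
unfinished <- !all(res)

if (timed_out && unfinished) {
  message(sum(!res), " model(s) haven't finished")
}
```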

barrettk commented 2 months ago

For anyone looking through this, here are some example calls I have in my scratch pad that should work as of the latest commit:

Starting model

.mod <- MOD1

New bootstrap run, or read in a previous one

.boot_run <- new_bootstrap_run(.mod)
.boot_run <- read_model(file.path(MODEL_DIR, "1_boot_run_1"))

Sample data and create bootstrap runs ahead of time

.boot_run <- setup_bootstrap_run(.boot_run, n = 10, .overwrite = TRUE)

Helpers

get_boot_spec_path(.boot_run) # Get the file path of the spec file
boot_spec <- get_boot_spec(.boot_run) # table of bootstrap run model & data paths
boot_models <- get_boot_models(.boot_run) # list of all bootstrap run models

Submission

.p <- submit_model(.boot_run, .overwrite = TRUE)

# Check status
check_nonmem_finished(.boot_run)

> get_model_status(.boot_run, max_print = 3)
10 model(s) have finished                                                                                                                                                                                
0 model(s) are still running
Summary

### Summarize

```r
summarize_bootstrap_run(.boot_run) # Default
summarize_bootstrap_run(.boot_run, estimates_only = TRUE) # quicker - no `model_summaries` call
summarize_bootstrap_run(.boot_run, include_based_on = TRUE) # include based on model as part of the results
```

#### Example summary call:

```r
> summarize_bootstrap_run(.boot_run, include_based_on = TRUE)
# A tibble: 11 × 33
   run      absolute_model_path       THETA1 THETA2 THETA3   THETA4 THETA5 `OMEGA(1,1)` `OMEGA(2,1)` `OMEGA(2,2)` `SIGMA(1,1)` bbi_summary needed_fail_flags problem_text
 1 based_on /data/Projects/package_d…   2.32   54.6   463.  -0.0820   4.18       0.0985            0        0.157            1             FALSE             PK model 1 …
 2 01       /data/Projects/package_d…   2.33   52.4   496.  -0.0856   4.19       0.0721            0        0.160            1             FALSE             Bootstrap r…
 3 02       /data/Projects/package_d…   2.13   58.4   464.   0.0593   1.13       0.0824            0        0.116            1             FALSE             Bootstrap r…
 4 03       /data/Projects/package_d…   2.97   3.84   13.0  -0.0110   6.98       0.0656            0        0.972            1             FALSE             Bootstrap r…
 5 04       /data/Projects/package_d…   2.36   54.4   434.  -0.0737   4.23        0.101            0        0.156            1             FALSE             Bootstrap r…
 6 05       /data/Projects/package_d…   2.31   56.2   436.   -0.127   5.23       0.0805            0        0.129            1             FALSE             Bootstrap r…
 7 06       /data/Projects/package_d…   5.16   4.11   13.1 -0.00920   6.09       0.0547            0         10.3            1             FALSE             Bootstrap r…
 8 07       /data/Projects/package_d…   2.32   55.8   419.  -0.0815   4.32        0.114            0        0.201            1             FALSE             Bootstrap r…
 9 08       /data/Projects/package_d…   2.27   54.9   422.  -0.0830   4.29        0.104            0        0.153            1             FALSE             Bootstrap r…
10 09       /data/Projects/package_d…   2.20   56.2   15.5  -0.0234   5.20       0.0675            0         11.3            1             FALSE             Bootstrap r…
11 10       /data/Projects/package_d…   3.62   3.88   13.0  -0.0104   6.70         5.43            0         13.3            1             FALSE             Bootstrap r…
# ℹ 19 more variables: estimation_method, number_of_subjects, number_of_obs, ofv, param_count, condition_number,
#   any_heuristics, covariance_step_aborted, large_condition_number, eigenvalue_issues, correlations_not_ok,
#   parameter_near_boundary, hessian_reset, has_final_zero_gradient, minimization_terminated, eta_pval_significant, prderr,
#   error_msg, termination_code
```

These options were available before, and were referenced in the boot-collect.R script in the example project:

Previous summary methods (using new functions)

```r
# Summary log
summary_log(.boot_run[[ABS_MOD_PATH]])

# Confidence intervals/compare to based_on model
boot_sum <- summarize_bootstrap_run(.boot_run) # this is a new function, but otherwise would have been `param_estimates_batch()`
param_estimates_compare(boot_sum)
param_estimates_compare(boot_sum, .orig_mod = read_model(get_based_on(.boot_run)))
```

```r
> param_estimates_compare(boot_sum, .orig_mod = read_model(get_based_on(.boot_run)))
# A tibble: 9 × 5
  parameter_names original_estimate   `50%`  `2.5%` `97.5%`
1 THETA1                       2.32    2.33    2.15    4.81
2 THETA2                       54.6    54.6    3.85    57.9
3 THETA3                       463.    420.    13.0    489.
4 THETA4                    -0.0820 -0.0485  -0.118  0.0439
5 THETA5                       4.18    4.76    1.82    6.92
6 OMEGA(1,1)                 0.0985  0.0815  0.0571    4.24
7 OMEGA(2,1)                      0       0       0       0
8 OMEGA(2,2)                  0.157   0.180   0.119    12.8
9 SIGMA(1,1)                      1       1       1       1
```

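As a rough illustration of what the `50%`, `2.5%`, and `97.5%` columns represent, the sketch below takes the ten bootstrap THETA1 estimates as printed (rounded) in the example summary output above and computes percentile-style quantiles with base R. Whether `param_estimates_compare()` computes them exactly this way is an assumption; with these rounded inputs the results land close to the THETA1 row shown.

```r
# THETA1 from the ten bootstrap runs, copied from the printed (rounded) tibble
theta1 <- c(2.33, 2.13, 2.97, 2.36, 2.31, 5.16, 2.32, 2.27, 2.20, 3.62)

# Median plus a 95% percentile interval (base R default quantile type)
ci <- quantile(theta1, probs = c(0.5, 0.025, 0.975))
ci
```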
kylebaron commented 2 months ago

When tables are requested, they have the name of the original model. We might think about updating run numbers, but only if tables are requested. .msf files are also affected, but hate to spend time re-naming them; if we could do it quickly, I'd drop. Otherwise leave as is for now.

106-boot/0823$ ll
total 2144
drwxr-x---    2 kyleb kyleb   4096 Apr 29 16:57 ./
drwxr-xr-x 1003 kyleb kyleb  69632 Apr 29 16:15 ../
-rw-r--r--    1 kyleb kyleb    703 Apr 29 16:57 .gitignore
-rw-r--r--    1 kyleb kyleb   1920 Apr 29 16:54 0823.clt
-rw-r--r--    1 kyleb kyleb   3493 Apr 29 16:54 0823.coi
-rw-r--r--    1 kyleb kyleb   3493 Apr 29 16:54 0823.cor
-rw-r--r--    1 kyleb kyleb   3493 Apr 29 16:54 0823.cov
-rw-r--r--    1 kyleb kyleb     13 Apr 29 16:54 0823.cpu
-rw-r--r--    1 kyleb kyleb   1291 Apr 29 16:15 0823.ctl
-rwxr-x---    1 kyleb kyleb  32307 Apr 29 16:56 0823.ctl.out*
-rw-r--r--    1 kyleb kyleb  12196 Apr 29 16:54 0823.ext
-rw-r--r--    1 kyleb kyleb   9133 Apr 29 16:53 0823.grd
-rw-r--r--    1 kyleb kyleb  54510 Apr 29 16:56 0823.lst
-rw-r--r--    1 kyleb kyleb  26859 Apr 29 16:53 0823.phi
-rwxr-x---    1 kyleb kyleb    143 Apr 29 16:19 0823.sh*
-rw-r--r--    1 kyleb kyleb    937 Apr 29 16:53 0823.shk
-rw-r--r--    1 kyleb kyleb  13511 Apr 29 16:53 0823.shm
-rw-r--r--    1 kyleb kyleb  44624 Apr 29 16:54 0823.xml
-rw-r--r--    1 kyleb kyleb   6840 Apr 29 16:54 106.msf
-rw-r--r--    1 kyleb kyleb 851908 Apr 29 16:54 106.tab
-rw-r--r--    1 kyleb kyleb  51876 Apr 29 16:54 106_ETAS.msf
-rw-r--r--    1 kyleb kyleb  27520 Apr 29 16:54 106_RMAT.msf
-rw-r--r--    1 kyleb kyleb 430248 Apr 29 16:54 106_SMAT.msf
-rw-r--r--    1 kyleb kyleb 481132 Apr 29 16:54 106par.tab
-rw-r--r--    1 kyleb kyleb    784 Apr 29 16:57 Run_0823.o1823
-rwxr-xr-x    1 kyleb kyleb   1084 Apr 29 16:15 bbi.yaml*
-rwxr-x---    1 kyleb kyleb   1695 Apr 29 16:57 bbi_config.json*
-rwxr-x---    1 kyleb kyleb    159 Apr 29 16:15 grid.sh*
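
In the listing above, the table and .msf files keep the parent model's name (106) inside run directory 0823. A minimal sketch for spotting such files, assuming the run ID equals the directory name; `find_mismatched_outputs()` is a hypothetical helper, not part of bbr, and the demo uses a throwaway directory rather than a real run:

```r
# Hypothetical helper: list table/.msf files in a run directory whose name
# does not start with the run ID (taken to be the directory name)
find_mismatched_outputs <- function(run_dir, pattern = "\\.(tab|msf)$") {
  run_id <- basename(run_dir)
  files <- list.files(run_dir, pattern = pattern)
  files[!startsWith(files, run_id)]
}

# Self-contained demo with empty placeholder files
run_dir <- file.path(tempdir(), "0823")
dir.create(run_dir, showWarnings = FALSE)
invisible(file.create(file.path(
  run_dir, c("0823.lst", "0823.ext", "106.tab", "106par.tab", "106.msf")
)))

mismatched <- find_mismatched_outputs(run_dir)
mismatched
```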
barrettk commented 2 months ago

All tests passing locally (including new bootstrap ones) as of the latest commit (b70f320):

> devtools::load_all()
ℹ Loading bbr
> devtools::test(filter = "bootstrap")
ℹ Testing bbr
✔ | F W  S  OK | Context
✔ |         71 | testing bootstrap functionality and running bbi [109.6s]                                                                                                         

══ Results ═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Duration: 109.6 s

[ FAIL 0 | WARN 0 | SKIP 0 | PASS 71 ]
> devtools::test()
ℹ Testing bbr
✔ | F W  S  OK | Context
✔ |         21 | bbr exec functions                                                                                                                                               
✔ |         71 | testing bootstrap functionality and running bbi [115.6s]                                                                                                         
✔ |         13 | checking if models are up to date [1.3s]                                                                                                                         
✔ |          9 | Collapse columns to string representation                                                                                                                        
✔ |         73 | Constructing config log from bbi_config.json [4.7s]                                                                                                              
✔ |         43 | Copying model objects [1.4s]                                                                                                                                     
✔ |          8 | copy-model-helpers                                                                                                                                               
✔ |         34 | cov-cor                                                                                                                                                          
✔ |         35 | Extract model paths from based_on fields [3.0s]                                                                                                                  
✔ |         34 | Test get_omega, get_sigma, and get_theta functions [2.9s]                                                                                                        
✔ |         67 | Build paths from model object [1.8s]                                                                                                                             
✔ |         22 | inherit-param-estimates [5.3s]                                                                                                                                   
✔ |         26 | initial-estimates [1.2s]                                                                                                                                         
✔ |          7 | model_diff() comparing models                                                                                                                                    
✔ |         19 | Test bbi summary on multiple models [1.3s]                                                                                                                       
✔ |        108 | Test bbi summary functions [4.8s]                                                                                                                                
✔ |         73 | Modify attributes of model object                                                                                                                                
✔ |         25 | Testing function to create or read in model object                                                                                                               
✔ |         35 | nm-file                                                                                                                                                          
✔ |         38 | nm-join [3.5s]                                                                                                                                                   
✔ |         26 | nm-tables                                                                                                                                                        
✔ |         30 | Test bbi batch parameter estimate functions [9.7s]                                                                                                               
✔ |         11 | Test param_estimates functions [1.3s]                                                                                                                            
✔ |         43 | test parsing labels for parameter table [2.1s]                                                                                                                   
✔ |         34 | testing print methods for bbi objects [11.1s]                                                                                                                    
✔ |          5 | read_bbi_path() helper function                                                                                                                                  
✔ |         50 | Reading NONMEM output files into R                                                                                                                               
✔ |         66 | Constructing run log from model yaml [3.0s]                                                                                                                      
✔ |         10 | submit_model(.dry_run=T)                                                                                                                                         
✔ |         14 | submit_models(.dry_run=T)                                                                                                                                        
✔ |         86 | Test creating summary logs [21.5s]                                                                                                                               
✔ |          9 | Comparing tags between models                                                                                                                                    
✔ |         47 | test_threads(.dry_run=T) [12.6s]                                                                                                                                 
✔ |         21 | tweak-initial-estimates [1.3s]                                                                                                                                   
✔ |         16 | test-use-bbi [4.8s]                                                                                                                                              
✔ |         34 | Utility functions for building args, etc.                                                                                                                        
✔ |         43 | testing a composable workflow and running bbi [248.8s]                                                                                                           

══ Results ═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Duration: 470.9 s

[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1306 ]
barrettk commented 1 month ago

Tests still passing after latest refactors FWIW. Bootstrap tests now happen at the end since they take so long

Local Test Results

```r
> devtools::test()
ℹ Testing bbr
✔ | F W  S  OK | Context
✔ |         21 | bbr exec functions
✔ |         13 | checking if models are up to date [1.3s]
✔ |          9 | Collapse columns to string representation
✔ |         73 | Constructing config log from bbi_config.json [4.9s]
✔ |         43 | Copying model objects [1.3s]
✔ |          8 | copy-model-helpers
✔ |         34 | cov-cor
✔ |         35 | Extract model paths from based_on fields [3.9s]
✔ |         34 | Test get_omega, get_sigma, and get_theta functions [3.5s]
✔ |         67 | Build paths from model object [1.9s]
✔ |         22 | inherit-param-estimates [5.8s]
✔ |         26 | initial-estimates [1.2s]
✔ |          7 | model_diff() comparing models
✔ |         19 | Test bbi summary on multiple models [1.8s]
✔ |        108 | Test bbi summary functions [5.3s]
✔ |         73 | Modify attributes of model object
✔ |         25 | Testing function to create or read in model object
✔ |         35 | nm-file
✔ |         38 | nm-join [3.9s]
✔ |         26 | nm-tables [1.1s]
✔ |         30 | Test bbi batch parameter estimate functions [10.2s]
✔ |         11 | Test param_estimates functions [1.4s]
✔ |         43 | test parsing labels for parameter table [2.1s]
✔ |         34 | testing print methods for bbi objects [10.0s]
✔ |          5 | read_bbi_path() helper function
✔ |         50 | Reading NONMEM output files into R
✔ |         66 | Constructing run log from model yaml [2.3s]
✔ |         10 | submit_model(.dry_run=T)
✔ |         14 | submit_models(.dry_run=T)
✔ |         86 | Test creating summary logs [16.2s]
✔ |          9 | Comparing tags between models
✔ |         47 | test_threads(.dry_run=T) [11.6s]
✔ |         21 | tweak-initial-estimates [1.3s]
✔ |         16 | test-use-bbi [4.6s]
✔ |         34 | Utility functions for building args, etc.
✔ |         43 | testing a composable workflow and running bbi [253.2s]
✔ |         71 | testing bootstrap functionality and running bbi [105.1s]

══ Results ═════════════════════════════════════════════════════════════════════
Duration: 461.3 s

[ FAIL 0 | WARN 0 | SKIP 0 | PASS 1306 ]
```

barrettk commented 1 month ago

@kyleam still reading through your comments, but just wanted to mention that I had spoken to @seth127 previously about making any changes to modify-records.R in the VPC PR, since I made changes on top of this there. From what I've read so far your feedback is still relevant, but wanted to let you know that I will be addressing it in the VPC PR.

kylebaron commented 1 month ago

@barrettk It looks like the summarize_bootstrap_run() is failing when we stratify on multiple variables.

EDIT: no; this did not reproduce when I switched back to two stratification variables. It must have been some glitch in one of the models.

boot_run <- setup_bootstrap_run(boot_run, n = 1000, strat_cols = c("STUDY", "RF"))
>   boot_sum <- summarize_bootstrap_run(boot_run)  
Error in `dplyr::bind_rows()` at purrr/R/superseded-map-df.R:69:3:                                               
! Can't combine `..1$version` <character> and `..552$version` <double>.
Run `rlang::last_trace()` to see where the error occurred.
kylebaron commented 1 month ago

Requesting better error message when we try to stratify on a nonexistent column:

> boot_run <- setup_bootstrap_run(boot_run, n = 1000, strat_cols = 
+                                   c("STUDY", "RF2"))
Error in setup_bootstrap_run(boot_run, n = 1000, strat_cols = c("STUDY", :
Assertion on 'all(strat_cols %in% names(starting_data))' failed: Must be TRUE.
barrettk commented 1 month ago

@kylebaron as of the last commit:

> setup_bootstrap_run(.boot_run, n = 10, strat_cols = c("SEXf", "ETNf"), .overwrite = T)
Error in `setup_bootstrap_run()`:                                                                                                                                                               
! The following `strat_cols` are missing from the input data: SEXf, ETNf
Run `rlang::last_trace()` to see where the error occurred.
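
For anyone curious how a check like this can produce the friendlier message, here is a minimal base R sketch. `check_strat_cols()` and `toy_data` are hypothetical and only mirror the message wording shown above; this is not the code from the commit.

```r
# Hypothetical validation helper; `data` stands in for the starting data set
check_strat_cols <- function(strat_cols, data) {
  missing_cols <- setdiff(strat_cols, names(data))
  if (length(missing_cols) > 0) {
    stop(
      "The following `strat_cols` are missing from the input data: ",
      paste(missing_cols, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(strat_cols)
}

toy_data <- data.frame(ID = 1:3, STUDY = 1, RF = c("a", "b", "a"))

# Passes silently when every column exists
check_strat_cols(c("STUDY", "RF"), toy_data)

# Names the offending column(s) instead of a bare assertion failure
msg <- tryCatch(
  check_strat_cols(c("STUDY", "RF2"), toy_data),
  error = conditionMessage
)
msg
```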