test_threads() NBURN patch

metrumresearchgroup / bbr

R interface for model and project management

https://metrumresearchgroup.github.io/bbr/

Other

23 stars 2 forks source link

test_threads() NBURN patch #542

Closed barrettk closed 2 years ago

barrettk commented 2 years ago

A few things that have changed from the corresponding issue:

if NBURN was set to 0, we dont overwrite it, as that could change how the model executes
Added support for MAX as well
There seem to be several other iterative estimation parameters we could also overwrite (MAPITER, MAPNITER, CITER, etc.). I didnt add support for those, but moving forward it would be easier to add support for them
I didnt automatically add NBURN = .max_eval if NITER was defined and NBURN wasn't set
- We may still want to do this, but I wanted to do a bit more research before doing it for sure. See discoveries below on why it might make sense to add support for this.
The logic flow is a bit different, but dont think it was necessary to match it exactly, as I was trying to avoid changing the existing logic as much as possible

Discovery Highlights

There are cases where NBURN is not set when NITER is (see complex models in this package), though I believe the default of 2000/4000 (depending on estimation method) would be used in that case (supporting docs).
Screenshots:
- Interestingly, I couldnt find MAX or MAXEVAL definitions, though MAXEVALS (note the plural) was defined. This makes me wonder if MAX and MAXEVAL are actually shorthand notations for MAXEVALS

Additional Definitions

Tables 3 to 5 in this full text may also be helpful

closes https://github.com/metrumresearchgroup/bbr/issues/538

barrettk commented 2 years ago

@seth127 note the 4th bullet and corresponding discovery. Can add support for that but part of me felt you might want to look through the code a bit first to see if you liked the logic.

I pulled out the overwriting aspect of test_threads() and made it its own function, but IMO we may want to break it down even further to make the conditional logic more legible. At the same time I dont think we want too many helper functions that are only used in one place, but let me know your thoughts

kyleam commented 2 years ago

If I understand correctly, the main conceptual problem you're trying to get a handle on is what estimation options .max_eval should map to and how those options interact. I wonder if the design of test_theads is making that problem for itself. The scientist knows the input model and knows what they would want to replace, so why not leave it up to them to say?

test_threads(mod, .threads = c(2, 4),
             .replace = c(NITER = 500, NBURN = 100))

seth127 commented 2 years ago

Introduction

There's a lot going on here. Thanks to @barrettk, @kyleam, and @sonokokawa for all of your diligence on this.

I'll start by saying that the MAXEVALS docs (screenshot above) contain the sentence Default: a generous number, which is one of the greatest things I've ever read in any documentation anywhere. A truly generous number. One that you'd want by your side when times get tough.

Ok, on to real stuff...

Conceptual refocus

I think we should back up and think about the purpose of this function, and this .max_eval argument:

We want scientists to easily be able to get a ballpark figure for the optimal number of threads to parallelize their model across.
We want the test models to run quickly, so that the user gets results quickly, but also so that they don't run up costs by firing off 4+ copies of a model that runs for hours/days.

There are a lot of edge cases here, but focusing on those two points ^, let's default to something that will (hopefully) not require much user intervention and not run for a long time.

My thoughts on the @kyleam proposed solution

I see the following problems with asking the user to specify which things they want to replace:

They will need to open their control stream and presumably specify the exact syntax they want to replace (i.e. MAX= vs. MAXEVAL=, etc.)
There will be possibly be multiple estimation methods, so we either need them to mix-and-match (like .replace = list(MAXEVAL=10, NITER=50) or have some more complicated nested syntax (which will likely be obnoxious).
If they mispecify something (e.g. forget an estimation method, or put MAXEVAL instead of MAX or whatever) then the models will run for their full length of time (which we would really like to avoid, for reasons listed above).
We will want some defaults, so that the user doesn't have to do ^ every time, but I have no idea what sensible defaults would look like for this. Maybe something like .replace = list(MAXEVAL=10, NITER=50, NBURN=50) which would replace any matching elements on each estimation method. But then we're back to the MAX vs MAXEVAL vs MAXEVALS trap, and implementing this in a sane way seems unlikely.

My proposed solution

I think something like this (pseudo-code below) is the simplest approach that's has the highest probability of success:

for (each line that starts with $EST) {
  if (MAXEVAL found) {
    replace with MAXEVAL=.maxeval
  } else if (NITER found) {
    replace with NITER=.maxeval
    if (NBURN found) {
      replace with NBURN=.maxeval
    } else {
      add NBURN=.maxeval
    }
  } else {
    warn and print $EST line (interactively ask for yes? See below)
  } 
}

The warning/prompt would be something like

"The following estimation line does not have MAXEVALS or NITER specified, so test_threads() cannot cap it: {est_line}"
"This means the model will run for its full length of time. Would you like to proceed with this test? Y/N"

Notes:

I'm on the fence about whether to really stop and ask for user input if we hit that last else block, but I think it's probably worth it, given that a warning here would probably be ignored (mangled in with other submission messages) and this is the most likely scenario where the user would be about to accidentally launch a bunch of long-running models.
For NBURN, I think leaning towards replacing with .max_eval (or adding NBURN=.max_eval) is the best move, because this is another scenario where the default could be "launch of a bunch of long-running models." If you really think NBURN=0 is a special case worth handling then we can, but I'm not sure it really matters for our purpose here (getting a reasonable time/threads estimate).
I think having a single .max_eval argument (though we can change that name to something like .cap_iterations or whatever) is more likely to get the desired result than asking the user to look at their control stream and specify the correct syntax.

Thoughts on all of that?

Sidenote on other `ITER` options

I'm pretty sure none of these other options (CITER, etc.) are directly related to runtime, so I think we should ignore them all for now.

kyleam commented 2 years ago

more likely to get the desired result than asking the user to look at their control stream and specify the correct syntax

Yeah, perhaps I'm just not the audience. I still think, as a user, I'd be baffled by the magical .max_eval (or whatever you end up calling it) and would find it much simpler to just know I need to match (exactly) what I put in the control stream. For the multiple method thing, again I think you just match exactly and all the methods.

I've read over your reply twice, and I just can't convince myself I think it's the better way to go. That's not meant to be a blocker though, just a dissenting opinion.

barrettk commented 2 years ago

@seth127 @kyleam I just realized something. We cant just grepl for "EST", as some options may actually be coded below the $EST line, such as this model:

$EST METHOD=SAEM NBURN=3000 NITER=2000 PRINT=10 ISAMPLE=2
     ISAMPLE_M1=1 ISAMPLE_M2=1 ISAMPLE_M3=1
     CTYPE=3 CITER=10 CALPHA=0.05
$EST METHOD=IMP INTERACTION EONLY=1 NITER=5 ISAMPLE=3000 PRINT=1 SIGL=8 SEED=123334
     CTYPE=3 CITER=10 CALPHA=0.05

If NBURN had coincidentally been coded in the second line, it wouldnt have been picked up. My original searching for the exact characters wouldnt have failed here (because it looks for NBURN specifically). I do agree with first filtering to the $EST block though, so my fix for this is actually to use the existing function clean_ctl(), which will parse the ctl into separate code chunks:

> mod_lines
$COV
$COV[[1]]
[1] "MATRIX=R UNCONDITIONAL"

$DATA
$DATA[[1]]
[1] "../example2.csv IGNORE=C"

$ERROR
$ERROR[[1]]
[1] "CALLFL=0"         "SDSL=THETA(11)"   "W=F*SDSL"         "Y = F + W*EPS(1)" "IPRED=F"          "IWRES=(DV-F)/W"  

$EST
$EST[[1]]
[1] "METHOD=SAEM NBURN=3000 NITER=2000 PRINT=10 ISAMPLE=2      ISAMPLE_M1=1 ISAMPLE_M2=1 ISAMPLE_M3=1"
[2] "CTYPE=3 CITER=10 CALPHA=0.05"                                                                    

$EST[[2]]
[1] "METHOD=IMP INTERACTION EONLY=1 NITER=5 ISAMPLE=3000 PRINT=1 SIGL=8 SEED=123334      CTYPE=3 CITER=10 CALPHA=0.05"

...truncated

Only issue with this solution is that id have to get it back into a form that can be used with writeLines(), which might be somewhat of a pain

kyleam commented 2 years ago

@barretk, oy, good catch. I'm glad you spotted that before the release.

I do agree with first filtering to the $EST block though, so my fix for this is actually to use the existing function clean_ctl()

Hmm, that seems to really deconstruct the ctl file. And as you suggest, I think the logic for putting it back together would be non-trivial (and I think ideally the generated ctl file would stay as close as possible to the original).

If the goal is to get the positions of all EST lines, perhaps something like below would do (where lines is a vector lines from the ctl file).

section_starts <- which(stringr::str_detect(lines, "^\\S+"))
ends <- c(section_starts[2:length(section_starts)] - 1,
          length(lines))

sections <- purrr::map2(section_starts, ends, `:`)
est_sections <- stringr::str_detect(lines[section_starts], "^\\$EST")
est_idxs <- unlist(sections[est_sections])

That's admittedly not very readable (though some polishing might help) and at least needs some safety checks around bounds, but I think it might be better for the purposes of the test_threads function.

barrettk commented 2 years ago

@kyleam awesome. That code helps a lot. It seems to work for the $EST block with the example im working with, but doesnt for some of the other code blocks (e.g. $PK and $ERROR blocks in complex/example2_saemimp.ctl). I think thats because of the ^//S+ regex. Not entirely sure if its possible to accurately identify all the code blocks. All the important blocks start with $, but we'd obviously lose the top level comments by searching for that. Will see if I can improve upon that.

section_starts <- which(stringr::str_detect(mod_lines, "\\$")) might be enough of a change

kyleam commented 2 years ago

It seems to work for the $EST block with the example im working with, but doesnt for some of the other code blocks (e.g. $PK and $ERROR blocks in complex/example2_saemimp.ctl).

Huh, I might be confused about what the goal is. I thought we just wanted the positions of the $EST lines (including the subsequent indented lines).

As far as I can tell, the code does account for $PK and $ERROR lines when determining the bounds for $EST blocks.

> lines <- readLines("inst/model/nonmem/complex/example2_saemimp.ctl")
> section_starts <- which(stringr::str_detect(lines, "^\\S+"))
> lines[section_starts] %>% str_subset("PK")
[1] "$PK"
> lines[section_starts] %>% str_subset("ERROR")
[1] "$ERROR"

I think thats because of the ^//S+ regex.

That bit is identifying any lines that don't start with whitespace:

> stringr::str_subset(c("one", "  foo", "blah"), "^\\S+")
[1] "one"  "blah"

That includes the $PK and $ERROR parts you mention (see above).

section_starts <- which(stringr::str_detect(mod_lines, "\\$")) might be enough of a change

That drops the beginning-of-line anchor, so it makes it much more likely to get a false positive match with a "$" that somewhere else in a line.

barrettk commented 2 years ago

@kyleam

Huh, I might be confused about what the goal is. I thought we just wanted the positions of the $EST lines (including the subsequent indented lines).

Yes thats true. Just to be safe though I wanted to make sure that code also correctly parsed the other code blocks. It doesn't really affect the code since all we care about is $EST, but I felt more comfortable using it if it correctly grabbed the other blocks as well (even though the comments might not be put in the correct block, or would get removed if declared above $PROBLEM).

That drops the beginning-of-line anchor, so it makes it much more likely to get a false positive match with a "$" that somewhere else in a line.

I dont think you can have a $ anywhere else in the model besides declaring parameter blocks, right? (actual question, I assumed that to be the case, but my only evidence is what i've seen). If in fact it is possible, id still like to correctly parse the other blocks. Perhaps "^\\S\\$+"? Just commited a bunch of changes but will give that a shot.

EDIT: yeah that didnt work (misunderstood regex stuff, I think "^\\$" would be ok?

barrettk commented 2 years ago

FYI to all, I removed/altered a test. test_threads() will no longer error out if neither MAXEVAL or NITER are found, as its totally possible one estimation method wont have those two. Having a test for that previously definitely showcased our naivety when it came to estimation methods. I instead replaced that test to also check for occurrences of both MAXEVAL and MAX.

Mainly mentioning it because I have a feeling that would make some alarms go off when reviewing this PR, but also because we might want to pull these changes into the requirements PR before merging that one (since I modified a requirement)

barrettk commented 2 years ago

Messaged offline about this, but documenting here. Went with:

      line_starts <- which(stringr::str_detect(mod_lines, "^\\S"))
      section_starts <- which(stringr::str_detect(mod_lines, "^\\$"))
      section_starts <- intersect(line_starts, section_starts)

Code is a bit wacky, but I feel more comfortable that it groups the other code blocks as well:

>       line_starts <- which(stringr::str_detect(mod_lines, "^\\S")); line_starts
  [1]   1   2   3   5   7   8   9  11  12  13  14  15  16  17  18  19  20  21  22  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47
 [44]  48  49  50  51  52  53  54  55  56  58  59  60  61  62  63  64  65  67  68  69  70  71  72  73  74  75  76  77  78  79  83  84  85  86  87  88  89  90  91  92  93  94  96
 [87]  97  98  99 100 101 102 103 105 106 107 108 110 111 112 115 117
>       section_starts <- which(stringr::str_detect(mod_lines, "^\\$")); section_starts
 [1]   5   7   8   9  22  24  58  68  84  97  99 107 112 115 117
>       section_starts <- intersect(line_starts, section_starts); section_starts
 [1]   5   7   8   9  22  24  58  68  84  97  99 107 112 115 117

kyleam commented 2 years ago

Code is a bit wacky, but I feel more comfortable that it groups the other code blocks as well:

My offline replies:

would just focus on EST lines. you can already inspect line_starts when debugging and be confident that it’s grabbing all the blocks, and $EST blocks are the only thing of interest. widening the focus doesn’t really gain you anything, and it’s more confusing to a reader who wonders why you’re doing so

the main question for this code is whether there’s ever a case that you would want to include a subsequent line in the EST block that does not start with whitespace

that’s the main assumption it makes

i feel like the main point is getting lost here: if we just want the EST lines, we can do a much simpler/constrained parsing based on the following assumptions 1) the lines we care about start with “$EST” and 2) we can stop collecting lines for a given EST once we encounter a line that does not start with white space. anything above/beyond that is just adding complexity and obscuring the approach

fwiw, this is a good place to make a dedicated internal helper (e.g., get_est_idx or whatever) and then add tests to throw a lot of different character vectors at it to test different scenarios (including boundary conditions, like empty character vector)

barrettk commented 2 years ago

Not sure what happened in the commits there^. I used "amend previous commit" because I forgot to run devtools::document(). Think I misunderstand what that does, but hopefully the branch is ok

kyleam commented 2 years ago

Not sure what happened in the commits there^. I used "amend previous commit" because I forgot to run devtools::document().

Amending in that situation is a good approach, although ideally you do it before you push anything. Then git push extends the PR with just the amended commit.

However, you must have committed, pushed, amended, merged origin/patch/test_threads_nburn into your local patch/test_threads_nburn, and then pushed again:

$ git log --graph --oneline -n4
*   0cd6aa4b (HEAD, origin/patch/test_threads_nburn, refs/pull/origin/542) Merge branch 'patch/test_threads_nburn' of github.com:metrumresearchgroup/bbr into patch/test_threads_nburn
|\
| * 43035eeb pulled code out into helper function, to make more tests. Changed requirements, and added new test for no EST line found
* | 6a03e6e4 pulled code out into helper function, to make more tests. Changed requirements, and added new test for no EST line found
|/
* 06ac6ea7 removed `::` declaration

So, 43035eeb was your first commit:

$ git show -s --format=fuller HEAD^2
commit 43035eeb573c6075946ce199c64be06d77a7d2e1
Author:     barrettk <barrettk@metrumrg.com>
AuthorDate: Thu Aug 18 13:16:13 2022 -0400
Commit:     barrettk <barrettk@metrumrg.com>
CommitDate: Thu Aug 18 13:16:13 2022 -0400

    pulled code out into helper function, to make more tests. Changed requirements, and added new test for no EST line found

And 6a03e6e4 was your second:

$ git show -s --format=fuller HEAD^
commit 6a03e6e454680a25eaa6b4ffef3872120ac86bef
Author:     barrettk <barrettk@metrumrg.com>
AuthorDate: Thu Aug 18 13:16:13 2022 -0400
Commit:     barrettk <barrettk@metrumrg.com>
CommitDate: Thu Aug 18 13:16:45 2022 -0400

    pulled code out into helper function, to make more tests. Changed requirements, and added new test for no EST line found

Notice the different commit dates. As expected, the only difference an Rd:

$ git diff --name-only 43035eeb 6a03e6e4
man/get_est_idx.Rd

Think I misunderstand what that does, but hopefully the branch is ok

The history is messy due to the near-duplicate commits, but the end result is fine content-wise:

$ git diff 6a03e6e4 0cd6aa4b
# empty

barrettk commented 2 years ago

However, you must have committed, pushed, amended, merged origin/patch/test_threads_nburn into your local patch/test_threads_nburn, and then pushed again:

Yeah that must be because I pushed the previous one first. I realized I had forgot to document the function right after clicking it. When I went to push the amended commit, it errored. Then I pulled (which said no updates as expected), and tried again. That time it said it was merging, and then pushed the changes. I never explicitly attempted to merge anything, which is why I was confused. But yeah thanks for checking the state

barrettk commented 2 years ago

I think this PR is ready for a real review now (code wise). I recently altered the regex to allow whitespaces before $ blocks, because some users do style their models this way. There is some concern about this, but im not sure we have another option.

I think the code and tests are in a good state, but I would want to merge in this PR (https://github.com/metrumresearchgroup/bbr/pull/535#issuecomment-1218556726), and pull those changes into here before merging, as I touched the same requirements in a couple places, and the new requirement I made needs to be added to the correct story first

kyleam commented 2 years ago

I recently altered the regex to allow whitespaces before $ blocks, because some users do style their models this way.

Thanks for running that by scientists. I wasn't aware that a leading space was valid, so the original snippet I posted definitely makes an incorrect assumption.

barrettk commented 2 years ago

Ran new mrgvalidate package to ensure stories and requirements are properly linked after the merge/addressing the conflicts:

test_dir <- "/data/Projects/package_dev/bbr/inst/validation/test_results"
doc_args <- readRDS(file.path(test_dir, "doc_args.RDS"))
release_notes <- file.path(doc_args$auto_test_dir, "NEWS-filtered.md") 

res_df <- create_package_docs(
  product_name = doc_args$PKGNAME,
  version = doc_args$PKGVERSION,
  repo_url = "git@github.com:metrumresearchgroup/bbr.git",
  specs = doc_args$spec,
  release_notes_file = release_notes,
  auto_test_dir = doc_args$auto_test_dir,
  output_dir = doc_args$output_dir,
  style_dir = doc_args$style_dir,
  write = TRUE,
  cleanup_rmd = TRUE
)

The missing tests are expected, and match what I saw previously

> find_missing(res_df)
15 missing piece(s) found. Check results
$find_tests_without_reqs
# A tibble: 15 × 2
   TestId      TestName                                                        
   <chr>       <chr>                                                           
 1 BBR-PLB-001 build_matrix_indices() parses correctly                         
 2 BBR-PLB-004 parse_param_comment() called internally: ref_mix2               
 3 BBR-ROT-007 check_nonmem_table_output() output matches ref df               
 4 BBR-ROT-008 check_nonmem_table_output(.x_floor=0) works                     
 5 BBR-UTL-001 check_bbi_args parses correctly                                 
 6 BBR-UTL-002 format_cmd_args parses correctly                                
 7 BBR-UTL-003 parse_args_list() merges lists as expected                      
 8 BBR-UTL-004 parse_args_list() handles NULL as expected                      
 9 BBR-UTL-005 parse_args_list() correctly fails if .func_args isn't named     
10 BBR-UTL-006 combine_list_objects() merges lists as expected                 
11 BBR-UTL-007 combine_list_objects() merges with append=TRUE                  
12 BBR-UTL-008 combine_list_objects() correctly fails if .func_args isn't named
13 BBR-UTL-009 check_required_keys() works correctly                           
14 BBR-UTL-010 strict_mode_error() works correctly                             
15 BBR-UTL-011 suppressSpecificWarning() works                                 

$find_reqs_with_missing_tests
# A tibble: 0 × 3
# … with 3 variables: RequirementId <chr>, RequirementDescription <chr>, TestId <chr>
# ℹ Use `colnames()` to see all variable names

$find_reqs_without_stories
# A tibble: 0 × 2
# … with 2 variables: RequirementId <chr>, RequirementDescription <chr>
# ℹ Use `colnames()` to see all variable names

Additionally, I checked the tests/requirements that overlapped between these two PR's via the generated docs (test results, traceability matrix, and requirements-specification), and they look good as well:

metrumresearchgroup / bbr

test_threads() NBURN patch #542

A few things that have changed from the corresponding issue:

Discovery Highlights

Additional Definitions

Introduction

Conceptual refocus

My thoughts on the @kyleam proposed solution

My proposed solution

Sidenote on other `ITER` options

Test Results

Traceability Matrix

Requirements Specification

metrumresearchgroup / bbr

test_threads() NBURN patch #542

A few things that have changed from the corresponding issue:

Discovery Highlights

Additional Definitions

Introduction

Conceptual refocus

My thoughts on the @kyleam proposed solution

My proposed solution

Sidenote on other ITER options

Test Results

Traceability Matrix

Requirements Specification

Sidenote on other `ITER` options