`model_summary` bug when called via `summary_log()`

barrettk commented 5 months ago

When looking at a summary log of a model directory, we see a table that looks like this:

> boot_sum_log <- summary_log(
      boot_dir, .bbi_args = list(
         no_grd_file = TRUE, no_ext_file = TRUE, no_shk_file = TRUE
        )
> boot_sum_log
# A tibble: 100 × 22
   absolute_model_path  run   bbi_summary needed_fail_flags problem_text estimation_method number_of_subjects number_of_obs   ofv param_count
   <chr>                <chr> <list>      <lgl>             <chr>        <chr>                          <int>         <int> <dbl>       <int>
 1 /data/Projects/pack… 001   <bb_nnmm_>  FALSE             Bootstrap r… First Order Cond…                 39           741 2608.          NA
 2 /data/Projects/pack… 002   <bb_nnmm_>  FALSE             Bootstrap r… First Order Cond…                 39           741 2645.          NA
 3 /data/Projects/pack… 003   <bb_nnmm_>  FALSE             Bootstrap r… First Order Cond…                 39           741 2691.          NA
 4 /data/Projects/pack… 004   <bb_nnmm_>  FALSE             Bootstrap r… First Order Cond…                 39           741 2735.          NA
 5 /data/Projects/pack… 005   <bb_nnmm_>  FALSE             Bootstrap r… First Order Cond…                 39           741 2037.          NA
 6 /data/Projects/pack… 006   <bb_nnmm_>  FALSE             Bootstrap r… First Order Cond…                 39           741 2702.          NA
 7 /data/Projects/pack… 007   <bb_nnmm_>  FALSE             Bootstrap r… First Order Cond…                 39           741 2583.          NA
 8 /data/Projects/pack… 008   <bb_nnmm_>  FALSE             Bootstrap r… First Order Cond…                 39           741 2614.          NA
 9 /data/Projects/pack… 009   <bb_nnmm_>  FALSE             Bootstrap r… First Order Cond…                 39           741 1948.          NA
10 /data/Projects/pack… 010   <bb_nnmm_>  FALSE             Bootstrap r… First Order Cond…                 39           741 4293.          NA
# ℹ 90 more rows
# ℹ 12 more variables: condition_number <dbl>, any_heuristics <lgl>, covariance_step_aborted <lgl>, large_condition_number <lgl>,
#   eigenvalue_issues <lgl>, correlations_not_ok <lgl>, parameter_near_boundary <lgl>, hessian_reset <lgl>, has_final_zero_gradient <lgl>,
#   minimization_terminated <lgl>, eta_pval_significant <lgl>, prderr <lgl>
# ℹ Use `print(n = ...)` to see more rows

However, it seems we cant actually print the bbi_summary object due to an issue in tabulating several outputs in the nested param_estimates() call:

> boot_sum_log$bbi_summary[[1]]
Dataset: ../data/boot_1_001.csv

Records: 779     Observations: 741   Subjects: 39

Objective Function Value (final est. method): 2608.32

Estimation Method(s):

– First Order Conditional Estimation with Interaction 

No Heuristic Problems Detected

Error in tibble::as_tibble(.) :
All columns in a tibble must be vectors.
✖ Column `fixed` is NULL.

The above error gets triggered during this section of the print call, which traces back to this section of the param_estimates() call. As you can see, several parameters are not tabulated correctly:

summary via `summary_log`

```r > .summary[[SUMMARY_PARAM_DATA]] [[1]] [[1]]$method [1] "METHOD NOT DETECTED" [[1]]$estimates [[1]]$estimates$theta [1] 2.3100 53.9000 454.0000 -0.0734 4.1100 [[1]]$estimates$omega [1] 0.095 0.000 0.192 [[1]]$estimates$sigma [1] 1 [[1]]$std_err [[1]]$std_err$theta [1] 0.0900 3.0600 32.7000 0.0542 1.3900 [[1]]$std_err$omega [1] 0.0174 NA 0.0240 [[1]]$std_err$sigma [1] NA [[1]]$random_effect_sd [[1]]$random_effect_sd$omega [1] NA NA NA [[1]]$random_effect_sd$sigma [1] NA [[1]]$random_effect_sdse [[1]]$random_effect_sdse$omega [1] NA NA NA [[1]]$random_effect_sdse$sigma [1] NA [[1]]$fixed named list() ```

For reference, here is what we see when looking at the model_summary output of MOD1:

summary via `model_summary`

```r > .summary[[SUMMARY_PARAM_DATA]] [[1]] [[1]]$method [1] "TABLE NO. 1: First Order Conditional Estimation with Interaction: Goal Function=MINIMUM VALUE OF OBJECTIVE FUNCTION: Problem=1 Subproblem=0 Superproblem1=0 Iteration1=0 Superproblem2=0 Iteration2=0" [[1]]$estimates [[1]]$estimates$theta [1] 2.3171600 54.6151000 462.5140000 -0.0820082 4.1795900 [[1]]$estimates$omega [1] 0.0985328 0.0000000 0.1568250 [[1]]$estimates$sigma [1] 1 [[1]]$std_err [[1]]$std_err$theta [1] 0.0872080 3.3755100 30.2623000 0.0561272 1.3779600 [[1]]$std_err$omega [1] 2.03449e-02 1.00000e+10 2.72401e-02 [[1]]$std_err$sigma [1] 1e+10 [[1]]$random_effect_sd [[1]]$random_effect_sd$omega [1] 0.313899 0.000000 0.396011 [[1]]$random_effect_sd$sigma [1] 1 [[1]]$random_effect_sdse [[1]]$random_effect_sdse$omega [1] 3.24068e-02 1.00000e+10 3.43931e-02 [[1]]$random_effect_sdse$sigma [1] 1e+10 [[1]]$fixed [[1]]$fixed$theta [1] 0 0 0 0 0 [[1]]$fixed$omega [1] 0 1 0 [[1]]$fixed$sigma [1] 1 ```

Here is the print of the model summary call on the same model that failed via the summary_log() call:

> model_summary(boot_models[[1]])
Dataset: ../data/boot_1_001.csv                                                                                                                                                                      

Records: 779     Observations: 741   Subjects: 39

Objective Function Value (final est. method): 2608.32

Estimation Method(s):

– First Order Conditional Estimation with Interaction 

No Heuristic Problems Detected

|parameter_names |estimate |stderr |shrinkage |
|:---------------|:--------|:------|:---------|
|THETA1          |2.31     |0.0900 |          |
|THETA2          |53.9     |3.06   |          |
|THETA3          |454      |32.7   |          |
|THETA4          |-0.0734  |0.0542 |          |
|THETA5          |4.11     |1.39   |          |
|OMEGA(1,1)      |0.0950   |0.0174 |19.0      |
|OMEGA(2,2)      |0.192    |0.0240 |3.06      |

Im focusing on other bbr functionality at the moment, and did not dig any deeper, though this should likely be addressed somewhat soon. Though this was discovered via the upcoming bootstrap functions, the issue is unrelated as far as I can tell.

seth127 commented 5 months ago

@barrettk thanks for reporting. Am I understanding this correctly that this is only failing when trying to print the bbi_nonmem_summary object that's in the summary log tibble?

Maybe I missed it, but have you tried extracting one of those summary objects and calling param_estimates() directly on it? Does that fail too, or is the issue in the model_summary() print method?

barrettk commented 5 months ago

It unfortunately seems to be more than just printing. The underlying data itself seems to have NA values instead of the true values. But to answer your question, when I call model_summary(mod) %>% param_estimates() it reads in and displays the information as expected (even for a bootstrap model run). The two dropdown menus in the original issue description illustrate the differences.

It does seem odd we wouldn't have caught this sooner though. To the point where I would like someone to confirm they're seeing the same issue when reproduced (any summary_log() call should be sufficient).

barrettk commented 3 months ago

@seth127 FYI I figured out the issue:

It happens when no_ext_file = TRUE. There are a number of entry points where we could fix this (param_estimates, summary_log_impl, etc), though we could improve the bootstrap by not setting that to TRUE within summarize_bootstrap_run:

boot_sum_log <- summary_log(
  boot_dir, .bbi_args = list(
    no_grd_file = TRUE, no_ext_file = TRUE, no_shk_file = TRUE    # proposing --> remove `no_ext_file = TRUE`
)

That change could make sense to include within the current PR. I know we wanted to increase the probability model_summary calls succeed, but im wondering if this specific flag should/could be removed. cc @kylebaron

Alternative: we could make .bbi_args an argument, that defaults to list(no_grd_file = TRUE, no_shk_file = TRUE)

seth127 commented 3 months ago

So, if I'm understanding this correctly, can this issue be boiled down to something like

When model_summary() is called with .bbi_args = list(no_ext_file = TRUE) then several of the elements of the resulting object are either NA or empty.
- In these cases, it's being called underneath by either summary_log() or model_summaries(), but this is the core issue.
When these elements are missing/empty, this causes param_estimates() to fail, because it expects them to be there
- specifically, from the error message, it expects there to be a non-NULL fixed element
- You show the print() method failing, but I'm pretty sure it's the param_estimates() call inside it that's actually failing.
- We should verify this ^ by checking model_summary(mod, .bbi_args = list(no_ext_file = TRUE)) %>% param_estimates()

If that's the correct, I think the first thing to look into is whether we actually need to those elements to come back for a successful param_estimates() call? If not, we should just handle that case more elegantly and not error. In other words: do we need and .ext file for param_estimates() I would guess no, but I'm not sure.

Some other things we probably want to look into at some point as well:

Why are we using no_ext_file = TRUE in the relevant summary_log() calls? Is there a reason we should need that .ext file, or do we only need it as a by-product of avoiding this (fairly edge-case) error? Is there a reason to think it won't be there for bootstraps?
Why don't those elements show up in the object returned from model_summary(mod, .bbi_args = list(no_ext_file = TRUE))? This is probably a bbi question, and we may not need to look into it at this point, but I'm interested to know if there's a reason we can only get those from the .ext file. To be clear, I think this is the lowest priority thing to look into.

metrumresearchgroup / bbr

`model_summary` bug when called via `summary_log()` #675