barrettk opened this issue 3 months ago
@KatherineKayMRG I am going to update the boot functions to allow different `.probs` to be passed through to `bbr::param_estimates_compare`. Right now, we only allow 3 probs to be used. Should we add functionality to allow for more than 3 (e.g. `probs = c(0.15, 0.25, 0.75, 0.95)`)? I assume that would not be used very often, but do you think it's worth adding in >3 probs functionality?
hmmm, good question.
First I'd just like to clarify what `probs` is? I don't see it in the `define_boot_table` function. I'm assuming it's the percentiles of the data summarised from the bootstrap run?

I think the default assumption is OK:
if (all(c("parameter_names", "p50", "p2.5", "p97.5") %in% names(.bootParam0))){
.bootParam1 <- .bootParam0 %>%
dplyr::rename(estimate = "p50", lower = "p2.5", upper = "p97.5")
}
but I agree the code in the `else` branch would be incorrect.
I think if you're going to allow users to define their own percentiles, then you're going to have a few things to change downstream, right? For example, if they opt for the 5th, 50th, and 95th percentiles, then this column heading would be `boot_ci_90`...

And if you let them pick any options here, there are endless possibilities... do we want/need to accommodate that? Maybe we don't rename the columns if they pick custom percentiles beyond the 90 and 95% CI? Would that break code downstream?

Similarly, I guess someone might want to see the 90% and 95% CI (plus median), or the IQR plus a CI, but I think this would be rare. IMO it might be worth saying, at some point, that if it's a highly unusual table then maybe they should be on the hook to customise that code?
@kylebaron - any thoughts here?
> First I'd just like to clarify what `probs` is? I don't see it in the `define_boot_table` function. I'm assuming it's the percentiles of the data summarised from the bootstrap run?
Yes, `probs` is a numeric vector with values between 0 and 1 to be passed through to `stats::quantile()`. It represents the quantiles to calculate for the parameter estimates.
Users could theoretically put as many values in `probs` as they want. I agree, we could say we accommodate the 95% and 90% intervals with median and IQR, but otherwise the function will output the percentile unformatted (i.e. `format_boot_table` would just output `p3.2`, `p86`, `p94`, etc.). That would be pretty easy to add in.

The way I have it coded now, everything will break if it's not 95%.
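To make that breakage concrete, here is a minimal base-R illustration (not pmparams code; `theta1_samples` is a made-up stand-in for one bootstrap column) of how `stats::quantile()` names its output by percentile, so non-default `probs` produce columns that the hard-coded `p2.5`/`p50`/`p97.5` handling never sees:

```r
# Minimal sketch (not pmparams code) of why non-default probs break the
# hard-coded rename: stats::quantile() names its results by percentile.
theta1_samples <- rnorm(1000, mean = 1.5, sd = 0.1)  # stand-in for one bootstrap column

# Default assumption: median plus a 95% interval -> names "2.5%", "50%", "97.5%"
stats::quantile(theta1_samples, probs = c(0.025, 0.5, 0.975))

# Custom probs -> names "5%", "25%", "75%", "95%", so any code that renames
# p2.5/p50/p97.5 (or 2.5%/50%/97.5%) to estimate/lower/upper will not find them
stats::quantile(theta1_samples, probs = c(0.05, 0.25, 0.75, 0.95))
```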
It sort of seems unlikely that allowing 6 probs through will result in something meaningful at the end; I'm saying this after following where `p2.5`, `p50`, and `p97.5` come from and where they go after that (the format functions). I guess you might generate some percentiles and then form a custom interval with the result, but that seems sort of unlikely (or something you could still force through down the road).
It would be nice if you could specify the interval that you want to get back out, like a `ci` argument that you could set to, for example, 95 and 50, and you'd get back a 95% CI and the interquartile range (50%). Something convenient like that. I think we do do that sort of thing, and it might be nice to automate it. This would mean you'd have to propagate this information into the define function and then on to the format function, but I think you could do it.
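For illustration, a sketch of how a `ci`-style argument could be expanded into `probs` and interval labels; `ci_to_probs()` is a hypothetical helper, not an existing pmparams function:

```r
# Hypothetical helper (not an existing pmparams function): map interval
# widths such as c(95, 50) to quantile probs plus a column label per interval.
ci_to_probs <- function(ci) {
  tail_area <- (100 - ci) / 100 / 2
  data.frame(
    label = rep(paste0("boot_ci_", ci), each = 2),
    bound = rep(c("lower", "upper"), times = length(ci)),
    prob  = as.vector(rbind(tail_area, 1 - tail_area))
  )
}

ci_to_probs(c(95, 50))
#>        label bound  prob
#> 1 boot_ci_95 lower 0.025
#> 2 boot_ci_95 upper 0.975
#> 3 boot_ci_50 lower 0.250
#> 4 boot_ci_50 upper 0.750
```

The median could then always be appended to the requested probs so the `boot_value` column stays available regardless of which intervals are asked for.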
I'd also consider dropping the call to `bbr::param_estimates_compare()`; if you look at that function, it's very simple what it does to summarize the samples, and you could do it easily yourself in pmparams. Importantly, you would then know the names of the columns coming back (e.g., whether it is `p2.5` or `2.5%`), and you'd reduce the level of dependency on outside packages.
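As one rough sketch of what computing the summary in-house might look like (assuming dplyr >= 1.1 for `reframe()` plus tidyr; the `run` column name matches the `names(boot_est)` output further down in this thread, but this is illustrative, not the actual pmparams refactor):

```r
library(dplyr)
library(tidyr)

# Illustrative only: summarize raw bootstrap estimates without
# bbr::param_estimates_compare(), so the output column names are
# fully under pmparams' control.
summarize_boot <- function(boot_est, probs = c(0.025, 0.5, 0.975)) {
  boot_est %>%
    select(-run) %>%                                   # drop the run id column
    pivot_longer(everything(), names_to = "parameter_names") %>%
    group_by(parameter_names) %>%
    reframe(
      prob  = probs,
      value = quantile(value, probs = probs, na.rm = TRUE)
    ) %>%
    mutate(prob = paste0("p", prob * 100)) %>%         # e.g. "p2.5", "p50", "p97.5"
    pivot_wider(names_from = prob, values_from = value)
}
```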
Adding a comment for clarity:
The core of the issue I originally filed is that `define_boot_table()` was only capable of making a 95% confidence interval, and that it would not work if `bbr` was not installed (though the call to `bbr::param_estimates_compare()` was dropped entirely there).

```r
model_dir <- system.file("model/nonmem", package = "pmparams")
paramKey <- file.path(model_dir, "pk-parameter-key-new.yaml")
mod <- bbr::read_model(file.path(model_dir, "106"))

# Raw bootstrap estimates
boot_est <- utils::read.csv(file.path(model_dir, "boot/data/boot-106.csv"))

# AUTOMATICALLY is a 95% CI with no way to overwrite it
boot_df <- define_boot_table(boot_est, mod, .key = paramKey) %>%
  format_boot_table()
```
```
> boot_df
   abb                    desc                                    boot_value  boot_ci_95
 1 KA (1/h)               First order absorption rate constant          1.57  1.39, 1.78
 2 V2/F (L)               Apparent central volume                       61.5  58.3, 65.1
 3 CL/F (L/h)             Apparent clearance                            3.23  3.07, 3.42
 4 V3/F (L)               Apparent peripheral volume                    67.3  65.0, 69.8
 5 Q/F (L/h)              Apparent intercompartmental clearance         3.61  3.37, 3.86
 6 $\\text{CL/F}_{eGFR}$  eGFR effect on CL/F                          0.484  0.408, 0.558
 7 $\\text{CL/F}_{AGE}$   Age effect on CL/F                         -0.0386  -0.167, 0.0878
 8 $\\text{CL/F}_{ALB}$   Serum albumin effect on CL/F                 0.420  0.294, 0.587
 9 IIV-KA                 Variance of absorption                       0.218  <NA>
10 IIV-V2/F               Variance of central volume                  0.0821  <NA>
11 IIV-CL/F               Variance of clearance                        0.112  <NA>
12 V2/F-KA                Covariance of V2/F - KA                     0.0656  <NA>
13 CL/F-KA                Covariance of CL/F - KA                      0.121  <NA>
14 CL/F-V2/F              Covariance of CL/F - V2/F                   0.0696  <NA>
15 Proportional           Variance                                    0.0400  <NA>
```
Note: the values are `<NA>` for the random effect parameter rows because the example `boot-106.csv` saved the names out improperly, causing them not to be joinable. The csv was fixed and joining capabilities were improved in #67.
```
> names(boot_est)
 [1] "run"        "THETA1"     "THETA2"     "THETA3"     "THETA4"     "THETA5"     "THETA6"     "THETA7"     "THETA8"     "OMEGA.1.1." "OMEGA.2.1."
[12] "OMEGA.2.2." "OMEGA.3.1." "OMEGA.3.2." "OMEGA.3.3." "SIGMA.1.1."
```
Mostly for @seth127, but I don't think it makes sense to get a fix for this out for the next release, as the full reworking done in #67 is really needed. The methodology is subject to change, but allowing `.ci` to be passed requires a notable refactor, given that we don't want to rely on `bbr::param_estimates_compare()`.
Thanks for the explanation, @barrettk.
> I don't think it makes sense to get out a fix for this for the next release, as the full reworking done in https://github.com/metrumresearchgroup/pmparams/pull/67 is really needed.
This makes sense to me. I think our next step here is to create an issue that details the proposed "custom percentiles" feature. This was never really "officially" proposed, but kind of built up out of the discussion on this issue. In my mind, we need to:
As mentioned, there's no rush to get this into the next release, but I do think it's worth pushing along in the next couple of months, so that this doesn't go stale and the bug doesn't linger. Thanks, all of you, for this discussion and for moving this along.
cc'ing @graceannobrien for Seth's reply, mostly for:
> I think our next step here is to create an issue that details the proposed "custom percentiles" feature. This was never really "officially" proposed..
Could you file this issue, Grace? Mostly because I'm not sure where `.percentiles` came from (e.g., offline discussions). I can elaborate in a follow-up comment with any code snippets from #66 / #67 for the proposed functionality and/or provide additional details if helpful.
There is some problematic logic in `define_boot_table()`. For one, users can pass any `probs` to `param_estimates_compare`, so we can't assume those column names. Additionally, I wonder if we want to expose other arguments of `param_estimates_compare()` within `define_boot_table()`, such as `probs`, `.compare_cols`, or just a `...` argument.
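To sketch what exposing those arguments could look like, here is a hypothetical pass-through; the body is a stand-in and the argument handling is assumed, not taken from the pmparams or bbr documentation:

```r
# Hypothetical sketch only (not the real pmparams implementation): forward
# extra arguments from define_boot_table() on to the summarizing call,
# instead of hard-coding the default 95% interval.
define_boot_table <- function(.boot_estimates, .mod, .key, ...) {
  # `...` could carry probs (and potentially .compare_cols) through to bbr
  boot_sum <- bbr::param_estimates_compare(.boot_estimates, ...)

  # With user-supplied probs, the quantile column names can no longer be
  # assumed, so detect them (e.g. "p2.5"/"p97.5" or "2.5%"/"97.5%") here
  quantile_cols <- grep("^p[0-9.]+$|%$", names(boot_sum), value = TRUE)

  # ...the remaining join against .key / .mod would go here, unchanged...
  boot_sum
}
```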