barrettk opened this issue 3 months ago
@KatherineKayMRG I am going to update the boot functions to allow different `.probs` to be passed through to `bbr::param_estimates_compare`. Right now, we only allow 3 probs to be used. Should we add functionality to allow for more than 3 (e.g. `probs = c(0.15, 0.25, 0.75, 0.95)`)? I assume that would not be used very often, but do you think it's worth adding in >3 probs functionality?
hmmm, good question.
First I'd just like to clarify what `probs` is? I don't see it in the `define_boot_table` function. I'm assuming it's the percentiles of the data summarised from the bootstrap run?

I think the default assumption is OK:
if (all(c("parameter_names", "p50", "p2.5", "p97.5") %in% names(.bootParam0))){
.bootParam1 <- .bootParam0 %>%
dplyr::rename(estimate = "p50", lower = "p2.5", upper = "p97.5")
}
but I agree the code in the `else` branch would be incorrect.
I think if you're going to allow users to define their own percentiles, then you're going to have a few things to change downstream, right? For example, if they opt for the 5th, 50th, and 95th percentiles, then this column heading would be `boot_ci_90`...

And if you let them pick any options here, there are endless possibilities... do we want/need to accommodate that? Maybe we don't rename the columns if they pick custom percentiles beyond the 90 and 95% CI? Would that break code downstream?

Similarly, I guess someone might want to see the 90% and 95% CI (plus median), or the IQR plus a CI, but I think this would be rare. IMO it might be worth saying, at some point, that if it's a highly unusual table then maybe they should be on the hook to customise that code?
@kylebaron - any thoughts here?
> First I'd just like to clarify what `probs` is? I don't see it in the `define_boot_table` function. I'm assuming it's the percentiles of the data summarised from the bootstrap run?
Yes, `probs` is a numeric vector with values between 0 and 1 to be passed through to `stats::quantile()`. It represents the quantiles to calculate for the parameter estimates.
Users could theoretically put as many values in `probs` as they want. I agree, we could say we accommodate the 95% and 90% intervals with median and IQR, but otherwise the function will output the percentile unformatted (i.e. `format_boot_table` would just output `p3.2`, `p86`, `p94`, etc.). That would be pretty easy to add in.

The way I have it coded now, everything will break if it's not 95%.
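To make that breakage concrete, here is a minimal base-R illustration (not pmparams code; `theta1_samples` is a made-up stand-in for one bootstrap column) of how `stats::quantile()` names its output by percentile, so non-default `probs` produce columns that the hard-coded `p2.5`/`p50`/`p97.5` handling never sees:

```r
# Minimal sketch (not pmparams code) of why non-default probs break the
# hard-coded rename: stats::quantile() names its results by percentile.
theta1_samples <- rnorm(1000, mean = 1.5, sd = 0.1)  # stand-in for one bootstrap column

# Default assumption: median plus a 95% interval -> names "2.5%", "50%", "97.5%"
stats::quantile(theta1_samples, probs = c(0.025, 0.5, 0.975))

# Custom probs -> names "5%", "25%", "75%", "95%", so any code that renames
# p2.5/p50/p97.5 (or 2.5%/50%/97.5%) to estimate/lower/upper will not find them
stats::quantile(theta1_samples, probs = c(0.05, 0.25, 0.75, 0.95))
```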
It sort of seems unlikely that allowing 6 probs through will result in something meaningful at the end; I'm saying this after following where `p2.5`, `p50`, and `p97.5` come from and where they go after that (the format functions). I guess you might generate some percentiles and then form a custom interval with the result, but that seems sort of unlikely (or something you could still force through down the road).
It would be nice if you could specify the interval that you want to get back out, like a `ci` argument that you could set to, for example, 95 and 50, and you'd get back a 95% CI and the interquartile range (50%). Something convenient like that. I think we do do that sort of thing, and it might be nice to automate it. This would mean you'd have to propagate this information into the define function and then on to the format function, but I think you could do it.
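For illustration, a sketch of how a `ci`-style argument could be expanded into `probs` and interval labels; `ci_to_probs()` is a hypothetical helper, not an existing pmparams function:

```r
# Hypothetical helper (not an existing pmparams function): map interval
# widths such as c(95, 50) to quantile probs plus a column label per interval.
ci_to_probs <- function(ci) {
  tail_area <- (100 - ci) / 100 / 2
  data.frame(
    label = rep(paste0("boot_ci_", ci), each = 2),
    bound = rep(c("lower", "upper"), times = length(ci)),
    prob  = as.vector(rbind(tail_area, 1 - tail_area))
  )
}

ci_to_probs(c(95, 50))
#>        label bound  prob
#> 1 boot_ci_95 lower 0.025
#> 2 boot_ci_95 upper 0.975
#> 3 boot_ci_50 lower 0.250
#> 4 boot_ci_50 upper 0.750
```

The median could then always be appended to the requested probs so the `boot_value` column stays available regardless of which intervals are asked for.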
I'd also consider dropping the call to `bbr::param_estimates_compare()`; if you look at that function, it's very simple what it does to summarize the samples, and you could do it easily yourself in pmparams. Importantly, you would then know the names of the columns coming back (e.g., whether it is `p2.5` or `2.5%`), and you'd reduce the level of dependency on outside packages.
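As one rough sketch of what computing the summary in-house might look like (assuming dplyr >= 1.1 for `reframe()` plus tidyr; the `run` column name matches the `names(boot_est)` output further down in this thread, but this is illustrative, not the actual pmparams refactor):

```r
library(dplyr)
library(tidyr)

# Illustrative only: summarize raw bootstrap estimates without
# bbr::param_estimates_compare(), so the output column names are
# fully under pmparams' control.
summarize_boot <- function(boot_est, probs = c(0.025, 0.5, 0.975)) {
  boot_est %>%
    select(-run) %>%                                   # drop the run id column
    pivot_longer(everything(), names_to = "parameter_names") %>%
    group_by(parameter_names) %>%
    reframe(
      prob  = probs,
      value = quantile(value, probs = probs, na.rm = TRUE)
    ) %>%
    mutate(prob = paste0("p", prob * 100)) %>%         # e.g. "p2.5", "p50", "p97.5"
    pivot_wider(names_from = prob, values_from = value)
}
```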
Adding a comment for clarity:
The core of the issue I originally filed is that `define_boot_table()` was only capable of making a 95% confidence interval, and that it would not work if `bbr` was not installed (though the call to `bbr::param_estimates_compare()` was dropped entirely there).

```r
model_dir <- system.file("model/nonmem", package = "pmparams")
paramKey <- file.path(model_dir, "pk-parameter-key-new.yaml")
mod <- bbr::read_model(file.path(model_dir, "106"))

# Raw bootstrap estimates
boot_est <- utils::read.csv(file.path(model_dir, "boot/data/boot-106.csv"))

# AUTOMATICALLY is a 95% CI with no way to overwrite it
boot_df <- define_boot_table(boot_est, mod, .key = paramKey) %>%
  format_boot_table()
```
```
> boot_df
   abb                    desc                                    boot_value  boot_ci_95
 1 KA (1/h)               First order absorption rate constant          1.57  1.39, 1.78
 2 V2/F (L)               Apparent central volume                       61.5  58.3, 65.1
 3 CL/F (L/h)             Apparent clearance                            3.23  3.07, 3.42
 4 V3/F (L)               Apparent peripheral volume                    67.3  65.0, 69.8
 5 Q/F (L/h)              Apparent intercompartmental clearance         3.61  3.37, 3.86
 6 $\\text{CL/F}_{eGFR}$  eGFR effect on CL/F                          0.484  0.408, 0.558
 7 $\\text{CL/F}_{AGE}$   Age effect on CL/F                         -0.0386  -0.167, 0.0878
 8 $\\text{CL/F}_{ALB}$   Serum albumin effect on CL/F                 0.420  0.294, 0.587
 9 IIV-KA                 Variance of absorption                       0.218  <NA>
10 IIV-V2/F               Variance of central volume                  0.0821  <NA>
11 IIV-CL/F               Variance of clearance                        0.112  <NA>
12 V2/F-KA                Covariance of V2/F - KA                     0.0656  <NA>
13 CL/F-KA                Covariance of CL/F - KA                      0.121  <NA>
14 CL/F-V2/F              Covariance of CL/F - V2/F                   0.0696  <NA>
15 Proportional           Variance                                    0.0400  <NA>
```
Note: the values are `<NA>` for the random effect parameter rows because the example `boot-106.csv` saved the names out improperly, causing them not to be joinable. The csv was fixed and joining capabilities were improved in #67.
```
> names(boot_est)
 [1] "run"        "THETA1"     "THETA2"     "THETA3"     "THETA4"     "THETA5"     "THETA6"     "THETA7"     "THETA8"     "OMEGA.1.1." "OMEGA.2.1."
[12] "OMEGA.2.2." "OMEGA.3.1." "OMEGA.3.2." "OMEGA.3.3." "SIGMA.1.1."
```
Mostly for @seth127, but I don't think it makes sense to get a fix for this out for the next release, as the full reworking done in #67 is really needed. The methodology is subject to change, but allowing `.ci` to be passed requires a notable refactor, given that we don't want to rely on `bbr::param_estimates_compare()`.
Thanks for the explanation, @barrettk.
> I don't think it makes sense to get out a fix for this for the next release, as the full reworking done in https://github.com/metrumresearchgroup/pmparams/pull/67 is really needed.
This makes sense to me. I think our next step here is to create an issue that details the proposed "custom percentiles" feature. This was never really "officially" proposed, but kind of built up out of the discussion on this issue. In my mind, we need to:
As mentioned, there's no rush to get this into the next release, but I do think it's worth pushing along in the next couple of months, so that this doesn't go stale and the bug doesn't linger. Thanks, all of you, for this discussion and for moving this along.
cc'ing @graceannobrien for Seth's reply, mostly for:
> I think our next step here is to create an issue that details the proposed "custom percentiles" feature. This was never really "officially" proposed..
Could you file this issue, Grace? Mostly because I'm not sure where `.percentiles` came from (e.g., offline discussions). I can elaborate in a follow-up comment with any code snippets from #66 / #67 for the proposed functionality and/or provide additional details if helpful.
There is some problematic logic in `define_boot_table()`. For one, users can pass any `probs` to `param_estimates_compare`, so we can't assume those column names. Additionally, I wonder if we want to expose other arguments of `param_estimates_compare()` within `define_boot_table()`, such as `probs`, `.compare_cols`, or just a `...` argument.
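To sketch what exposing those arguments could look like, here is a hypothetical pass-through; the body is a stand-in and the argument handling is assumed, not taken from the pmparams or bbr documentation:

```r
# Hypothetical sketch only (not the real pmparams implementation): forward
# extra arguments from define_boot_table() on to the summarizing call,
# instead of hard-coding the default 95% interval.
define_boot_table <- function(.boot_estimates, .mod, .key, ...) {
  # `...` could carry probs (and potentially .compare_cols) through to bbr
  boot_sum <- bbr::param_estimates_compare(.boot_estimates, ...)

  # With user-supplied probs, the quantile column names can no longer be
  # assumed, so detect them (e.g. "p2.5"/"p97.5" or "2.5%"/"97.5%") here
  quantile_cols <- grep("^p[0-9.]+$|%$", names(boot_sum), value = TRUE)

  # ...the remaining join against .key / .mod would go here, unchanged...
  boot_sum
}
```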