New Feature: jitter/tweak initial parameter estimates

Background

The sample code below shows how we propose to update the estimates. Note that the actual implementation will parse the control stream file rather than pull parameter estimates from the model summary of a previously executed model.

thetas <- get_theta(model_summary(mod106))

# user input: maximum fraction to tweak by (0.1 = 10%)
.p <- 0.1

# sample a separate tweak fraction for each theta, uniformly in [-.p, .p]
tweak_perc <- runif(length(thetas), -.p, .p)

# apply the tweak (this follows the PsN behavior described below)
thetas <- thetas + (thetas * tweak_perc)

Details on how PsN/Pirana uses this feature:

 -degree = fraction
            Default not set. After updating the initial estimates in the output file,
            randomly perturb them by degree=fraction, i.e. change estimate to
            a value randomly chosen in the range estimate +/- estimate*fraction
            while respecting upper and lower boundaries, if set in the model file.
            Degree is set to 0.1, a 10% perturbation, when option tweak_inits is
            set in execute.

Feedback requested

The overall procedure is fairly simple, but a few outstanding questions require scientist feedback:

How should we handle bounded THETA values?
- By default, we think we should respect the original bounds. In other words, if a user wants to tweak the initial value by 20% but the tweaked value would fall outside the bounds, the tweak for that parameter is constrained to the bounds. That parameter might only be jittered by 10%, for example, while the rest can fluctuate by +/- 20%. When this happens it could be good to warn the user, though feedback here would be appreciated.
- We have also considered allowing the bounds to be altered to accommodate the provided tweak percentage. Using the above example, if the tweaked value fell outside a bound, the bound would be moved to, say, 10% beyond the tweaked initial value. The feedback we want here is as follows:
  - Should we allow for this at all via an argument?
  - If so, how should the bounds be adjusted? Is the above suggestion sufficient, or is there another mechanism that would make more sense?
How should we handle FIXED parameters?
- Should these values always be skipped when tweaking the initial values?
- Should there be an argument that allows the user to choose between including/excluding these parameters?
Do we want nuanced control over tweaking various record blocks? Some things to think about:
- Similar to inherit_param_estimates, do we want the option to only tweak THETA records for instance (i.e. ignore OMEGA and SIGMA records)?
- Any special handling for off-diagonal values when covariance is specified for matrix-type records?
- Any other options for special handling?
Documentation and Details surrounding the function
- Given that this feature is somewhat controversial within the pharmacometrics community, do we want to include any details surrounding this, or would it be better to simply say what the function does and leave it up to the user to be aware of any potential concerns?
  - Perhaps something like "Tweaking initial estimates in subsequent model runs may help to avoid falling into local minima during model convergence, though evidence supporting this is somewhat sparse (maybe include source). Some risks include..." or "Things to think about..."
  - I assume it would be good to at least mention the intention of the function.

Comments from meeting with @curtisKJ, @timwaterhouse, and @seth127:

THETA bounds: don't mess with bounds at all. If the new value falls outside one of the bounds, we are leaning towards setting the initial estimate to the value of that bound.
- In the event this occurs, send a warning. Likely a single warning that mentions each case where this happens. Documentation should also highlight this.
Matrix Records (e.g., OMEGA records) may need additional handling. It was mentioned we should ensure positive definite values (positive determinant). We should investigate if PsN does anything like this.
FIXED values should be left alone. Mention this in the documentation, but don't return messages or warnings when this happens.
Need to look further into how PsN tweaks values. Does it only tweak THETA records?
Do we want to tweak values using a single percent, or a different sampled percent for each occurrence? We are leaning towards the latter.
At the end Curtis brought up the consideration of log-scaled THETA records. In these cases, the percent may actually have a larger impact than intended/assumed. Determining if the value is in the log scale would be difficult, and would require both A) parsing the PK records block to look for exp() occurrences, and B) mapping it back to THETA records (cc @kyleam). This would likely be a lot of work. Curtis suggested this may be overcomplicating things, but I wanted to mention this discussion just in case it was worth exploring.
I mentioned that this kind of function could be useful for bootstrap calls, and suggested potentially having a wrapper around this function for more easily running bootstraps. Curtis thought this could be pretty helpful. That is a conversation for another day, but just wanted to mention it now.
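
To make the THETA and FIXED notes above concrete, here is a rough sketch of how they could fit together. `tweak_thetas()` and its arguments are hypothetical (nothing like this exists in bbr yet), and it works on plain numeric vectors rather than a parsed control stream. It assumes the behavior we are leaning towards: bounds are never changed, out-of-bounds values are set to the bound, FIXED values are silently skipped, and a single warning lists every clamped parameter.

```r
# Hypothetical helper (not part of bbr): jitter initial estimates, leave FIXED
# values alone, and set any value that lands outside its bounds to that bound.
tweak_thetas <- function(init, lower, upper, fixed, .p = 0.1) {
  stopifnot(
    length(init) == length(lower),
    length(init) == length(upper),
    length(init) == length(fixed)
  )
  # a separate tweak fraction is sampled for each parameter
  tweak_perc <- runif(length(init), -.p, .p)
  # FIXED parameters keep their original value (no message or warning)
  new_init <- ifelse(fixed, init, init + init * tweak_perc)

  # clamp to the violated bound; the bounds themselves are never modified
  clamped <- pmin(pmax(new_init, lower), upper)
  hit <- which(clamped != new_init)
  if (length(hit) > 0) {
    # one warning that mentions every affected parameter
    warning(
      "Tweaked value was set to its bound for THETA: ",
      paste(hit, collapse = ", "),
      call. = FALSE
    )
  }
  clamped
}

set.seed(123)
tweaked <- tweak_thetas(
  init  = c(2, 3, 10, 0.5),
  lower = c(0, 2.9, -Inf, 0),
  upper = c(Inf, 3.1, Inf, 1),
  fixed = c(FALSE, FALSE, FALSE, TRUE),
  .p    = 0.2
)
tweaked
```

Unbounded parameters are represented here with `-Inf`/`Inf`, which keeps the clamping step to a single `pmin(pmax(...))` call. How bounds and FIXED flags would actually be read from the control stream is left open.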

Thanks for the write-up @barrettk, and thanks very much for your input @timwaterhouse and @curtisKJ. This all sounds good to me, and it answers most of the questions we had above. A few comments from me:

I think the consensus is, ideally, to allow the user to control which blocks to update, similar to the inherit_param_estimates() interface.

That said, if jittering the matrix records gets complicated (see next comment), we may roll out an initial version of this that only jitters the THETA estimates.

Matrix Records (e.g., OMEGA records) may need additional handling. It was mentioned we should ensure positive definite values (positive determinant). We should investigate if PsN does anything like this.

That's interesting. We will definitely want to look into this and keep getting scientist input along the way.

In terms of how PsN does it, some research is definitely warranted, but step 1 is confirming whether it even touches these matrices.
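
For reference while we look into this, below is a minimal sketch of what a positive-definiteness check on a jittered OMEGA block might look like. `is_pos_def()` is a hypothetical helper and the matrices are made-up values; whether PsN does anything similar is still an open question. One caveat on the meeting note: a positive determinant is necessary but not sufficient, so the sketch checks that all eigenvalues are positive instead.

```r
# Hypothetical helper: a symmetric matrix is positive definite if and only if
# all of its eigenvalues are strictly positive.
is_pos_def <- function(mat, tol = 1e-8) {
  eig <- eigen(mat, symmetric = TRUE, only.values = TRUE)$values
  all(eig > tol)
}

# Example OMEGA block: variances on the diagonal, covariance off the diagonal
omega <- matrix(c(0.10, 0.05,
                  0.05, 0.20), nrow = 2, byrow = TRUE)
is_pos_def(omega)   # TRUE

# Jittering only the off-diagonal element too far breaks positive definiteness
bad <- omega
bad[1, 2] <- bad[2, 1] <- 0.20
is_pos_def(bad)     # FALSE

# A positive determinant alone is not enough: det is 1 here,
# but both eigenvalues are -1
neg <- diag(c(-1, -1))
det(neg) > 0        # TRUE
is_pos_def(neg)     # FALSE
```

If a jittered matrix fails this check, options might include resampling or skipping the off-diagonal elements, but that is a design decision for later.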

Do we want to tweak values using a single percent, or a different sampled percent for each occurrence? We are leaning towards the latter.

I'm not 100% sure I follow this comment, but I think the runif(length(thetas), -.p, .p) part of the example code at the top (which is the current intent) is describing the latter approach.
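
To spell out the two readings, here is a small side-by-side sketch. The theta values are illustrative only, and the variable names are made up for this example.

```r
set.seed(1)
thetas <- c(2, 3, 10)  # illustrative values only
.p <- 0.1

# (a) a single sampled fraction shared by every parameter
shared_perc <- runif(1, -.p, .p)
thetas_shared <- thetas + thetas * shared_perc

# (b) a separately sampled fraction for each parameter
each_perc <- runif(length(thetas), -.p, .p)
thetas_each <- thetas + thetas * each_perc

# relative change per parameter: identical under (a), different under (b)
thetas_shared / thetas - 1
thetas_each / thetas - 1
```

Under (a) every parameter moves in the same direction by the same relative amount, so the starting point only shifts along one line in parameter space. Under (b) each parameter moves independently.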

At the end Curtis brought up the consideration of log-scaled THETA records. In these cases, the percent may actually have a larger impact than intended/assumed.

That's an interesting point, but I think we're probably not going to do anything with that (in terms of checking whether certain values are log-scale), at least on our first release. Maybe something to consider for future improvements, especially if folks notice this becoming an issue.
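
For anyone following along, a quick worked example of the log-scale concern. The numbers are made up, and the parameterizations in the comments are just the usual patterns, not taken from a specific model.

```r
.p <- 0.1

# THETA on the natural scale, e.g. CL = THETA(1)
cl_natural <- 7.389
cl_natural_tweaked <- cl_natural * (1 + .p)
cl_natural_tweaked / cl_natural - 1          # 0.10, the intended 10%

# THETA on the log scale, e.g. CL = EXP(THETA(1))
theta_log <- 2
theta_log_tweaked <- theta_log * (1 + .p)    # 2.2
exp(theta_log_tweaked) / exp(theta_log) - 1  # exp(0.2) - 1, about 0.22
```

So a 10% tweak of the log-scale THETA changes the implied clearance by roughly 22% here. The effect also grows with the magnitude of the THETA: with a log-scale value of 5, the same 10% tweak changes the implied parameter by about 65% (`exp(0.5) - 1`).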

this kind of function could be useful for bootstrap calls

I'm interested in the bootstrap discussion as well, though I don't immediately follow the connection. But, as Kyle mentions, that would be a discussion for another day/issue.

metrumresearchgroup / bbr

New Feature: jitter/tweak initial parameter estimates #632
