Proposal: "reframed" data and formula

openpharma / brms.mmrm

R package to run Bayesian MMRMs using {brms}

https://openpharma.github.io/brms.mmrm/

Proposal: "reframed" data and formula #93

Closed wlandau closed 5 months ago

wlandau commented 5 months ago

For #40, I think we are heading toward the following kind of workflow:

1. Define the data and formula as usual.
2. Reframe (1) to accommodate a different parameterization.
3. Get a prior and run the model on (2).
4. Transform model parameters back to (1).
5. Proceed with posterior inference.
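
As a rough illustration of the reframe, fit, and back-transform steps above, here is a minimal base R sketch. It is not brms.mmrm code: `lm.fit()` stands in for the Bayesian model fit, the prior step is omitted, and the successive-difference convention (first parameter is the first time-point mean, later parameters are differences between adjacent time points) is an assumption made for the example. All object names are hypothetical.

```r
# Toy data: one response and a three-level time factor (all hypothetical).
set.seed(93)
time <- factor(rep(c("t1", "t2", "t3"), each = 5))
y <- rnorm(15, mean = c(1, 2, 4)[as.integer(time)])

# Step 1: ordinary cell-means design, one column per time point.
X <- model.matrix(~ 0 + time)

# Step 2: reframe. With mu = L %*% delta (L lower triangular of ones),
# the reframed design is X %*% L and its coefficients are
# successive differences delta.
L <- 1 * lower.tri(diag(3), diag = TRUE)
X_sdif <- X %*% L

# Step 3 (stand-in): least squares instead of a Bayesian fit.
fit_sdif <- lm.fit(X_sdif, y)

# Step 4: transform the estimates back to the original parameterization.
mu_back <- drop(L %*% fit_sdif$coefficients)

# Reference fit on the original design, for comparison.
fit_orig <- lm.fit(X, y)
```

In this toy setting, `mu_back` agrees with the coefficients of `fit_orig` up to numerical error, which is the property step 4 relies on.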

I will prototype this in a branch. It will be tricky because brms.mmrm thus far has been designed to use an ordinary data/formula and makes a lot of assumptions there.

wlandau commented 5 months ago

I see two implementation challenges:

1. Transform the data and formula (as in https://github.com/openpharma/brms.mmrm/discussions/92) and create a matching user-defined prior for brms.
2. Use (1) in the rest of brms.mmrm.

wlandau commented 5 months ago

To start, I am thinking about a new function brm_parameterize_successive() for successive time differences (from https://github.com/openpharma/brms.mmrm/discussions/92). In this first attempt, it will accept:

a brm_data() object
a brm_formula() object
information for a prior

And it will return a list of:

a new classed data object
a new classed formula object
a brms prior on successive differences
a matrix to transform successive difference posterior samples to posterior samples for the original parameterization.
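
To make the last item concrete, here is a minimal base R sketch of such a transformation matrix. It assumes one particular convention for successive differences (the first parameter is the first time-point mean, and each later parameter is the difference from the previous time point); the convention in the linked discussion may differ. The draws are simulated placeholders, not model output, and every name here is hypothetical.

```r
n_times <- 4

# D maps original time-point means to successive differences:
# (mu1, mu2 - mu1, mu3 - mu2, mu4 - mu3).
D <- diag(n_times)
D[cbind(2:n_times, 1:(n_times - 1))] <- -1

# The inverse maps successive differences back to the original means.
# Under this convention it is a lower triangular matrix of ones.
L <- solve(D)

# Placeholder posterior draws: one row per draw, one column per parameter.
set.seed(1)
draws_sdif <- matrix(rnorm(3 * n_times), nrow = 3)

# Each row delta becomes L %*% delta, so the whole matrix is
# right-multiplied by t(L).
draws_original <- draws_sdif %*% t(L)
```

With this convention, each row of `draws_original` is the cumulative sum of the corresponding row of `draws_sdif`.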

Later, we will need to decide how to accept this output in brm_model() and brm_marginal_draws() in a clear, consistent, back-compatible way.

wlandau commented 5 months ago

Maybe we decouple the prior from all of this and think about a consistent set of separate prior specification functions.

wlandau commented 5 months ago

When I sketched https://github.com/openpharma/brms.mmrm/issues/93, I realized just how chaotic this fully manual approach would be in the interface. It is very disruptive if the user and the package need to maintain two different sets of data and two different formulas.

wlandau commented 5 months ago

On the other hand, #95 could be limiting because cell-means-like sdif contrasts would not be possible, and neither would many other types of models we might want to define. For example, because fixed effects are independent a priori, some users might even want to regress on principal components.

We might limit the chaos if we treat each end-to-end analysis as its own workflow that has minimal overlap with the main functions. For successive differences, we could have brm_data_sdif(), brm_formula_sdif(), etc.

wlandau commented 5 months ago

brm_data_sdif() could take a brm_data() object as input.
brm_formula() and other methods could use S3 dispatch to handle specific cases.
Priors and marginal transformations would need to be case-specific too, and could be managed with S3 dispatch.
The sdif treatment effect parameterization gets confusing in the presence of a subgroup.
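
The S3 dispatch idea in the list above could look roughly like the following base R sketch. The class name `demo_data_sdif` and the functions `new_sdif_data()` and `demo_formula()` are hypothetical, invented for illustration only; they are not part of brms.mmrm.

```r
# Hypothetical constructor: tags an existing data frame with an extra class
# so that downstream generics can recognize the sdif parameterization.
new_sdif_data <- function(data) {
  structure(data, class = c("demo_data_sdif", class(data)))
}

# Hypothetical generic: picks a formula-building strategy from the data class.
demo_formula <- function(data, ...) {
  UseMethod("demo_formula")
}

# Default method: the ordinary parameterization.
demo_formula.default <- function(data, ...) {
  "ordinary"
}

# Method for the sdif-classed data: the successive-difference case.
demo_formula.demo_data_sdif <- function(data, ...) {
  "sdif"
}

plain <- data.frame(response = c(1.2, 0.7))
reframed <- new_sdif_data(plain)

result_plain <- demo_formula(plain)
result_reframed <- demo_formula(reframed)
```

The same pattern would extend to prior and marginal-transformation generics, with one method per parameterization.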

wlandau commented 5 months ago

Moving the last couple of comments to their own issue.