openpharma / brms.mmrm

R package to run Bayesian MMRMs using {brms}
https://openpharma.github.io/brms.mmrm/
Other
18 stars 2 forks source link

Controlling the order of time points #113

Closed wlandau closed 2 months ago

wlandau commented 3 months ago

The order of levels in the discrete time variable does not usually matter, but it does come into play for autoregressive correlations and successive differences archetypes. In these situations, if the discrete time variable is c("VISIT1", "VISIT2", ..., "VISIT9", "VISIT10", "VISIT11"), then VISIT10 will come before VISIT2, which is wrong.

I think the cleanest way to deal with this is to support ordered factors for the time variable. We could use levels() for the "brm_levels_time" attribute in the data to preserve ordering, and we could use as.character() in the fixed effects part of the formula to make sure we get an additive term for each level:

FEV1 ~ ARMCD + ARMCD:as.character(AVISIT) + as.character(AVISIT) + WEIGHT + SEX + unstr(time = AVISIT, gr = USUBJID) 
sigma ~ 0 + AVISIT

Without as.character() in the main part of the formula though, we get terms like AVISIT.L, AVISIT.Q, and AVISIT.C instead. But in unstr(), we can apparently set time = AVISIT without as.character(), and we get individual sigma parameters for each time point.

It seems okay to condition on this behavior, provided we test that we get the expected parameters from brms when we fit a model like this. That way if brms changes how it handles ordered factors, we will know immediately.

wlandau commented 2 months ago

A couple notes:

While formulae usually involve just variable and factor names, they can also involve arithmetic expressions. The formula log(y) ~ a + log(x) is quite legal.

There is no direct mention of arbitrary non-arithmetic functions, but support seems to extend to this, and posts on StackOverflow seem to rely on it.

wlandau commented 2 months ago

The downside of factor(order = FALSE) is it creates long variable names. We could shorten them by defining a special function within the scope of brm_model(). But that would make it harder to for users to check brms::make_standata()...

wlandau commented 2 months ago

We could use levels() for the "brm_levels_time" attribute in the data to preserve ordering, and we could use as.character() in the fixed effects part of the formula to make sure we get an additive term for each level:

On second thought, maybe we don't need to be so prescriptive in the formula. Users may want a slope on numeric time, for instance. I think all that is needed is to relax the existing restrictions on types and avoid coercion to character vectors.

wlandau commented 2 months ago

On second thought, maybe we don't need to be so prescriptive in the formula. Users may want a slope on numeric time, for instance. I think all that is needed is to relax the existing restrictions on types and avoid coercion to character vectors.

But that brings us back to the first issue of ordered factors for the time variable...

wlandau commented 2 months ago

From https://library.virginia.edu/data/articles/understanding-ordered-factors-in-a-linear-model, it turns out the issue with ordered factors comes from contr.poly(). AVISIT.L, AVISIT.Q, and AVISIT.C are linear, quadratic, and cubic terms in a polynomial. For brms.mmrm, we can simply set treatment contrasts on the time variable in brm_data().

wlandau commented 2 months ago

Fixed in #116

wlandau commented 2 months ago

By the way, I added brm_data_chronologize() to help users create the recommended ordered factor for time. Featured in most of the documentation now.