Closed wlandau closed 2 months ago
A couple notes:
factor(order = FALSE)
seems more idiomatic to formulas than as.character()
.factor(ordered = FALSE)
is not a hack:While formulae usually involve just variable and factor names, they can also involve arithmetic expressions. The formula log(y) ~ a + log(x) is quite legal.
There is no direct mention of arbitrary non-arithmetic functions, but support seems to extend to this, and posts on StackOverflow seem to rely on it.
The downside of factor(order = FALSE)
is it creates long variable names. We could shorten them by defining a special function within the scope of brm_model()
. But that would make it harder to for users to check brms::make_standata()
...
We could use levels() for the "brm_levels_time" attribute in the data to preserve ordering, and we could use as.character() in the fixed effects part of the formula to make sure we get an additive term for each level:
On second thought, maybe we don't need to be so prescriptive in the formula. Users may want a slope on numeric time, for instance. I think all that is needed is to relax the existing restrictions on types and avoid coercion to character vectors.
On second thought, maybe we don't need to be so prescriptive in the formula. Users may want a slope on numeric time, for instance. I think all that is needed is to relax the existing restrictions on types and avoid coercion to character vectors.
But that brings us back to the first issue of ordered factors for the time variable...
From https://library.virginia.edu/data/articles/understanding-ordered-factors-in-a-linear-model, it turns out the issue with ordered factors comes from contr.poly()
. AVISIT.L, AVISIT.Q, and AVISIT.C are linear, quadratic, and cubic terms in a polynomial. For brms.mmrm
, we can simply set treatment contrasts on the time variable in brm_data()
.
Fixed in #116
By the way, I added brm_data_chronologize()
to help users create the recommended ordered factor for time. Featured in most of the documentation now.
The order of levels in the discrete time variable does not usually matter, but it does come into play for autoregressive correlations and successive differences archetypes. In these situations, if the discrete time variable is
c("VISIT1", "VISIT2", ..., "VISIT9", "VISIT10", "VISIT11")
, then VISIT10 will come before VISIT2, which is wrong.I think the cleanest way to deal with this is to support ordered factors for the time variable. We could use
levels()
for the"brm_levels_time"
attribute in the data to preserve ordering, and we could useas.character()
in the fixed effects part of the formula to make sure we get an additive term for each level:Without
as.character()
in the main part of the formula though, we get terms likeAVISIT.L
,AVISIT.Q
, andAVISIT.C
instead. But inunstr()
, we can apparently settime = AVISIT
withoutas.character()
, and we get individualsigma
parameters for each time point.It seems okay to condition on this behavior, provided we test that we get the expected parameters from
brms
when we fit a model like this. That way ifbrms
changes how it handles ordered factors, we will know immediately.