openpharma / mmrm

Mixed Models for Repeated Measures (MMRM) in R.
https://openpharma.github.io/mmrm/

Add a first spatial covariance structure: spatial exponential #82

Closed: danielinteractive closed this issue 2 years ago

danielinteractive commented 2 years ago

To do:

kkmann commented 2 years ago

Interesting. Is it correct that a spatial covariance structure essentially leads to a Gaussian process model (with a modeled mean function), or am I off here?

danielinteractive commented 2 years ago

@kkmann I guess that is true - based on my intuition:-)

clarkliming commented 2 years ago

Proposal: instead of using cs(time|subject) or cs(time|group/subject), use cs(time) + id(subject) to indicate that time points are correlated within subject, cs(time|group) + id(subject) to indicate that time points are correlated within subject with a separate correlation per group, or cs(time1, time2) to indicate numerical spatial coordinates.

With this framework we could also handle multiple correlated endpoints. E.g., heights and weights are correlated with each other (say, with an unstructured correlation), and across time points they have an AR1 correlation. Taking heights and weights into the model, we get something like ar1(time) + un(variable) + id(subject); the resulting covariance matrix is the Kronecker product of the AR1 covariance matrix and the unstructured covariance matrix (we sort them internally).
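A minimal base R sketch of the Kronecker construction described above may help. It is an illustration only: the parameter values and the endpoint names are made up, and nothing here is part of the mmrm API or of the proposed formula syntax.

```r
# Hypothetical illustration; not part of the mmrm API.
# AR(1) correlation over 3 visits with an assumed autocorrelation rho.
rho <- 0.5
visits <- 1:3
ar1 <- rho^abs(outer(visits, visits, "-"))

# Unstructured 2 x 2 covariance between two endpoints (values are made up).
un <- matrix(
  c(4.0, 1.2,
    1.2, 9.0),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("height", "weight"), c("height", "weight"))
)

# Covariance of one subject's stacked observations, ordered visit by visit
# with both endpoints inside each visit: block (i, j) equals ar1[i, j] * un.
sigma <- kronecker(ar1, un)
dim(sigma)
```

The ordering of the rows matters: `kronecker(ar1, un)` assumes the data are sorted by visit first and endpoint second, which is presumably why the proposal mentions sorting internally.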

danielinteractive commented 2 years ago

Interesting, thanks @clarkliming! However, I would prefer to keep it simpler for now and not go into Kronecker products etc. I think we could still use the syntax exp(time1, time2 | group / subject), which looks OK? Also, if only one time variable is used, we can assert that it is numeric. Basically, the covariance structure decides whether the time variable needs to be a factor or numeric.

Would that work?

kkmann commented 2 years ago

There is something to it. Should probably think about a grammar of covariance structures at some point :)

From a user perspective, it is probably simpler to stick with a single call such as exp(t | id). I would avoid the term "spatial" completely here. That's where it comes from (Kriging), but it is very confusing in our context, since the majority of intended users would use it over time, not space. The term exp seems a bit generic and could confuse. This particular kernel is often referred to as "squared exponential" (which is not ideal either, since it is more of an exponentiated negative squared distance kernel...). So exp_sq(t | id) could work. Or, if possible, gp(t | id, type = "squared exponential"), since it is the covariance structure of a Gaussian process with a squared exponential kernel. Most other covariance structures (if not all?) could be seen as inducing GPs as well, but sometimes with weird distance metrics, or over discrete spaces, which is a rather uncommon view. So I would restrict the gp notation to continuous-time covariance functions and use the established MMRM naming conventions otherwise.
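For readers comparing the names: in the wider Gaussian process literature, "exponential" and "squared exponential" usually denote two different kernels. The sketch below evaluates both on unevenly spaced time points. The parameterisations and the argument names `range_par` and `length_scale` are assumptions chosen for illustration, not mmrm arguments.

```r
# Generic kernel definitions under an assumed parameterisation;
# `range_par` and `length_scale` are illustrative names, not mmrm arguments.
t_obs <- c(0, 1, 2.5, 6)              # unevenly spaced observation times
d <- abs(outer(t_obs, t_obs, "-"))    # pairwise distances in time

# Exponential kernel: correlation decays with the distance itself.
k_exp <- function(d, range_par = 2) exp(-d / range_par)

# Squared exponential kernel: correlation decays with the squared distance.
k_sq_exp <- function(d, length_scale = 2) exp(-d^2 / (2 * length_scale^2))

round(k_exp(d), 3)
round(k_sq_exp(d), 3)
```

Both functions return a correlation matrix with ones on the diagonal; multiplying by a variance parameter would give the covariance matrix for one subject.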

The Kronecker could be the main selling point for v2.0.0 ;)

danielinteractive commented 2 years ago

@kkmann for the naming: how about "Euclidean covariance structures"? These use the Euclidean space and distances for the calculations. The usual covariance structures would then be "ordered" in contrast, since they only use the ordered common time points.
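The Euclidean versus ordered distinction can be sketched in base R as follows. The data, the value of `rho`, and the form `rho^distance` are assumptions for illustration and are not taken from the package.

```r
# Hypothetical data for one subject: two numeric coordinates per observation.
coords <- cbind(time1 = c(0, 1, 3), time2 = c(0, 2, 2))

# "Euclidean" structure: correlation depends on pairwise Euclidean distance.
d <- as.matrix(dist(coords, method = "euclidean"))
rho <- 0.8                            # assumed correlation at distance 1
cor_euclid <- rho^d

# "Ordered" structure: only the visit order matters (here AR(1) on the index).
idx <- seq_len(nrow(coords))
cor_ordered <- rho^abs(outer(idx, idx, "-"))
```

In the Euclidean case the coordinates must be numeric, whereas the ordered case only needs the observations to be sortable into common visits, which matches the earlier remark that the covariance structure decides whether the time variable is a factor or numeric.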

kkmann commented 2 years ago

Ah, the "type" parameter might be a bit confusing. I was basically suggesting to use gp() (as in Gaussian process) for all covariance structures defined by smooth covariance functions of continuous variables. It avoids any notion of spatial or temporal. I guess we only consider stationary covariance functions of Euclidean distance, most prominently the squared exponential one. Maybe better to call it gp(t | id, covariance_function = "squared exponential")? This would make it clear that this is fundamentally different from the other "ordered" covariance structures already implemented. Technically these would be GPs on the ordered discrete set of planned visits, but I doubt that anyone but me might find that analogy helpful x)

Must say, I really love the idea to allow Kronecker products eventually @clarkliming!

While we are at it, is it standard to "add" the covariance function in the formula notation? I found that a bit weird. In mixed effects models it makes some sense to add fixed and random effects, but here it is really only a notational convention, if I am not mistaken. We could also specify the covariance structure in a separate argument.