Add a first spatial covariance structure: spatial exponential

danielinteractive commented 2 years ago

To do:

- [x] Read a bit about spatial covariance structures, e.g. at https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.3/statug/statug_mixed_syntax14.htm
- [x] Adapt the algorithm vignette (write down the theoretical part, e.g. that in this case we don't start from a common covariance matrix for all patients but construct a new one for each patient based on their coordinates)
- [x] Adapt the covariance structure vignette (write down the theoretical part of the exponential covariance structure)
- [x] Think about how best to receive the required numeric time point values from the user
  - [x] Note that so far we are just taking the levels of the time point (factor) variable in their order
  - [x] Note that multiple numeric columns might also together represent the coordinates of a single visit
- [x] Think about how best to pass the numeric time coordinates on to the C++ code
- [x] Add the definition of spatial covariance structures in general, and of the spatial exponential covariance structure in particular, to the covariance vignette
- [x] Adjust the covariance structure choices in https://github.com/openpharma/mmrm/blob/709f392e966b12514525ea6f6118d8e9af32089c/R/tmb.R#L60
- [x] Initialize the start parameters in https://github.com/openpharma/mmrm/blob/709f392e966b12514525ea6f6118d8e9af32089c/R/tmb.R#L208
  - [x] Amend the tests in https://github.com/openpharma/mmrm/blob/709f392e966b12514525ea6f6118d8e9af32089c/tests/testthat/test-tmb.R#L80
- [x] Amend the printing of the covariance structure in https://github.com/openpharma/mmrm/blob/b1b612263623c9d8f9e972adf19d09e23558664c/R/mmrm-methods.R#L109
- [x] Add a lower Cholesky factor function and include it in the branching in https://github.com/openpharma/mmrm/blob/01ae7f936cd43ad8788b84a7a03467b217a9ccaa/src/covariance.h#L41
  - [x] Test it in https://github.com/openpharma/mmrm/blob/main/src/test-covariance.cpp
    - [x] Install and Restart the package
    - [x] Build > More > Test package
- [x] Amend the integration tests in https://github.com/openpharma/mmrm/blob/709f392e966b12514525ea6f6118d8e9af32089c/tests/testthat/test-tmb.R#L256
  - [x] Note that r2stream is not available outside of Roche, so run SAS by other means and save the log as a text file in the design folder for reference
- [x] Run Clean and Rebuild > Check to ensure that the checks pass
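The theoretical part behind the checklist can be sketched in plain Python (this is not the package's R/C++ code). The sketch assumes the SAS SP(EXP) parameterization from the linked documentation, Sigma_jk = sigma^2 * exp(-d_jk / theta), where d_jk is the Euclidean distance between the coordinate vectors of visits j and k, so several numeric columns can jointly form one visit's coordinates. Equivalently, Sigma_jk = sigma^2 * rho^d_jk with rho = exp(-1 / theta). All function names are hypothetical. The matrix is built separately for each patient from that patient's own visit coordinates, and the lower Cholesky factor uses the Cholesky-Banachiewicz recursion.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two coordinate vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sp_exp_covariance(coords, sigma2, theta):
    """Hypothetical helper: spatial exponential covariance for ONE patient.

    coords: one coordinate vector per observed visit of this patient
    (a vector can have several entries if several numeric columns
    together describe a visit). Entry (j, k) is sigma2 * exp(-d_jk / theta).
    """
    n = len(coords)
    return [[sigma2 * math.exp(-euclidean_distance(coords[j], coords[k]) / theta)
             for k in range(n)]
            for j in range(n)]


def lower_cholesky(mat):
    """Lower triangular L with L @ L^T == mat (mat symmetric positive definite)."""
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(mat[i][i] - s)
            else:
                low[i][j] = (mat[i][j] - s) / low[j][j]
    return low


# One patient observed at three irregular visit times (one coordinate each).
coords = [[0.0], [1.0], [3.0]]
sigma = sp_exp_covariance(coords, sigma2=2.0, theta=1.5)
chol = lower_cholesky(sigma)
```

Because each patient has their own coordinates, there is no common matrix to subset: a patient with visits at 0, 1 and 3 gets a 3 x 3 matrix whose off-diagonal entries reflect the actual gaps between the visits.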

kkmann commented 2 years ago

Interesting. Is it correct that a spatial covariance structure essentially leads to a Gaussian process model (with a modeled mean function?), or am I off here?

danielinteractive commented 2 years ago

@kkmann I guess that is true, based on my intuition :-)
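To make the Gaussian process connection concrete, here is a sketch assuming the usual MMRM setup (normally distributed residuals, independent across patients) and the SAS SP(EXP) parameterization; t_ij denotes the coordinate vector of visit j of patient i:

```latex
y_i = X_i \beta + \varepsilon_i, \qquad
\varepsilon_i \sim N(0, \Sigma_i), \qquad
(\Sigma_i)_{jk} = k(t_{ij}, t_{ik}) = \sigma^2 \exp\!\left(-\frac{\lVert t_{ij} - t_{ik} \rVert}{\theta}\right)
```

This is exactly the finite-dimensional distribution of a Gaussian process with covariance function k, evaluated at patient i's visit coordinates, with mean vector X_i beta given by the fixed effects. For a single time coordinate, this stationary exponential kernel is the covariance of an Ornstein-Uhlenbeck process.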

clarkliming commented 2 years ago

Proposal: instead of using cs(time|subject) or cs(time|group/subject), use cs(time) + id(subject) to indicate that the time points are correlated within each subject, cs(time|group) + id(subject) to indicate that they are correlated within each subject with a separate structure per group, or cs(time1, time2) to indicate numeric spatial coordinates.

With this, we could also use the same framework for multiple correlated endpoints. For example, heights and weights might be correlated with each other via an unstructured correlation, while across the time points they follow an AR1 correlation; putting heights and weights into this model, we would have something like ar1(time) + un(variable) + id(subject). The resulting covariance matrix is the Kronecker product of the AR1 covariance matrix and the unstructured covariance matrix (we sort the observations internally).
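A minimal sketch of the proposed Kronecker construction, in plain Python with hypothetical helper names (mmrm does not implement this). It assumes observations are sorted by time point first and endpoint second, and uses the AR1 part as a correlation (sigma2 = 1) so that the variances come only from the unstructured part; otherwise the overall scale would not be identifiable. The endpoint matrix values are illustrative only.

```python
def ar1_covariance(n_times, sigma2, rho):
    """AR1 covariance: sigma2 * rho^|j - k| over the ordered time points."""
    return [[sigma2 * rho ** abs(j - k) for k in range(n_times)] for j in range(n_times)]


def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    rows_b, cols_b = len(b), len(b[0])
    return [[a[i // rows_b][j // cols_b] * b[i % rows_b][j % cols_b]
             for j in range(len(a[0]) * cols_b)]
            for i in range(len(a) * rows_b)]


# Hypothetical example: 3 time points, 2 endpoints (height, weight).
time_cor = ar1_covariance(3, sigma2=1.0, rho=0.5)  # AR1 correlation over time
endpoint_cov = [[4.0, 1.2],                        # unstructured 2 x 2 matrix
                [1.2, 9.0]]                        # between height and weight
joint = kronecker(time_cor, endpoint_cov)
# joint is 6 x 6; rows/columns are ordered
# (t1, height), (t1, weight), (t2, height), (t2, weight), (t3, height), (t3, weight).
```

Each 2 x 2 block of joint is the endpoint matrix scaled by the AR1 correlation between the corresponding two time points.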

danielinteractive commented 2 years ago

Interesting, thanks @clarkliming! However, I would prefer to keep it simpler for now and not go into Kronecker products etc. I think we could still use the syntax exp(time1, time2 | group / subject), which looks OK? Also, if only one time variable is used, we can assert that it is numeric. Basically, the covariance structure decides whether the time variable needs to be a factor or numeric.
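The proposed term syntax and the rule "the covariance structure decides factor vs. numeric" can be sketched as a small parser. This is a hypothetical illustration in Python, not mmrm's formula handling (which lives in R); the regular expression, the function name and the set of spatial structures are all assumptions.

```python
import re

# Hypothetical: structures that take numeric coordinates instead of a factor.
SPATIAL_STRUCTURES = {"exp"}

# structure( coord1, coord2, ... | [group /] subject )
TERM = re.compile(r"^\s*(\w+)\(\s*([^|]+?)\s*\|\s*(?:(\w+)\s*/\s*)?(\w+)\s*\)\s*$")


def parse_cov_term(term):
    """Split e.g. 'exp(time1, time2 | group / subject)' into its parts."""
    match = TERM.match(term)
    if match is None:
        raise ValueError(f"cannot parse covariance term: {term!r}")
    structure, coords, group, subject = match.groups()
    coordinates = [c.strip() for c in coords.split(",")]
    spatial = structure in SPATIAL_STRUCTURES
    if not spatial and len(coordinates) != 1:
        raise ValueError(f"{structure}() takes exactly one (factor) time variable")
    return {
        "structure": structure,
        "coordinates": coordinates,
        "group": group,  # None if no "group /" part is given
        "subject": subject,
        # The structure decides how the time variable(s) must be stored.
        "time_type": "numeric" if spatial else "factor",
    }


spatial_term = parse_cov_term("exp(time1, time2 | group / subject)")
ordered_term = parse_cov_term("cs(time | subject)")
```

With this split, the later type check is a lookup on time_type: numeric columns are required for spatial structures, a factor otherwise.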

Would that work?

kkmann commented 2 years ago

There is something to it. Should probably think about a grammar of covariance structures at some point :)

From a user perspective, it is probably simpler to stick with a single call such as exp(t | id). I would avoid the term "spatial" completely here. That's where it comes from (Kriging), but it is very confusing in our context, since the majority of intended users would use it over time, not space. The term exp seems a bit generic and could confuse. This particular kernel is often referred to as "squared exponential" (which is not ideal either, since it is really an exponentiated negative squared distance kernel...). So exp_sq(t | id) could work. Or, if possible, gp(t | id, type = "squared exponential"), since it is the covariance structure of a Gaussian process with a squared exponential kernel. Most other covariance structures (if not all?) could be seen as inducing GPs as well, but sometimes with weird distance metrics, or over discrete spaces, which is a rather uncommon view. So I would restrict the gp notation to continuous-time covariance functions and use the established MMRM naming conventions otherwise.
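Since the naming hinges on which kernel is meant, it may help to compare the two candidates side by side. Per the SAS documentation linked in the to-do list, SP(EXP) is sigma^2 * exp(-d / theta), whereas the squared exponential kernel (SAS SP(GAU), "Gaussian") is sigma^2 * exp(-d^2 / theta^2). A minimal Python sketch with hypothetical function names:

```python
import math


def exponential_kernel(d, sigma2, theta):
    """Exponential covariance (SAS SP(EXP)): sigma2 * exp(-d / theta)."""
    return sigma2 * math.exp(-d / theta)


def squared_exponential_kernel(d, sigma2, theta):
    """Squared exponential covariance (SAS SP(GAU)): sigma2 * exp(-d^2 / theta^2)."""
    return sigma2 * math.exp(-(d ** 2) / theta ** 2)


# Both kernels equal sigma2 at distance 0 and decay with distance,
# but at different rates (compare e.g. d = 0.5 and d = 2 with theta = 1).
short_range = (exponential_kernel(0.5, 1.0, 1.0), squared_exponential_kernel(0.5, 1.0, 1.0))
long_range = (exponential_kernel(2.0, 1.0, 1.0), squared_exponential_kernel(2.0, 1.0, 1.0))
```

With theta = 1, the squared exponential kernel is larger than the exponential one for d < 1, equal at d = 1, and smaller beyond, so the two are genuinely different covariance functions and the chosen name should match the one actually implemented.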

The Kronecker product could be the main selling point for v2.0.0 ;)

danielinteractive commented 2 years ago

@kkmann, for the naming: how about "Euclidean covariance structures", since these use Euclidean space and distances for their calculations? The usual covariance structures would then, in contrast, be "ordered", since they only use the ordered common time points.
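The "ordered" vs. "Euclidean" distinction can be illustrated by comparing AR1, which only uses the positions of the visits, with the spatial exponential structure written as rho^d (i.e. rho = exp(-1 / theta) in the SAS parameterization), which uses the actual time values. A minimal Python sketch with hypothetical names:

```python
def ar1_correlation(n_visits, rho):
    """'Ordered' AR1: correlation rho^|j - k| depends only on visit positions j, k."""
    return [[rho ** abs(j - k) for k in range(n_visits)] for j in range(n_visits)]


def sp_exp_correlation(times, rho):
    """'Euclidean' spatial exponential: correlation rho^|t_j - t_k| uses actual times."""
    return [[rho ** abs(tj - tk) for tk in times] for tj in times]


def matrices_close(a, b, tol=1e-12):
    """True if two equally sized matrices agree entrywise within tol."""
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))


rho = 0.6
regular = [1.0, 2.0, 3.0, 4.0]    # equally spaced, one time unit apart
irregular = [1.0, 2.0, 4.0, 8.0]  # same number of visits, uneven spacing

# With unit spacing the two structures coincide ...
same = matrices_close(sp_exp_correlation(regular, rho), ar1_correlation(4, rho))
# ... but with uneven spacing only the Euclidean version reflects the real gaps.
differ = not matrices_close(sp_exp_correlation(irregular, rho), ar1_correlation(4, rho))
```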

kkmann commented 2 years ago

Ah, the "type" parameter might be a bit confusing. I was basically suggesting using gp() (as in Gaussian process) for all covariance structures defined by smooth covariance functions of continuous variables. It avoids any notion of spatial or temporal. I guess we only consider stationary covariance functions of Euclidean distance, most prominently the squared exponential one. Maybe better to call it gp(t | id, covariance_function = "squared exponential")? This would make it clear that this is fundamentally different from the other "ordered" covariance structures already implemented. Technically, these would be GPs on the ordered discrete set of planned visits, but I doubt that anyone but me would find that analogy helpful x)

I must say, I really love the idea of eventually allowing Kronecker products, @clarkliming!

While we are at it, is it standard to "add" the covariance function in the formula notation? I find that a bit weird: in mixed effects models it makes some sense to add fixed and random effects, but here it is really only a notational convention, if I am not mistaken. We could also specify the covariance structure in a separate argument.

openpharma / mmrm

Add a first spatial covariance structure: spatial exponential #82