Hi @billdenney
This is to collect my thoughts as well as to explain the current state of things in nlmixr. Since you requested it I am tagging you.
I could see where standard errors or variances on covariances/correlations would be useful because they would give a measure of the uncertainty in the estimated `omega`s.

Currently, as it stands:
* `nlme` does not calculate standard errors on covariance terms
* `saem` calculates some standard errors on covariance terms while optimizing, but these are not always accurate
* `focei` and family skip the calculation of standard errors on covariance terms

For `focei`, the calculation of standard errors could be turned on; however:
* `nlmixr`'s focei does not estimate the `omega` covariance directly, but a more stable transformation with two steps: `chol(solve(omega))`, with `diag(chol(solve(omega)))` put through some transformation (by default `x^2` or `sqrt`)
  * The `omega` matrix is then always symmetric positive definite
  * `omega` itself would not make it to a report, and care would have to be taken when simulating from these parameters to back-transform to the correct scale
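To make the transformation concrete, here is a small R sketch of the round trip. This is only an illustration, not nlmixr's internal code; the example `omega` values and the direction of the default `sqrt`/`x^2` diagonal transform are my assumptions.

```r
## Example omega (made-up values)
omega <- matrix(c(0.09, 0.01,
                  0.01, 0.04), 2, 2)

## Step 1: Cholesky factor of the inverse of omega
U <- chol(solve(omega))

## Step 2: transform the diagonal (sqrt here; x^2 undoes it)
est <- U
diag(est) <- sqrt(diag(U))

## Back-transform to the omega scale (needed for reporting/simulation)
U2 <- est
diag(U2) <- diag(est)^2
omegaBack <- solve(t(U2) %*% U2)
all.equal(omega, omegaBack)   # TRUE
```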
The rest of this discussion details sampling from the uncertainty in the parameter estimates themselves.

In this approach a multivariate normal is used to simulate the variance and covariance components that are estimated. Even with the transformations discussed in the FOCEi section above, the covariance could be calculated from the simulations and then simulated from for each "study".
Advantages:

Disadvantages:
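As a rough illustration of this approach (the component estimates and their covariance below are made-up numbers, not nlmixr output), the estimated components could be drawn with `MASS::mvrnorm()` and reassembled into an `omega` per simulated study:

```r
library(MASS)   # for mvrnorm()

## Estimated variance/covariance components and a made-up covariance of those estimates
est   <- c(om11 = 0.09, om21 = 0.01, om22 = 0.04)
seCov <- diag(c(0.02, 0.005, 0.01)^2)

draws <- MASS::mvrnorm(1000, mu = est, Sigma = seCov)

## Rebuild one omega per draw; note nothing here forces each matrix to be
## positive definite, which is a practical concern with this approach
omegaList <- lapply(seq_len(nrow(draws)), function(i) {
  x <- draws[i, ]
  matrix(c(x["om11"], x["om21"],
           x["om21"], x["om22"]), 2, 2)
})
```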
This is the approach that `stan` advocates. In this approach a multivariate normal is used to simulate the fixed effect variance components that are estimated. Once the variance components are simulated, the correlation is simulated by the LKJ distribution and the covariance is constructed from the calculated pieces (a minimal sketch of this separation strategy is given after the advantages/disadvantages below).
Advantages:

* Only a single parameter, `eta`, is needed for the correlation simulation; I could name it for clarity `etaLKJ`
* `LKJ` prior simulation is more efficient than the inverse wishart prior simulation

Disadvantages:

* `etaLKJ` is not specified or calculated by the model or the literature; the modeler would have to choose a value from `1` (approximately uniform from -1 to 1 for the correlations) to `Inf` (correlations are zero), which is arbitrary

Some of these objections can be overcome by using the inverse wishart to simulate the correlation matrix instead of the `LKJ` distribution, but that is slower than the `LKJ` distribution.
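For a 2x2 `omega` the separation strategy can be sketched with base R alone, since the lone LKJ correlation satisfies `(r + 1)/2 ~ Beta(eta, eta)`. The standard deviation estimates and their uncertainty below are made-up values for illustration.

```r
set.seed(42)
n   <- 1000
eta <- 1                           # approximately uniform correlations on (-1, 1)
r   <- 2 * rbeta(n, eta, eta) - 1  # LKJ correlation draws for d = 2

## Simulate the standard deviations (log-normal around made-up estimates)
sd1 <- exp(rnorm(n, log(0.25), 0.1))
sd2 <- exp(rnorm(n, log(0.30), 0.1))

## Rebuild one covariance matrix per simulated "study"
omegaList <- lapply(seq_len(n), function(i) {
  D <- diag(c(sd1[i], sd2[i]))
  R <- matrix(c(1, r[i], r[i], 1), 2, 2)
  D %*% R %*% D
})
```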
This is the approach that RxODE currently uses.

Advantages:

* Already implemented
* Allows the correlation structure to be specified by the estimated covariance matrix
* Allows off-diagonal simulations for covariances that may exist but were not observed in the data
* The degrees of freedom can be easily calculated from the data and the covariance matrix being estimated

Disadvantages:

* Variability is specified by only one parameter: the degrees of freedom
* The inverse wishart has other known issues as well; these include (and are discussed here)
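For reference, an inverse wishart draw can be sketched with base R's `stats::rWishart()`; RxODE has its own machinery, so this is only to show the mechanics, and the `omega` estimate and `nu` are example values.

```r
omega <- matrix(c(0.09, 0.01,
                  0.01, 0.04), 2, 2)   # estimated omega (made-up)
nu    <- 20                            # degrees of freedom
d     <- nrow(omega)

## Pick the scale so the inverse wishart mean matches the estimate:
## E[IW(nu, Psi)] = Psi / (nu - d - 1)
Psi <- omega * (nu - d - 1)

## If W ~ Wishart(nu, solve(Psi)) then solve(W) ~ inverse Wishart(nu, Psi)
simOmega <- apply(rWishart(100, df = nu, Sigma = solve(Psi)), 3, solve)
## each column of simOmega is one simulated omega, stored column-major
```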
If I implement standard errors on the covariance matrix, then a simulation strategy could be to combine the simulated variance components with an `LKJ` or simulated inverse wishart correlation matrix. However, it is simpler to explain and to simply use the inverse wishart, so perhaps that could also be an option for simulation.
One other note: the `etaLKJ` parameter can be related to the inverse wishart degrees of freedom (see https://arxiv.org/pdf/1809.04746.pdf):

`alpha_{d-1} = (nu - d + 1)/2 = etaLKJ + (d - 2)/2`

`etaLKJ = (nu - d + 1)/2 - (d - 2)/2 = (nu - 2*d + 3)/2`

For a 2x2 `omega` (`d = 2`) this gives `etaLKJ = (nu - 1)/2`, which also implies you need at least `nu = 3` degrees of freedom for `etaLKJ >= 1`.
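A tiny helper expressing this relationship (purely illustrative; the function name is mine):

```r
## Relate the inverse wishart degrees of freedom (nu) to the LKJ eta parameter
## for a d-dimensional correlation matrix, using the relationship above
lkjEtaFromNu <- function(nu, d) (nu - d + 1) / 2 - (d - 2) / 2  # = (nu - 2*d + 3)/2

lkjEtaFromNu(nu = 3, d = 2)  # 1: roughly uniform correlations for a 2x2 omega
```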
WOW! Thank you for the details here. They are eye-opening (and very well described).
I can't claim to have an understanding of what the best method is, but your summary gives a lot of items to consider. I do maintain the desire to simulate with parameter uncertainty, but I now feel like we as a discipline are often doing that wrong (when we do it) for the `omega` matrix.
I agree; before working on this `nlmixr` project I never considered `omega` matrix simulation in this detail. @wwang-at-github is the one who told me about the inverse wishart in the first place.
Overall I think the inverse wishart is the most flexible since it can handle modeled covariance terms and is the default for RxODE simulations.
If that is a good way to simulate covariances, the SE terms become unneeded parameters (except possibly for diagnosing model problems). If you use SEs in a prior distribution via one of the above methods, SEs on covariance terms may not be needed (although, because of the parameterization, the covariance terms could affect the variance terms, so I have to think carefully about excluding them...)
The following publication seems to support that, when covariances are modeled, it is reasonable to use an inverse wishart, though the separation strategy is preferred otherwise:
This publication shows a way to simulate a full covariance prior, but it would require a mixture model through MCMC fitting, so I think it is too much:
Here is my current thinking:
* The `nlme` and `saem` methods of calculating the covariance do not calculate standard errors on covariance parameters.
* `focei` can be forced to calculate the standard errors, but since they may not be multivariate normal, this could be one of the biggest reasons why a covariance step fails. I'm not sure I want to add that instability to the covariance matrix step.
* Standard errors on covariance terms are simply not available for `SAEM` and `nlme`, and I cannot add them with the current methodologies.
* A mixed strategy would be possible: using the standard errors for `focei`, which has standard errors on covariance parameters, and using the inverse wishart for all other approaches.

With all that in mind, I am thinking of not adding SEs on estimated covariance/standard deviation components for now.
However, the simulation methodologies now exist in the development branch of RxODE.
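For what it is worth, here is a hedged sketch of how a study-level `omega` simulation might look with RxODE; the model, estimates, and argument values are invented, and the `omega`/`dfSub`/`nSub`/`nStud` arguments follow my reading of the RxODE simulation documentation (they may differ between versions).

```r
library(RxODE)

## One-compartment model with between-subject variability on ka and cl
mod <- RxODE("
  ka = exp(tka + eta.ka);
  cl = exp(tcl + eta.cl);
  v  = exp(tv);
  d/dt(depot)  = -ka * depot;
  d/dt(center) =  ka * depot - cl / v * center;
  cp = center / v;
")

## Estimated omega (made-up values)
omega <- matrix(c(0.09, 0.01,
                  0.01, 0.04), 2, 2,
                dimnames = list(c("eta.ka", "eta.cl"),
                                c("eta.ka", "eta.cl")))

ev <- eventTable()
ev$add.dosing(dose = 100)
ev$add.sampling(seq(0, 24, by = 1))

sim <- rxSolve(mod, c(tka = 0.45, tcl = 1, tv = 3.45), ev,
               omega = omega,
               nSub  = 50,    # subjects per simulated study
               nStud = 100,   # studies; omega is re-drawn for each study
               dfSub = 40)    # inverse wishart degrees of freedom for omega
```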
That makes perfectly good sense, and thank you for the thoughtful and detailed discussion.
See Issue #243 for discussion.