kaskr / adcomp

AD computation with Template Model Builder (TMB)
Other
168 stars 80 forks source link

TMB implementations of LKJ, Wishart, and inverse Wishart distributions #351

Open jaganmn opened 2 years ago

jaganmn commented 2 years ago

Having dlkj, dwishart, and dinvwishart in one of the TMB namespaces would enable users to assign likelihoods to correlation or covariance matrices. I have implemented these here. Documentation and tests against independent R implementations are there, too.

Following UNSTRUCTURED_CORR_t, users would pass a vector theta or c(log_sd, theta) instead of the full correlation or covariance matrix. This simplifies computation quite a bit.

Anyway, you are welcome to use or modify the code. At your discretion, I could also work on a pull request.

kaskr commented 2 years ago

Nice work @jaganmn - thanks for sharing! I'll have it mind for future addition to TMB.

jaganmn commented 7 months ago

I wonder if this should happen soon-ish, so that glmmTMB can get the density functions directly from TMB instead of copying the code into its own sources. Probably it is not the only TMB-using package that would want dlkj, etc. @bbolker

kaskr commented 7 months ago

@jaganmn I was about to copy your code, but then got confused over a couple of details...

Given that your version of dwishart is formulated in transformed parameter space, did you remember to add the Jacobian adjustment of the transformation? A quick test of your R-code suggests perhaps not:

integrate(Vectorize(function(x)dwishart(x,df=2,scale=1)),-100,100)
## 6.838354 with absolute error < 7.9e-05
jaganmn commented 7 months ago

The d* functions in the header take as argument the transformed parameter and compute the probability density in the space of the untransformed parameter. The distribution of the transformed parameter would not be called an LKJ/Wishart/inverse Wishart distribution AFAIK.

I think that your integral approximation exceeds unity due to numerical instability for very small nonzero standard deviations. For example, in the one-dimensional case with scale = 0 (indicating an identity matrix of size one), the Wishart distribution reduces to a chi-squared distribution and so we can compare with dchisq:

set.seed(0)

x1 <- rnorm(10L)
dwishart1 <- Vectorize(function(x) dwishart(x, df = 2, scale = 0))
dchisq1 <- function(x) dchisq(exp(2 * x), df = 2)

max(abs(dwishart1(x1) - dchisq1(x1)))
## [1] 8.673617e-19
integrate(dwishart1, -100, 100)
## 50.02898 with absolute error < 0.0037
integrate(dchisq1, -100, 100)
## 50.02898 with absolute error < 0.0037

x2 <- rlnorm(10L)
dwishart2 <- Vectorize(function(x) dwishart(0.5 * log(x), df = 2, scale = 0))
dchisq2 <- function(x) dchisq(x, df = 2)

max(abs(dwishart2(x2) - dchisq2(x2)))
## [1] 5.551115e-17
integrate(dwishart2, exp(-10), exp(10))
## 0.9999773 with absolute error < 6.1e-07
integrate(dchisq2, exp(-10), exp(10))
## 0.9999773 with absolute error < 6.1e-07

One can think about ways to improve the speed/stability, e.g., constant additive terms in the log densities can typically be dropped (as in dlkj) and expressions like atomic::matinv(L_S) and invL_S.transpose() * invL_S can probably be optimized ...

kaskr commented 7 months ago

The point of my example was only to demonstrate that the density doesn't integrate to one.

jaganmn commented 7 months ago

Er, right, yes, I managed to confuse myself, not having read carefully enough (your comment and my own documentation) ...

As you say, dwishart does not integrate to one because it does not attempt to compute the Jacobian of the inverse transformation. My remark about numerical instability is nonsense. Indeed:

dwishart1.j <- Vectorize(function(x) dwishart(x, df = 2, scale = 0) * 2 * exp(2 * x))
integrate(dwishart1.j, -100, 100)
## 1 with absolute error < 3.6e-07

AFAICT the current behaviour is "correct" as long as it is clearly documented what is being computed, namely a probability density in the space of the inverse transformed parameter.

kaskr commented 7 months ago

Conventionally, all densities in TMB integrate to one, and I'm quite determined not to break that pattern.

Beyond Bayesian priors the two typical use cases are:

  1. You have Wishart data and want to estimate the parameters.
  2. You want to assign a Wishart distribution to some random effects.

I claim that your version is not directly applicable in either of these cases. First case requires you to transform the data in order to feed it into your dwishart. Same for the second case if you work in constrained parameter space. In unconstrained parameter space one would need to add the jacobian adjustment and apply a post transformation.

So, to follow the TMB convention there should probably be two versions:

// Constrained parameter space
// (MJ version with added parameter transform)
dwishart(...);
// Unconstrained parameter space.
// (MJ version with added jacobian adjustment)
dwishart_unconstrained(...);
jaganmn commented 7 months ago

OK. I'll work on those alternate implementations "soon" and update here.

bbolker commented 3 weeks ago

FWIW I continue to be interested in this ...

jaganmn commented 3 weeks ago

Not forgotten ... maybe I'll work on this during the glmmTMB hack.