log_determinant speedup

spinkney commented 2 years ago

The current implementation of log_determinant uses colPivHouseholderQr(). This gist shows how to compute the log determinant in a numerically stable way using the Cholesky or LU decomposition. According to Eigen's own analysis these would be about 3x faster for small matrices, with larger speed-ups for larger matrices.

bob-carpenter commented 2 years ago

Thanks, @spinkey. That looks like a big saving. l'm OK with this if our linear algebra specialists are OK with this (@bgoodri or Philip Greengard (can't find his GitHub handle).

It looks like it'd also be worth having a symmetric specialization which would have a spec like this:

/**
 * Return the natural logarithm of the determinant of the symmetrized version of the specified matrix.
 * Only the lower triangular portions of `v` will be used and the above-diagonal assumed to be symmetric.
*/
real log_det_sym(matrix v);

An alternative would be to do a test for symmetry and throw an exception in the asymmetric case.

We'd want to benchmark ourselves, as we're using double-precision and don't turn on AVX in our default configs. Eigen's page says the following.

This benchmark has been run on a laptop equipped with an Intel core i7 @ 2,6 GHz, and compiled with clang with AVX and FMA instruction sets enabled but without multi-threading. It uses single precision float numbers. For double, you can get a good estimate by multiplying the timings by a factor 2.

I'm not sure we turn on AVX instructions by default in our makefiles.

bgoodri commented 2 years ago

We already have a specialization for symmetric positive definite matrices

https://github.com/stan-dev/math/blob/develop/stan/math/rev/fun/log_determinant_spd.hpp

I don't see much use for a specialization that is symmetric but not positive definite, because you would usually be taking the logarithm of a negative determinant.

So, I don't know why anyone would be calling log_determinant on an asymmetric matrix in Stan. I think the QR approach and the LU approach have about the same costs, depending on how seriously you pivot to avoid low relative accuracy for near-zero numbers.

bob-carpenter commented 2 years ago

@bgoodri: Thanks for pointing out the specialization. It's coded with LDLT rather than LLT, but LLT appears to be about 5 times as fast once you get to 1000 x 1000 matrices. Do we need LDLT for stability?

bgoodri commented 2 years ago

There's a tradeoff, but many years ago we decided to go with LDLT when possible on the basis of Eigen (and others) saying here that the reliability and accuracy of LDLT was very good, while for LLT, it depends on the condition number of the matrix. In addition, although related, if you are doing anything in Stan with a 1000x1000 SPD matrix, you should be parameterizing things in terms of a Cholesky factor if at all possible, because even LLT is slow if you have to do it up to 1023 times per MCMC iteration.

bob-carpenter commented 2 years ago

Thanks for the recap, @bgoodri. That's consistent with my vague memory of the discussion when it went in.

For estimating covariance, we definitely want to use Cholesky factors. But don't we need to factor a large matrix like this when they arise from a GP covariance function?

spinkney commented 2 years ago

close via #2718

stan-dev / math

log_determinant speedup #2679