stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License
752 stars 188 forks source link

determinant algorithm #2120

Open bbbales2 opened 4 years ago

bbbales2 commented 4 years ago

Description

I was benchmarking determinant and log_determinant today.

For reverse mode for each of these functions you need to evaluate the determinant, and to get the gradients you need the inverse (it's actually the inverse -- no solves to be done -- here's a link to the math).

For the current implementation of determinant(m), we use m.determinant() and m.inverse() separately in the Eigen code.

For the current implementation of log determinant(m), we do m_piv = m.fullPivHouseholderQr() first and then m_piv.determinant() and m_piv.inverse().

The second is slower than the first I think because fullPivHouseholderQr is so conservative.

partialPivotLU however is faster than both methods. The limitation in the Eigen docs is it's only stable it if the inverse exists (https://eigen.tuxfamily.org/dox/classEigen_1_1PartialPivLU.html).

Anyone see any problem with switching to partialPivotLU for both since if the inverse doesn't exist our reverse mode autodiff fails anyway?

@bgoodri

Current Version:

v3.3.0

bgoodri commented 4 years ago

I doubt either function gets called very often, but it would be good to improve them. This is one of the functions that

https://gitlab.stce.rwth-aachen.de/stce/eigen-ad https://arxiv.org/abs/1911.12604

improved upon, so we should probably do like they did as much as possible.

On Thu, Oct 1, 2020 at 3:52 PM Ben Bales notifications@github.com wrote:

Description

I was benchmarking determinant and log_determinant today.

For reverse mode for each of these functions you need to evaluate the determinant, and to get the gradients you need the inverse (it's actually the inverse -- no solves to be done -- here's a link to the math https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf).

For the current implementation of determinant(m), we use m.determinant() and m.inverse() separately in the Eigen code.

For the current implementation of log determinant(m), we do m_piv = m.fullPivHouseholderQr() first and then m_piv.determinant() and m_piv.inverse().

The second is slower than the first I think because fullPivHouseholderQr is so conservative.

partialPivotLU however is faster than both methods. The limitation in the Eigen docs is that we only use it if the inverse exists.

Anyone see any problem with switching to partialPivotLU for both since if the inverse doesn't exist our reverse mode autodiff fails anyway?

@bgoodri https://github.com/bgoodri Current Version:

v3.3.0

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/2120, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKWT3WNVZ7343O75OWLSITMXLANCNFSM4SA3VTMA .

bbbales2 commented 4 years ago

I didn't find any discussion of the numerics. They use FullPivHouseholderQR in the paper for their determinant calculations though.

I'm still inclined to replace these with the partialPivLus, but I don't know anything about this stuff other than what is in the Eigen accuracy/speed table says (https://eigen.tuxfamily.org/dox/group__TutorialLinearAlgebra.html).

bgoodri commented 4 years ago

I think the idea is that Stan shouldn't do one decomposition to calculate the determinant and another decomposition to calculate the inverse when both can be accomplished with one decomposition. My impression is that software tends to use QR decompositions, rather than LU decompositions, to calculate determinants, but I agree that FullPivHouseholderQR is probably more conservative than necessary for Stan.

bbbales2 commented 4 years ago

Oh I can just do HouseHolderQR then. It has the same number of +s as the partialPivLu, but doesn't require invertability. Also the Eigen-AD people did do the autodiff for that one.

And yeah, then we can avoid separate decompositions in the current determinant implementation.

bbbales2 commented 4 years ago

Ack, HouseHolderQR doesn't have an inverse function.

@SteveBronder if I have MatrixXd mat, can you tell what the default Eigen algorithm is for computing the determinant (mat.determinant()) and the inverse (mat.inverse())? I assume it's one of the ones in the table here: https://eigen.tuxfamily.org/dox/group__TutorialLinearAlgebra.html, but I don't know which.

Just searching around I get to here: https://eigen.tuxfamily.org/dox/classEigen_1_1MatrixBase.html#a7ad8f77004bb956b603bb43fd2e3c061, but it's not obvious what is happening in MatrixBase.h: https://eigen.tuxfamily.org/dox/MatrixBase_8h_source.html

bgoodri commented 4 years ago

You just do a triangular solve with R and the transpose of the Householder vectors. Or you could use ColPivHouseholderQR.

bbbales2 commented 4 years ago

Yeah, but I'm worried it's not implemented because it's a particularly bad idea or something. Also all the householder QR things only return log(abs(determinant)) and the determinant can be negative. I wanna hear what function is currently being used now. My suspicion based on the speed is that it's already doing a partialPivLu internally.

bgoodri commented 4 years ago

The raw determinant is the product of the diagonal elements of R, which can be zero or too close to zero for the inverse to be calculated with sufficient precision. But the user is willing to assume that the determinant is far from zero, then householderQR is OK, although column pivoting would be safer. The .determinant() method does LU

http://eigen.tuxfamily.org/dox/classEigen_1_1MatrixBase.html#a7ad8f77004bb956b603bb43fd2e3c061

but then you have to do two triangular solves to get the inverse.

On Sun, Oct 4, 2020 at 4:33 PM Ben Bales notifications@github.com wrote:

Yeah, but I'm worried it's not implemented because it's a particularly bad idea or something. Also all the householder QR things only return log(abs(determinant)) and the determinant can be negative. I wanna hear what function is currently being used now. My suspicion based on the speed is that it's already doing a partialPivLu internally.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/2120#issuecomment-703312120, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKTLWWEMOUBGWDJGVR3SJDLYRANCNFSM4SA3VTMA .

bbbales2 commented 4 years ago

Oh duh. It says defined in the LU package. I was looking at that earlier but guess I wasn't thinking.

Okay, I will run some benchmarks and try these different things.

bbbales2 commented 4 years ago

Here are benchmarks. Only pay attention to the column on the left. The number in the benchmark name is one dimension of the square matrix. Whatcha think.

partialPivLu:

---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
toss_me                            10126842 ns     10126277 ns           69
log_determinant/2/manual_time           752 ns          928 ns       892412
log_determinant/4/manual_time          1053 ns         1409 ns       654374
log_determinant/8/manual_time          2227 ns         3227 ns       312505
log_determinant/16/manual_time         6989 ns        10995 ns        98178
log_determinant/32/manual_time        29350 ns        46624 ns        22943
log_determinant/64/manual_time       159867 ns       228716 ns         4365
log_determinant/128/manual_time      993814 ns      1270616 ns          685
log_determinant/256/manual_time     7442333 ns      8790691 ns           89
log_determinant/512/manual_time    54184923 ns     60745520 ns           11
log_determinant/1024/manual_time  379838215 ns    407375132 ns            2

colPivHouseholderQR:

---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
toss_me                            10131038 ns     10130318 ns           70
log_determinant/2/manual_time          1238 ns         1411 ns       554675
log_determinant/4/manual_time          2094 ns         2449 ns       333062
log_determinant/8/manual_time          4709 ns         5711 ns       137462
log_determinant/16/manual_time        14199 ns        18238 ns        48932
log_determinant/32/manual_time        57807 ns        75938 ns        12124
log_determinant/64/manual_time       338877 ns       408411 ns         2069
log_determinant/128/manual_time     1779057 ns      2062218 ns          391
log_determinant/256/manual_time    12644361 ns     13849673 ns           53
log_determinant/512/manual_time    92704398 ns     99278082 ns            7
log_determinant/1024/manual_time  963444459 ns    999270366 ns            1

HouseholderQR:

---------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations
---------------------------------------------------------------------------
toss_me                            10120007 ns     10112111 ns           70
log_determinant/2/manual_time          1195 ns         1369 ns       581714
log_determinant/4/manual_time          2128 ns         2486 ns       331031
log_determinant/8/manual_time          5125 ns         6169 ns       130021
log_determinant/16/manual_time        16627 ns        20833 ns        42083
log_determinant/32/manual_time        70427 ns        88308 ns         9730
log_determinant/64/manual_time       372929 ns       444293 ns         1863
log_determinant/128/manual_time     2433538 ns      2719744 ns          288
log_determinant/256/manual_time    18905200 ns     20090252 ns           37
log_determinant/512/manual_time   149071452 ns    155724286 ns            5
log_determinant/1024/manual_time 2083937550 ns   2118084887 ns            1
bgoodri commented 4 years ago

I don't think those benchmarks are that relevant for MCMC. You need to benchmark the combined time to calculate the value and the partial derivatives.

On Mon, Oct 5, 2020 at 1:26 PM Ben Bales notifications@github.com wrote:

Here are benchmarks. Only pay attention to the column on the left. The number in the benchmark name is one dimension of the square matrix. Whatcha think.

partialPivLu:


Benchmark Time CPU Iterations

toss_me 10126842 ns 10126277 ns 69 log_determinant/2/manual_time 752 ns 928 ns 892412 log_determinant/4/manual_time 1053 ns 1409 ns 654374 log_determinant/8/manual_time 2227 ns 3227 ns 312505 log_determinant/16/manual_time 6989 ns 10995 ns 98178 log_determinant/32/manual_time 29350 ns 46624 ns 22943 log_determinant/64/manual_time 159867 ns 228716 ns 4365 log_determinant/128/manual_time 993814 ns 1270616 ns 685 log_determinant/256/manual_time 7442333 ns 8790691 ns 89 log_determinant/512/manual_time 54184923 ns 60745520 ns 11 log_determinant/1024/manual_time 379838215 ns 407375132 ns 2

colPivHouseholderQR:


Benchmark Time CPU Iterations

toss_me 10131038 ns 10130318 ns 70 log_determinant/2/manual_time 1238 ns 1411 ns 554675 log_determinant/4/manual_time 2094 ns 2449 ns 333062 log_determinant/8/manual_time 4709 ns 5711 ns 137462 log_determinant/16/manual_time 14199 ns 18238 ns 48932 log_determinant/32/manual_time 57807 ns 75938 ns 12124 log_determinant/64/manual_time 338877 ns 408411 ns 2069 log_determinant/128/manual_time 1779057 ns 2062218 ns 391 log_determinant/256/manual_time 12644361 ns 13849673 ns 53 log_determinant/512/manual_time 92704398 ns 99278082 ns 7 log_determinant/1024/manual_time 963444459 ns 999270366 ns 1

HouseholderQR:


Benchmark Time CPU Iterations

toss_me 10120007 ns 10112111 ns 70 log_determinant/2/manual_time 1195 ns 1369 ns 581714 log_determinant/4/manual_time 2128 ns 2486 ns 331031 log_determinant/8/manual_time 5125 ns 6169 ns 130021 log_determinant/16/manual_time 16627 ns 20833 ns 42083 log_determinant/32/manual_time 70427 ns 88308 ns 9730 log_determinant/64/manual_time 372929 ns 444293 ns 1863 log_determinant/128/manual_time 2433538 ns 2719744 ns 288 log_determinant/256/manual_time 18905200 ns 20090252 ns 37 log_determinant/512/manual_time 149071452 ns 155724286 ns 5 log_determinant/1024/manual_time 2083937550 ns 2118084887 ns 1

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/2120#issuecomment-703774764, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKTXO5BAQEJHZGRERPTSJH6UPANCNFSM4SA3VTMA .

bbbales2 commented 4 years ago

I don't think those benchmarks are that relevant for MCMC

Aha! To your great surprise, these benchmarks have both.

(coming from this benchmark which hooks in here)

And the implementation is here if you want to see (the decomposition only happens once in the forward pass).

bbbales2 commented 4 years ago

I suspect determinant has already been using a partialPivLu. Though it was doing the determinant and inverse separately, it was nearly as fast as the partialPivLu thing.

We could go to both doing partialPivLu, or we could go to partialPivLu on determinant and colPivHouseholderQR on log_determinant and just add a note in the dots that log_determinant will be a bit more stable. Or I guess by that fast/slow logic we'd just stick with full pivoting for log_determinant.

SteveBronder commented 4 years ago

Sorry for my delay, calling matrix.deteminant() calls matrix.partialPivLu().determinant()

https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/LU/Determinant.h#L26

SteveBronder commented 4 years ago

The Derived type there will be the matrix type, which is then called at the bottom like

template<typename Derived>
inline typename internal::traits<Derived>::Scalar MatrixBase<Derived>::determinant() const
{
  eigen_assert(rows() == cols());
  typedef typename internal::nested_eval<Derived,Base::RowsAtCompileTime>::type Nested;
  return internal::determinant_impl<typename internal::remove_all<Nested>::type>::run(derived());
}
bbbales2 commented 4 years ago

Nice thanks!

@bgoodri alright I just switched determinant to partialPivLu and log_determinant can stay with fullPivHouseholderQr. I'll add a note to the docs that determinant is faster and uses a partial pivoting LU and log_determinant is more stable and uses a fully pivoted householder QR, or whatever the nomenclature is for that.

bgoodri commented 4 years ago

I think using colPivHouseholderQR in log_determinant() would be fine too. In the region where fullPivHouseholderQR is more accurate than colPivHouseholderQR, I suspect the gradient is not accurate in either case.

On Mon, Oct 5, 2020 at 5:44 PM Ben Bales notifications@github.com wrote:

Nice thanks!

@bgoodri https://github.com/bgoodri alright I just switched determinant to partialPivLu and log_determinant can stay with fullPivHouseholderQr. I'll add a note to the docs that determinant is faster and uses a partial pivoting LU and log_determinant is more stable and uses a fully pivoted householder QR, or whatever the nomenclature is for that.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/2120#issuecomment-703905956, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKUWU5MMMURWSU3YHOLSJI445ANCNFSM4SA3VTMA .

bbbales2 commented 4 years ago

Cool! I swapped log_determinant to colPivHouseholderQr.