Open bbbales2 opened 4 years ago
I doubt either function gets called very often, but it would be good to improve them. This is one of the functions that
https://gitlab.stce.rwth-aachen.de/stce/eigen-ad https://arxiv.org/abs/1911.12604
improved upon, so we should probably do like they did as much as possible.
On Thu, Oct 1, 2020 at 3:52 PM Ben Bales notifications@github.com wrote:
Description
I was benchmarking determinant and log_determinant today.
For reverse mode for each of these functions you need to evaluate the determinant, and to get the gradients you need the inverse (it's actually the inverse -- no solves to be done -- here's a link to the math https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf).
For the current implementation of determinant(m), we use m.determinant() and m.inverse() separately in the Eigen code.
For the current implementation of log determinant(m), we do m_piv = m.fullPivHouseholderQr() first and then m_piv.determinant() and m_piv.inverse().
The second is slower than the first I think because fullPivHouseholderQr is so conservative.
partialPivotLU however is faster than both methods. The limitation in the Eigen docs is that we only use it if the inverse exists.
Anyone see any problem with switching to partialPivotLU for both since if the inverse doesn't exist our reverse mode autodiff fails anyway?
@bgoodri https://github.com/bgoodri Current Version:
v3.3.0
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/2120, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKWT3WNVZ7343O75OWLSITMXLANCNFSM4SA3VTMA .
I didn't find any discussion of the numerics. They use FullPivHouseholderQR in the paper for their determinant calculations though.
I'm still inclined to replace these with the partialPivLus, but I don't know anything about this stuff other than what is in the Eigen accuracy/speed table says (https://eigen.tuxfamily.org/dox/group__TutorialLinearAlgebra.html).
I think the idea is that Stan shouldn't do one decomposition to calculate the determinant and another decomposition to calculate the inverse when both can be accomplished with one decomposition. My impression is that software tends to use QR decompositions, rather than LU decompositions, to calculate determinants, but I agree that FullPivHouseholderQR is probably more conservative than necessary for Stan.
Oh I can just do HouseHolderQR then. It has the same number of +s as the partialPivLu, but doesn't require invertability. Also the Eigen-AD people did do the autodiff for that one.
And yeah, then we can avoid separate decompositions in the current determinant implementation.
Ack, HouseHolderQR doesn't have an inverse function.
@SteveBronder if I have MatrixXd mat
, can you tell what the default Eigen algorithm is for computing the determinant (mat.determinant()
) and the inverse (mat.inverse()
)? I assume it's one of the ones in the table here: https://eigen.tuxfamily.org/dox/group__TutorialLinearAlgebra.html, but I don't know which.
Just searching around I get to here: https://eigen.tuxfamily.org/dox/classEigen_1_1MatrixBase.html#a7ad8f77004bb956b603bb43fd2e3c061, but it's not obvious what is happening in MatrixBase.h: https://eigen.tuxfamily.org/dox/MatrixBase_8h_source.html
You just do a triangular solve with R and the transpose of the Householder vectors. Or you could use ColPivHouseholderQR.
Yeah, but I'm worried it's not implemented because it's a particularly bad idea or something. Also all the householder QR things only return log(abs(determinant)) and the determinant can be negative. I wanna hear what function is currently being used now. My suspicion based on the speed is that it's already doing a partialPivLu
internally.
The raw determinant is the product of the diagonal elements of R, which can be zero or too close to zero for the inverse to be calculated with sufficient precision. But the user is willing to assume that the determinant is far from zero, then householderQR is OK, although column pivoting would be safer. The .determinant() method does LU
http://eigen.tuxfamily.org/dox/classEigen_1_1MatrixBase.html#a7ad8f77004bb956b603bb43fd2e3c061
but then you have to do two triangular solves to get the inverse.
On Sun, Oct 4, 2020 at 4:33 PM Ben Bales notifications@github.com wrote:
Yeah, but I'm worried it's not implemented because it's a particularly bad idea or something. Also all the householder QR things only return log(abs(determinant)) and the determinant can be negative. I wanna hear what function is currently being used now. My suspicion based on the speed is that it's already doing a partialPivLu internally.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/2120#issuecomment-703312120, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKTLWWEMOUBGWDJGVR3SJDLYRANCNFSM4SA3VTMA .
Oh duh. It says defined in the LU package. I was looking at that earlier but guess I wasn't thinking.
Okay, I will run some benchmarks and try these different things.
Here are benchmarks. Only pay attention to the column on the left. The number in the benchmark name is one dimension of the square matrix. Whatcha think.
partialPivLu:
---------------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------------
toss_me 10126842 ns 10126277 ns 69
log_determinant/2/manual_time 752 ns 928 ns 892412
log_determinant/4/manual_time 1053 ns 1409 ns 654374
log_determinant/8/manual_time 2227 ns 3227 ns 312505
log_determinant/16/manual_time 6989 ns 10995 ns 98178
log_determinant/32/manual_time 29350 ns 46624 ns 22943
log_determinant/64/manual_time 159867 ns 228716 ns 4365
log_determinant/128/manual_time 993814 ns 1270616 ns 685
log_determinant/256/manual_time 7442333 ns 8790691 ns 89
log_determinant/512/manual_time 54184923 ns 60745520 ns 11
log_determinant/1024/manual_time 379838215 ns 407375132 ns 2
colPivHouseholderQR:
---------------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------------
toss_me 10131038 ns 10130318 ns 70
log_determinant/2/manual_time 1238 ns 1411 ns 554675
log_determinant/4/manual_time 2094 ns 2449 ns 333062
log_determinant/8/manual_time 4709 ns 5711 ns 137462
log_determinant/16/manual_time 14199 ns 18238 ns 48932
log_determinant/32/manual_time 57807 ns 75938 ns 12124
log_determinant/64/manual_time 338877 ns 408411 ns 2069
log_determinant/128/manual_time 1779057 ns 2062218 ns 391
log_determinant/256/manual_time 12644361 ns 13849673 ns 53
log_determinant/512/manual_time 92704398 ns 99278082 ns 7
log_determinant/1024/manual_time 963444459 ns 999270366 ns 1
HouseholderQR:
---------------------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------------------
toss_me 10120007 ns 10112111 ns 70
log_determinant/2/manual_time 1195 ns 1369 ns 581714
log_determinant/4/manual_time 2128 ns 2486 ns 331031
log_determinant/8/manual_time 5125 ns 6169 ns 130021
log_determinant/16/manual_time 16627 ns 20833 ns 42083
log_determinant/32/manual_time 70427 ns 88308 ns 9730
log_determinant/64/manual_time 372929 ns 444293 ns 1863
log_determinant/128/manual_time 2433538 ns 2719744 ns 288
log_determinant/256/manual_time 18905200 ns 20090252 ns 37
log_determinant/512/manual_time 149071452 ns 155724286 ns 5
log_determinant/1024/manual_time 2083937550 ns 2118084887 ns 1
I don't think those benchmarks are that relevant for MCMC. You need to benchmark the combined time to calculate the value and the partial derivatives.
On Mon, Oct 5, 2020 at 1:26 PM Ben Bales notifications@github.com wrote:
Here are benchmarks. Only pay attention to the column on the left. The number in the benchmark name is one dimension of the square matrix. Whatcha think.
partialPivLu:
Benchmark Time CPU Iterations
toss_me 10126842 ns 10126277 ns 69 log_determinant/2/manual_time 752 ns 928 ns 892412 log_determinant/4/manual_time 1053 ns 1409 ns 654374 log_determinant/8/manual_time 2227 ns 3227 ns 312505 log_determinant/16/manual_time 6989 ns 10995 ns 98178 log_determinant/32/manual_time 29350 ns 46624 ns 22943 log_determinant/64/manual_time 159867 ns 228716 ns 4365 log_determinant/128/manual_time 993814 ns 1270616 ns 685 log_determinant/256/manual_time 7442333 ns 8790691 ns 89 log_determinant/512/manual_time 54184923 ns 60745520 ns 11 log_determinant/1024/manual_time 379838215 ns 407375132 ns 2
colPivHouseholderQR:
Benchmark Time CPU Iterations
toss_me 10131038 ns 10130318 ns 70 log_determinant/2/manual_time 1238 ns 1411 ns 554675 log_determinant/4/manual_time 2094 ns 2449 ns 333062 log_determinant/8/manual_time 4709 ns 5711 ns 137462 log_determinant/16/manual_time 14199 ns 18238 ns 48932 log_determinant/32/manual_time 57807 ns 75938 ns 12124 log_determinant/64/manual_time 338877 ns 408411 ns 2069 log_determinant/128/manual_time 1779057 ns 2062218 ns 391 log_determinant/256/manual_time 12644361 ns 13849673 ns 53 log_determinant/512/manual_time 92704398 ns 99278082 ns 7 log_determinant/1024/manual_time 963444459 ns 999270366 ns 1
HouseholderQR:
Benchmark Time CPU Iterations
toss_me 10120007 ns 10112111 ns 70 log_determinant/2/manual_time 1195 ns 1369 ns 581714 log_determinant/4/manual_time 2128 ns 2486 ns 331031 log_determinant/8/manual_time 5125 ns 6169 ns 130021 log_determinant/16/manual_time 16627 ns 20833 ns 42083 log_determinant/32/manual_time 70427 ns 88308 ns 9730 log_determinant/64/manual_time 372929 ns 444293 ns 1863 log_determinant/128/manual_time 2433538 ns 2719744 ns 288 log_determinant/256/manual_time 18905200 ns 20090252 ns 37 log_determinant/512/manual_time 149071452 ns 155724286 ns 5 log_determinant/1024/manual_time 2083937550 ns 2118084887 ns 1
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/2120#issuecomment-703774764, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKTXO5BAQEJHZGRERPTSJH6UPANCNFSM4SA3VTMA .
I suspect determinant
has already been using a partialPivLu. Though it was doing the determinant and inverse separately, it was nearly as fast as the partialPivLu thing.
We could go to both doing partialPivLu
, or we could go to partialPivLu
on determinant
and colPivHouseholderQR
on log_determinant
and just add a note in the dots that log_determinant
will be a bit more stable. Or I guess by that fast/slow logic we'd just stick with full pivoting for log_determinant
.
Sorry for my delay, calling matrix.deteminant()
calls matrix.partialPivLu().determinant()
https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/LU/Determinant.h#L26
The Derived
type there will be the matrix type, which is then called at the bottom like
template<typename Derived>
inline typename internal::traits<Derived>::Scalar MatrixBase<Derived>::determinant() const
{
eigen_assert(rows() == cols());
typedef typename internal::nested_eval<Derived,Base::RowsAtCompileTime>::type Nested;
return internal::determinant_impl<typename internal::remove_all<Nested>::type>::run(derived());
}
Nice thanks!
@bgoodri alright I just switched determinant
to partialPivLu and log_determinant
can stay with fullPivHouseholderQr
. I'll add a note to the docs that determinant
is faster and uses a partial pivoting LU and log_determinant
is more stable and uses a fully pivoted householder QR, or whatever the nomenclature is for that.
I think using colPivHouseholderQR in log_determinant() would be fine too. In the region where fullPivHouseholderQR is more accurate than colPivHouseholderQR, I suspect the gradient is not accurate in either case.
On Mon, Oct 5, 2020 at 5:44 PM Ben Bales notifications@github.com wrote:
Nice thanks!
@bgoodri https://github.com/bgoodri alright I just switched determinant to partialPivLu and log_determinant can stay with fullPivHouseholderQr. I'll add a note to the docs that determinant is faster and uses a partial pivoting LU and log_determinant is more stable and uses a fully pivoted householder QR, or whatever the nomenclature is for that.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/math/issues/2120#issuecomment-703905956, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ2XKUWU5MMMURWSU3YHOLSJI445ANCNFSM4SA3VTMA .
Cool! I swapped log_determinant
to colPivHouseholderQr.
Description
I was benchmarking determinant and log_determinant today.
For reverse mode for each of these functions you need to evaluate the determinant, and to get the gradients you need the inverse (it's actually the inverse -- no solves to be done -- here's a link to the math).
For the current implementation of determinant(m), we use
m.determinant()
andm.inverse()
separately in the Eigen code.For the current implementation of log determinant(m), we do
m_piv = m.fullPivHouseholderQr()
first and thenm_piv.determinant()
andm_piv.inverse()
.The second is slower than the first I think because
fullPivHouseholderQr
is so conservative.partialPivotLU
however is faster than both methods. The limitation in the Eigen docs is it's only stable it if the inverse exists (https://eigen.tuxfamily.org/dox/classEigen_1_1PartialPivLU.html).Anyone see any problem with switching to
partialPivotLU
for both since if the inverse doesn't exist our reverse mode autodiff fails anyway?@bgoodri
Current Version:
v3.3.0