stan-dev / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License
737 stars 185 forks source link

Speedup symmetric eigendecomposition #1156

Open t4c1 opened 5 years ago

t4c1 commented 5 years ago

Description

Symmetric eigendecomposition from library Eigen is implemented in a suboptimal way. It consists of two steps: tridiagonalization and tridiagonal eigensolver.

In Eigen, basic Householder tridiagonalization is implemented. This can be improved on using blocked Householder algorithm, which uses cache more efficiently and is therefore faster.

For symmetric eigensolver Eigen uses QR iteration. While very accurate, it is quite slow. For square symmetric matrix of size n it does O(n^3) operations. It can be replaced by, for example, the MRRR algorithm that is significantly faster - O(n^2) and usually accurate enough. However, for cases where multiple eigenvalues are very close to each other - with relative difference (l1-l2)/l1 between them close to machine precision - MRRR might produce more significant errors or in extreme cases even fail to converge.

Both steps can be further improved using GPUs.

Example

Function is added that computes symmetric eigendecomposition.

Expected Output

Symmetric eigendecomposition is faster than one implemented in Eigen.

Current Version:

v2.18.1

SteveBronder commented 5 years ago

Nice!! Can you post some speed and precision tests? I did a little one here that may be helpful tos how what I would like to see

https://gist.github.com/SteveBronder/912009ac53b676c89be030cf5dd62bd2#file-gpu_sensitivity_test-cpp

Sean also has some stuff here that could be useful

https://github.com/stan-dev/performance-tests-cmdstan

https://github.com/seantalts/perf-math

t4c1 commented 5 years ago

I made something similar to your test: https://gist.github.com/t4c1/b8112f4bb9b3d6261f29cb578fa20090

Once I open pull request for GPU part I will also add it to the test.

wds15 commented 5 years ago

Could u please plot that table? I am bad in mentally plotting it...

...this looks quite nice... ok... looking closer there seems to be 2x speedup here. wonderful.

t4c1 commented 5 years ago

Here are the graphs for times and errors: CPU speedup CPU_errors

SteveBronder commented 5 years ago

Nice! bstatcomp:cpu_eigendecomposition is the branch I should be on to test the above?

rok-cesnovar commented 5 years ago

Yep: https://github.com/bstatcomp/math/tree/cpu_eigendecomposition

There is also a variant for OpenCL where I think Tadej got around a 5-6x speedup compared to this branch on a GTX1070.