Open miyazakishogun opened 6 years ago
Benchmark result: ones (i7-5820K 3.3GHz)
start = 142696635334
end = 142833912694
duration = 42622 [msec]
W =
[1999.999999999994;
1.47990053367204e-10;
1.36305864866511e-10;
1.112592094516811e-10;
8.043661646031361e-11;
5.973210480786699e-11;
5.917453275296846e-11;
4.796063704383165e-11;
4.076050587013446e-11]
Benchmark result: random
start = 143137643287
end = 143152954830
duration = 4753 [msec]
W =
[1000.433115095811;
25.76831496221166;
25.68742797478228;
25.58668872746218;
25.5059002844232;
25.47444113584281;
25.33021865693487;
25.28270260936562;
25.26220877199353]
Since it depends on the numbers, it could be the handling of denormal numbers. Such numbers are handled on a slow fallback path (a microcode assist or software trap, depending on the CPU; slowdown factor >100) unless denormal support is explicitly disabled. This could be the issue here (not sure, but likely). Possible solutions are described here: https://en.wikipedia.org/wiki/Denormal_number#Disabling_denormal_floats_at_the_code_level
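To check whether denormal (subnormal) values are actually present in a buffer, a small helper along these lines could be used. This is only a minimal sketch and not part of OpenCV: the names `is_subnormal` and `count_subnormals` are made up for illustration, and the code relies only on `std::fpclassify` from the standard library. Note that the W values printed for the ones matrix (around 1e-10) are ordinary normal doubles, so if denormals matter here they would have to occur in intermediate values.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// True if x is a subnormal (denormal) floating-point value.
inline bool is_subnormal(double x) {
    return std::fpclassify(x) == FP_SUBNORMAL;
}

// Counts subnormal entries in a raw buffer of n doubles.
// Hypothetical helper for illustration; not an OpenCV function.
inline std::size_t count_subnormals(const double* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (is_subnormal(data[i])) ++count;
    }
    return count;
}
```

For a buffer holding `1.0`, `1e-310`, `0.0` and `1.47990053367204e-10`, only `1e-310` is counted: subnormal doubles lie below roughly 2.2e-308 in magnitude, and zero is classified separately.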
aks2, thanks for the information about denormal numbers.
Library | Millisecond |
---|---|
C/C++ Eigen BDC | 12652 |
C/C++ OpenCV 2.2 | 48290 |
C/C++ OpenCV 3.1 | 166341 |
C/C++ NRC 3rd ed. | 375849 |
C/C++ Eigen Jacobi | 920501 |
Library | Millisecond |
---|---|
C/C++ Eigen Jacobi | 195 |
C/C++ OpenCV 2.2 | 2230 |
C/C++ NRC 3rd ed. | 6026 |
C/C++ OpenCV 3.1 | 138215 |
C/C++ Eigen BDC | error |
Another possible improvement for JacobiSVDImpl_():
if( std::abs(p) <= epsstd::sqrt((double)ab) ) // sqrt() is slow
if( (pp) <= (epseps)((double)ab) ) // without slow sqrt
Please give this a try. I don't know how often this is executed, but it is inside three nested for loops and should give some gain.
PS: I already opened another, more general issue (10075), because OpenCV is full of unoptimized or poorly optimized code.
Oops: the multiplication signs were dropped above, so pp means p x p, epseps means eps x eps, and ab means a x b.
if( std::abs(p) <= eps*std::sqrt((double)a*b) )
-> if( (p * p) <= (eps * eps)*((double)a*b) )
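The two forms of the test can be compared in isolation. The sketch below is illustrative only: `converged_sqrt` and `converged_squared` are hypothetical names, and `eps` is passed in as a parameter rather than taken from OpenCV. When `a*b` is non-negative the two forms agree in exact arithmetic, but squaring can underflow for extremely small inputs, where the results may differ.

```cpp
#include <cassert>
#include <cmath>

// Original form: one sqrt per check.
// a and b are assumed to be sums of squares, hence non-negative.
inline bool converged_sqrt(double p, double a, double b, double eps) {
    assert(a >= 0.0 && b >= 0.0);
    return std::abs(p) <= eps * std::sqrt(a * b);
}

// Proposed form: both sides squared, no sqrt.
// Equivalent to the form above in exact arithmetic when a*b >= 0.
inline bool converged_squared(double p, double a, double b, double eps) {
    assert(a >= 0.0 && b >= 0.0);
    return (p * p) <= (eps * eps) * (a * b);
}
```

With `eps = 1e-15`, both functions return true for `p = 1e-20, a = 4, b = 9` and false for `p = 1, a = 4, b = 9`. For `p = 1e-171, a = b = 1e-170` the products underflow to zero in double precision, and only the squared form reports convergence. That is the kind of edge case a test of this change should cover.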
@aks2 You should take a look at the algorithm complexity first, because
"Premature optimization is the root of all evil"
Also don't try to optimize without performance tests =)
@miyazakishogun Related commit with the LAPACK backend replacement: 9ac3a35175f1536a3e6669e8adbb0f16b2236bfb. Currently, the OpenCV build system allows using LAPACK functions (optional), so these LAPACK calls could be restored.
Of course this needs testing, but for this exit condition I cannot see any disadvantage. // I have done a lot of number crunching with floating-point numbers.
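One way to test the changed exit condition is to run both variants inside the same solver and compare the results. The sketch below is a textbook one-sided (Hestenes) Jacobi iteration written for illustration. It is not OpenCV's JacobiSVDImpl_, and the names `Columns`, `jacobi_singular_values`, `use_squared_test`, and the default `eps` and `max_sweeps` values are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Columns = std::vector<std::vector<double>>;  // cols[j][k]: row k of column j

// Returns the singular values (largest first) of the matrix given by its columns.
// use_squared_test selects between the two exit conditions discussed in the thread.
std::vector<double> jacobi_singular_values(Columns cols, bool use_squared_test,
                                           double eps = 1e-15, int max_sweeps = 30) {
    const std::size_t n = cols.size();
    for (std::size_t j = 1; j < n; ++j) assert(cols[j].size() == cols[0].size());

    for (int sweep = 0; sweep < max_sweeps; ++sweep) {
        bool rotated = false;
        for (std::size_t i = 0; i + 1 < n; ++i) {
            for (std::size_t j = i + 1; j < n; ++j) {
                // a, b: squared column norms; p: dot product of the two columns.
                double a = 0.0, b = 0.0, p = 0.0;
                for (std::size_t k = 0; k < cols[i].size(); ++k) {
                    a += cols[i][k] * cols[i][k];
                    b += cols[j][k] * cols[j][k];
                    p += cols[i][k] * cols[j][k];
                }
                const bool converged = use_squared_test
                    ? (p * p <= (eps * eps) * (a * b))
                    : (std::abs(p) <= eps * std::sqrt(a * b));
                if (converged) continue;

                // Plane rotation that makes columns i and j orthogonal.
                const double zeta = (b - a) / (2.0 * p);
                const double t = (zeta >= 0.0 ? 1.0 : -1.0) /
                                 (std::abs(zeta) + std::sqrt(1.0 + zeta * zeta));
                const double c = 1.0 / std::sqrt(1.0 + t * t);
                const double s = c * t;
                for (std::size_t k = 0; k < cols[i].size(); ++k) {
                    const double x = cols[i][k], y = cols[j][k];
                    cols[i][k] = c * x - s * y;
                    cols[j][k] = s * x + c * y;
                }
                rotated = true;
            }
        }
        if (!rotated) break;  // every column pair passed the exit condition
    }

    // The singular values are the norms of the orthogonalized columns.
    std::vector<double> w;
    for (const auto& col : cols) {
        double sum = 0.0;
        for (double x : col) sum += x * x;
        w.push_back(std::sqrt(sum));
    }
    std::sort(w.begin(), w.end(), std::greater<double>());
    return w;
}
```

Calling it on a 3x3 matrix of ones with `use_squared_test` set to `false` and then `true` should give a largest singular value close to 3 and the remaining ones close to zero in both cases. Timing the two variants on larger inputs would then show whether removing the sqrt gives a measurable gain.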
Folks,
I believe it would be wiser for OpenCV to incorporate or link against an external OpenBLAS/LAPACK (perhaps the user could choose the flavor with a simple CMake switch).
OpenCV can use an external BLAS/LAPACK. It can be configured in CMake.
Detailed description
As previously reported, SVD in version 2.3 and later is slower than SVD in version 2.2: https://github.com/opencv/opencv/issues/4313 https://github.com/opencv/opencv/issues/7563 https://github.com/opencv/opencv/issues/7917
Below, I will report the benchmark result.
OpenCV 2.2 uses LAPACK, so its SVD was fast, but OpenCV 2.3 uses its own implementation, so its SVD is slow. OpenCV's SVD is implemented in lapack.cpp's JacobiSVDImpl_. I looked at this code, but I could not understand it. My guess is that this function becomes slow when there are zero singular values. To check this hypothesis, two benchmark tests are done below: one with the matrix filled with ones, and the other with the matrix filled with random numbers.
System information (version)
Steps to reproduce
For the random matrix, the following code is used for mat.
Benchmark result: random
Benchmark result: ones