opencv / opencv

Open Source Computer Vision Library
https://opencv.org
Apache License 2.0
78.72k stars 55.79k forks source link

OpenCV 3.1's SVD is 3600 times slower than OpenCV 2.2's SVD #10084

Open miyazakishogun opened 6 years ago

miyazakishogun commented 6 years ago
Detailed description

As it is already reported previously, SVD of version 2.3 and later is slower than SVD of version 2.2. https://github.com/opencv/opencv/issues/4313 https://github.com/opencv/opencv/issues/7563 https://github.com/opencv/opencv/issues/7917

Below, I will report the benchmark result.

OpenCV 2.2 use LAPACK, so its SVD was fast, but OpenCV 2.3 use own implementation, and so its SVD is slow. OpenCV's SVD is implemented in lapack.cpp's JacobiSVDImpl_. I looked at this code, but I could not understand. I just guess that the speed of this function becomes slow when there are zero singular values. In order to check my hypothesis, two benchmark tests are done below: One with the matrix filled with the value "one," and the other with the matrix filled with random numbers.

System information (version)
Steps to reproduce
const int ROW = 2000;
const int COL = 2000;
Mat mat = Mat::ones(ROW, COL, CV_64F);
int64 start = getTickCount();
cout << "start = " << start << endl;
SVD svd(mat);
int64 end = getTickCount();
cout << "end = " << end << endl;
int calctime = (int)((end - start) * 1000 / getTickFrequency());
cout << "duration = " << calctime << " [msec]" << endl << endl;
cout << "W = " << endl << svd.w.rowRange(Range(0, 9)) << endl;

For random matrix, following code is used for mat.

RNG gen(getTickCount());
Mat mat(ROW, COL, CV_64F);
gen.fill(mat, RNG::UNIFORM, 0.0, 1.0);
Benchmark result: random
Library Millisecond
MATLAB 3737
Python numpy 3878
C/C++ OpenCV 2.2 27523
Python OpenCV 3.3 147870
C/C++ OpenCV 3.1 187424
C/C++ NRC 3rd edit. 437539
C/C++ Eigen 940722
Benchmark result: ones
Library Millisecond
C/C++ Eigen 170
C/C++ OpenCV 2.2 2501
MATLAB 18784
Python numpy 21017
C/C++ NRC 3rd edit. 542057
Python OpenCV 3.3 9003935
C/C++ OpenCV 3.1 9004074
LaurentBerger commented 6 years ago

Benchmark result: ones (i7-5820K 3.3GHz)

start = 142696635334
end = 142833912694
duration = 42622 [msec]

W =
[1999.999999999994;
 1.47990053367204e-10;
 1.36305864866511e-10;
 1.112592094516811e-10;
 8.043661646031361e-11;
 5.973210480786699e-11;
 5.917453275296846e-11;
 4.796063704383165e-11;
 4.076050587013446e-11]

Benchmark result: random

start = 143137643287
end = 143152954830
duration = 4753 [msec]

W =
[1000.433115095811;
 25.76831496221166;
 25.68742797478228;
 25.58668872746218;
 25.5059002844232;
 25.47444113584281;
 25.33021865693487;
 25.28270260936562;
 25.26220877199353]
aks2 commented 6 years ago

As it depends on the numbers, it could be handling of denormal-numbers. Some numbers are handled in software interupts (slow, factor >100) if this CPU-feature is not explicitly disabled. This could be the issue here (not sure, but likely). Ways of solution are described here: https://en.wikipedia.org/wiki/Denormal_number#Disabling_denormal_floats_at_the_code_level

miyazakishogun commented 6 years ago

aks2, thanks for the information about denormal numbers.

System information (version)
Benchmark result: random
Library Millisecond
C/C++ Eigen BDC 12652
C/C++ OpenCV 2.2 48290
C/C++ OpenCV 3.1 166341
C/C++ NRC 3rd ed. 375849
C/C++ Eigen Jacobi 920501
Benchmark result: ones
Library Millisecond
C/C++ Eigen Jacobi 195
C/C++ OpenCV 2.2 2230
C/C++ NRC 3rd ed. 6026
C/C++ OpenCV 3.1 138215
C/C++ Eigen BDC error
aks2 commented 6 years ago

another possible imporvement for JacobiSVDImpl_() if( std::abs(p) <= epsstd::sqrt((double)ab) ) // sqrt() is slow if( (pp) <= (epseps)((double)ab) ) // without slow sqrt please give this a try. I dont know how often this is executed, but it is inside 3 for loops and should give some win.

PS: I already opened another common issue (10075), because OpenCV is full of not/bad optimized code.

aks2 commented 6 years ago

ups: the mul sign is gone above, so pp means p x p and epseps=eps x eps, and ab=a x b .

aks2 commented 6 years ago

if( std::abs(p) <= eps*std::sqrt((double)a*b) ) -> if( (p * p) <= (eps * eps)*((double)a*b) )

alalek commented 6 years ago

@aks2 You should take a look on algorithm complexity at first, because

"Premature optimization is the root of all evil"

Also don't try to optimize without performance tests =)

@miyazakishogun Related commit with LAPACK backend replacement: 9ac3a35175f1536a3e6669e8adbb0f16b2236bfb Currently OpenCV build system allows to use LAPACK functions (optional). So these LAPACK calls can be returned back.

aks2 commented 6 years ago

Of course this needs testing, but for this exit-condition, I cannot see any disadvantage. // I did alot number crunching with float numbers ..

cbalint13 commented 6 years ago

Folks,

I believe for OpenCV would be wiser to incorporate / link against external OpenBLAS / Lapack (perhaps user can choose flavor with a simple cmake switch).

hrnr commented 6 years ago

OpenCV can use external BLAS/LAPAC. It is possible to configure it in CMake.