Open fengyuentau opened 1 month ago
Observed problems:
@fengyuentau, from the patch I can conclude that we need only a small portion of CLBlast. Can we extract that subset, put it in opencv/3rdparty, and link it into OpenCV (i.e. not use dynamic loading, which is much less convenient for end users)? Also, I believe we need to solve the problems with Mac and Intel somehow. I remember you said (and I also see it in the performance charts) that the current Intel version of gemm in OpenCV is faster than CLBlast, so maybe we should keep the Intel version.
@fengyuentau Thanks a lot for the effort! The PR was discussed at the OpenCV Core team meeting, and the conclusion is the following:
We have trouble with the most popular platforms: Intel and Apple ARM.
I have run several tests on the CLBlast accuracy problem. It turns out that CLBlast with the tuning results for these platforms gives incorrect results, and after reverting those tuning results it gives correct results. See my repo for testing: https://github.com/fengyuentau/test-clblast.
@fengyuentau What is the PR status? What are the next steps here?
Upstream has fixed the accuracy problem on both Intel GPU and Apple M1. The performance results have been updated.
@asmorkalov We may need to discuss once again whether the integration should be done via dynamic loading or not, since the library itself is updated quite often with tuned parameters for different platforms. It has a stable API, and if the integration is done via dynamic loading, users just need to upgrade CLBlast and do not need to rebuild OpenCV.
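To illustrate the trade-off, here is a minimal sketch of what runtime (dynamic) loading means in general. It is not the loader from this patch: the helper name `try_load_clblast` and the `"builtin"` fallback label are hypothetical, and the only assumption about CLBlast is that its shared library is installed under the name `clblast`.

```python
import ctypes
import ctypes.util


def try_load_clblast():
    """Return a ctypes handle to the CLBlast shared library, or None if unavailable.

    Hypothetical helper for illustration only; OpenCV's own loader is not shown here.
    """
    # Ask the platform's library search mechanism for a library named "clblast".
    path = ctypes.util.find_library("clblast")
    if path is None:
        return None
    try:
        # Loading happens at run time, so upgrading the library needs no rebuild.
        return ctypes.CDLL(path)
    except OSError:
        # Found on disk but could not be loaded (e.g. missing OpenCL dependency).
        return None


lib = try_load_clblast()
# Fall back to a built-in code path when the library is not present.
backend = "clblast" if lib is not None else "builtin"
print(backend)
```

With this approach the caller picks up whatever CLBlast version is installed, which is the convenience argued for above; the cost is that end users must install the library themselves.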
Decided to drop dynamic loading. I will submit a new pull request to build OpenCV with CLBlast.
The second commit contains only auto-generated code.
Usage
Get CLBlast:
Test with this patch:
Performance
Usage example:
Khadas VIM4 (8GB mem, 32GB disk space) with Mali G52 r1p0
MacBook Air M1 (16GB mem, 512GB disk space)
Accuracy problem with scale >= 1280, but it is OK with scale = 1024.
PC with i7-12700K (64GB mem, 1TB disk space) with Intel(R) UHD Graphics 770
Accuracy problem with complex (type CV_32FC2).
PC with GTX 1080 Ti (12GB gpu mem, CUDA 12.3)
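The accuracy notes above come down to comparing a GEMM result against a trusted reference. A minimal sketch of such a check, in plain Python with no OpenCV or CLBlast dependency, might look like the following. The function names and the `1e-4` tolerance are assumptions for illustration, not the checks used in OpenCV's test suite. Complex entries stand in for `CV_32FC2`, which `cv::gemm` treats as complex numbers.

```python
import random


def gemm_reference(a, b, alpha=1.0):
    """Naive alpha * A @ B for matrices given as lists of rows (float or complex)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [[alpha * sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]


def max_relative_error(result, reference):
    """Largest |result - reference|, scaled by the largest |reference| entry."""
    scale = max(abs(v) for row in reference for v in row) or 1.0
    diff = max(abs(r - e)
               for res_row, ref_row in zip(result, reference)
               for r, e in zip(res_row, ref_row))
    return diff / scale


def random_complex_matrix(rows, cols, rng):
    """Matrix of complex values with real and imaginary parts in [-1, 1]."""
    return [[complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(cols)]
            for _ in range(rows)]


rng = random.Random(0)
a = random_complex_matrix(4, 3, rng)
b = random_complex_matrix(3, 5, rng)
expected = gemm_reference(a, b)

# Stand-in for the result produced by the backend under test.
candidate = [row[:] for row in expected]
print(max_relative_error(candidate, expected) < 1e-4)
```

In a real check, `candidate` would be the output of the OpenCL path and the comparison would be repeated across matrix sizes, since the problems reported above only appear for some scales and types.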
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request