zcaudate closed this issue 6 years ago
Sorry for the delay here. I'm not entirely sure where this difference might come from, but I would be curious about more details of the benchmark. Would it be possible to share some code (at least the JOCL-based part)? Or is the issue resolved now?
@gpu: this issue is related to workgroup size and is being looked at here: https://github.com/gpu/JOCL/issues/21
The cv::ocl version used a local work size of [8 8], while the jocl version used [1 1].
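To illustrate why the local work size matters, here is a rough sketch of the arithmetic involved. The `WorkSizes` class and its methods are hypothetical helpers written for this comment; they are not part of JOCL or OpenCV. It assumes a square 2D image and pads the global size up to a multiple of the local size, which OpenCL 1.x requires when a local size is passed explicitly.

```java
// Hypothetical helper for illustration only; not part of JOCL or OpenCV.
final class WorkSizes {

    /** Rounds globalSize up to the next multiple of localSize. */
    static long roundUp(long globalSize, long localSize) {
        long remainder = globalSize % localSize;
        return remainder == 0 ? globalSize : globalSize + (localSize - remainder);
    }

    /** Number of work-groups launched for a width x height 2D range. */
    static long workGroupCount(long width, long height, long localX, long localY) {
        long groupsX = roundUp(width, localX) / localX;
        long groupsY = roundUp(height, localY) / localY;
        return groupsX * groupsY;
    }

    public static void main(String[] args) {
        long[] sizes = {250, 500, 1000, 2000, 4000, 8000};
        for (long px : sizes) {
            // Local size [1 1]: every work-item is its own work-group.
            long groups1x1 = workGroupCount(px, px, 1, 1);
            // Local size [8 8]: 64 work-items share one work-group.
            long groups8x8 = workGroupCount(px, px, 8, 8);
            System.out.println(px + "px: [1 1] -> " + groups1x1
                    + " work-groups, [8 8] -> " + groups8x8 + " work-groups");
        }
        // With JOCL, the padded sizes would be passed as long[] arrays, e.g.
        //   long[] global = { roundUp(w, 8), roundUp(h, 8) };
        //   long[] local  = { 8, 8 };
        // as the global_work_size and local_work_size arguments of
        // clEnqueueNDRangeKernel. Because of the padding, the kernel then
        // needs a bounds check against the real image width and height.
    }
}
```

For the 8000px case this gives 64,000,000 work-groups with [1 1] against 1,000,000 with [8 8]. How much of the measured gap that accounts for depends on the device, so this should be read as an illustration rather than a full explanation.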
I'm curious to know if there are any benchmarks for operations in jocl as compared to using c++.
I've done my own benchmarking with the opencv ocl library and jocl on a custom image algorithm and found that there is roughly an order of magnitude difference between jocl and native ocl:
jocl, 0250px: 2.467 ms
jocl, 0500px: 6.367 ms
jocl, 1000px: 24.060 ms
jocl, 2000px: 81.356 ms
jocl, 4000px: 287.928 ms
jocl, 8000px: 1031.693 ms

cv::ocl, 0250px: 0.475 ms
cv::ocl, 0500px: 0.783 ms
cv::ocl, 1000px: 1.632 ms
cv::ocl, 2000px: 4.555 ms
cv::ocl, 4000px: 15.846 ms
cv::ocl, 8000px: 121.899 ms
Is there any reason for this?