Open fengyuentau opened 1 month ago
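My results on a Jetson TK1 (ARMv7 + NEON). The columns are: test name, time on 4.x, time with the patch, and speedup (4.x time / patched time):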
ubuntu@jetson1:~/Projects/perf-dnn$ python3 ../opencv/modules/ts/misc/summary.py ./4.x-1.xml ./patched-1.xml | grep NaryEltwise
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU 65.891 43.371 1.52
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU 79.287 81.868 0.97
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU 187.457 187.657 1.00
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU 88.643 96.376 0.92
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU 88.694 96.035 0.92
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU 88.716 90.298 0.98
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU 84.722 83.976 1.01
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU 92.757 81.105 1.14
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU 84.285 84.010 1.00
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU 78.594 78.574 1.00
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU 3407.037 3475.724 0.98
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU 189.651 189.454 1.00
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU 87.859 87.771 1.00
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU 87.915 88.053 1.00
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU 84.077 84.063 1.00
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU 85.160 84.625 1.01
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU 86.368 79.089 1.09
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU 89.897 78.993 1.14
NHWC_C::Layer_NaryEltwise::OCV/CPU 77.220 71.425 1.08
NHWC_H::Layer_NaryEltwise::OCV/CPU 67.494 42.832 1.58
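For readers checking the numbers, the last column can be recomputed from the two timing columns. The following is a minimal sketch, assuming the rows are whitespace-separated exactly as pasted above and that the last column is the baseline time divided by the patched time. The helper names (`parse_row`, `speedup`, `geometric_mean`) are hypothetical and not part of `summary.py`.

```python
import math

# Three rows copied verbatim from the Jetson TK1 results above.
ROWS = """\
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU 65.891 43.371 1.52
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU 88.643 96.376 0.92
NHWC_H::Layer_NaryEltwise::OCV/CPU 67.494 42.832 1.58
"""


def parse_row(line):
    """Split one row into (name, baseline time, patched time, reported ratio)."""
    name, baseline, patched, reported = line.split()
    return name, float(baseline), float(patched), float(reported)


def speedup(baseline, patched):
    """Baseline time over patched time; a value above 1.0 means the patch is faster."""
    return baseline / patched


def geometric_mean(values):
    """Geometric mean, the usual way to average ratios."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
ratios = [speedup(baseline, patched) for _, baseline, patched, _ in rows]

for (name, _, _, reported), ratio in zip(rows, ratios):
    print(f"{name}: recomputed {ratio:.2f}, reported {reported:.2f}")
print(f"geometric mean of these rows: {geometric_mean(ratios):.2f}")
```

A geometric mean is used here rather than an arithmetic mean because the values are ratios; this is a design choice of the sketch, not something the thread prescribes.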
My results for Intel(R) Core(TM) i5-2500K CPU @ 3.30GHz (no AVX2):
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU 24.193 17.846 1.36
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU 24.026 23.313 1.03
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU 27.370 23.279 1.18
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU 35.025 23.254 1.51
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU 32.455 23.260 1.40
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU 32.509 23.321 1.39
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU 23.997 23.262 1.03
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU 24.038 23.270 1.03
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU 23.977 23.269 1.03
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU 23.927 23.279 1.03
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU 320.598 98.029 3.27
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU 24.507 24.488 1.00
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU 24.484 24.477 1.00
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU 24.500 24.471 1.00
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU 24.486 24.482 1.00
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU 24.472 24.476 1.00
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU 23.953 23.281 1.03
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU 23.992 23.274 1.03
NHWC_C::Layer_NaryEltwise::OCV/CPU 18.260 18.489 0.99
NHWC_H::Layer_NaryEltwise::OCV/CPU 24.182 17.829 1.36
Thank you @asmorkalov for adding more performance results :)
Any review comments?
The patch significantly degrades OpenCL pipelines, e.g.:
VIT_B_32::DNNTestNetwork::OCV/CPU 149.576 191.409 0.78
VIT_B_32::DNNTestNetwork::OCV/OCL 104.428 445.013 0.23
VIT_B_32::DNNTestNetwork::OCV/OCL_FP16 102.505 442.994 0.23
I used an NVIDIA GeForce GTX 1080 for the benchmark. It looks like the patch prevents some graph fusion or another inference optimization. I am looking into the details to check whether the PR really causes this.
Ok, I will take a look at the problem.
@asmorkalov The performance "degradation" comes from a very out-of-date code base (more than 450 commits behind 4.x). I have updated the code base. Performance tests on an Intel UHD 770 look okay on my side. Feel free to retest on your side.
On the positive side, those commits brought a large performance boost (OCL is ~4x faster and CPU is ~1.3x faster). Maybe I can add an OCL backend for this layer later :)
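The "~4x" and "~1.3x" figures appear to follow from the VIT_B_32 rows quoted earlier in this thread. Here is a small sketch of that arithmetic, assuming the first timing column was measured on up-to-date 4.x and the second on the PR branch while it was still based on the older code. The function name `gain_from_newer_base` is hypothetical.

```python
# VIT_B_32 timings quoted earlier in this thread.
# Assumed meaning: (up-to-date 4.x, PR branch on the stale base).
TIMES = {
    "OCV/CPU": (149.576, 191.409),
    "OCV/OCL": (104.428, 445.013),
    "OCV/OCL_FP16": (102.505, 442.994),
}


def gain_from_newer_base(current_4x, stale_branch):
    """How many times faster the up-to-date base is than the stale one."""
    return stale_branch / current_4x


gains = {
    backend: gain_from_newer_base(current_4x, stale_branch)
    for backend, (current_4x, stale_branch) in TIMES.items()
}

for backend, gain in gains.items():
    print(f"{backend}: {gain:.2f}x")
```

These gains are simply the reciprocals of the 0.78 and 0.23 ratios reported in the original comparison.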
perf-dnn.zip
The OpenCL-related degradation has disappeared. Perf numbers for the updated PR on the Core i5-2500:
NCHW_C_sum::Layer_NaryEltwise::OCV/CPU 24.142 17.999 1.34
NCHW_NCHW_add::Layer_NaryEltwise::OCV/CPU 23.860 23.265 1.03
NCHW_NCHW_div::Layer_NaryEltwise::OCV/CPU 27.383 23.282 1.18
NCHW_NCHW_equal::Layer_NaryEltwise::OCV/CPU 39.056 23.292 1.68
NCHW_NCHW_greater::Layer_NaryEltwise::OCV/CPU 32.489 23.290 1.39
NCHW_NCHW_less::Layer_NaryEltwise::OCV/CPU 32.435 23.257 1.39
NCHW_NCHW_max::Layer_NaryEltwise::OCV/CPU 23.966 23.269 1.03
NCHW_NCHW_mean::Layer_NaryEltwise::OCV/CPU 23.992 23.276 1.03
NCHW_NCHW_min::Layer_NaryEltwise::OCV/CPU 23.951 23.273 1.03
NCHW_NCHW_mul::Layer_NaryEltwise::OCV/CPU 23.862 23.272 1.03
NCHW_NCHW_pow::Layer_NaryEltwise::OCV/CPU 320.265 97.879 3.27
NCHW_NCHW_ref_div::Layer_NaryEltwise::OCV/CPU 24.491 24.487 1.00
NCHW_NCHW_ref_max::Layer_NaryEltwise::OCV/CPU 24.463 24.464 1.00
NCHW_NCHW_ref_min::Layer_NaryEltwise::OCV/CPU 24.472 24.465 1.00
NCHW_NCHW_ref_mul::Layer_NaryEltwise::OCV/CPU 24.460 24.453 1.00
NCHW_NCHW_ref_sum::Layer_NaryEltwise::OCV/CPU 24.463 24.530 1.00
NCHW_NCHW_sub::Layer_NaryEltwise::OCV/CPU 23.870 23.271 1.03
NCHW_NCHW_sum::Layer_NaryEltwise::OCV/CPU 23.964 23.764 1.01
NHWC_C::Layer_NaryEltwise::OCV/CPU 18.083 18.458 0.98
NHWC_H::Layer_NaryEltwise::OCV/CPU 24.140 17.857 1.35
This PR introduces the following changes:
Performance
i7-12700K, RAM 64GB, Ubuntu 22.04
Apple M1, RAM 16GB, macOS 14.4.1
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request