opencv / opencv_contrib

Repository for OpenCV's extra modules
Apache License 2.0

supports empty kernels in cuda::SeparableLinearFilters #3731

Open chacha21 opened 2 months ago

chacha21 commented 2 months ago

#25408

When only a 1D convolution is needed (row filter only or column filter only), cuda::LinearFilter might be slower than cuda::SeparableLinearFilter. Using cuda::SeparableLinearFilter for a 1D convolution is already possible by passing a (1) kernel for the ignored dimension. With empty kernels supported in cuda::SeparableLinearFilter, that (1) kernel is no longer needed. Additionally, the internal _buf used to store the intermediate convolution result can be skipped when only a single convolution is needed.
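
To illustrate the idea outside of CUDA, here is a minimal CPU sketch in plain C++. It is not the code of this PR: the `Image` type and the `rowPass`, `colPass` and `separable` helpers are hypothetical names, and a replicated border is assumed. It only shows that a (1) kernel on the ignored dimension yields the same result as skipping that pass, and that skipping it removes the intermediate image.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: row-major, single-channel float image.
struct Image {
    int rows = 0, cols = 0;
    std::vector<float> data;
    Image(int r, int c) : rows(r), cols(c), data(static_cast<std::size_t>(r) * c, 0.f) {}
    float& at(int y, int x) { return data[static_cast<std::size_t>(y) * cols + x]; }
    float at(int y, int x) const { return data[static_cast<std::size_t>(y) * cols + x]; }
};

// Replicated border: clamp an index into [0, n-1].
static int clampIdx(int i, int n) { return i < 0 ? 0 : (i >= n ? n - 1 : i); }

// Filter every row with k (odd length, anchored at its centre).
Image rowPass(const Image& src, const std::vector<float>& k) {
    assert(k.size() % 2 == 1);
    const int r = static_cast<int>(k.size()) / 2;
    Image dst(src.rows, src.cols);
    for (int y = 0; y < src.rows; ++y)
        for (int x = 0; x < src.cols; ++x) {
            float acc = 0.f;
            for (int i = -r; i <= r; ++i)
                acc += k[i + r] * src.at(y, clampIdx(x + i, src.cols));
            dst.at(y, x) = acc;
        }
    return dst;
}

// Filter every column with k (odd length, anchored at its centre).
Image colPass(const Image& src, const std::vector<float>& k) {
    assert(k.size() % 2 == 1);
    const int r = static_cast<int>(k.size()) / 2;
    Image dst(src.rows, src.cols);
    for (int y = 0; y < src.rows; ++y)
        for (int x = 0; x < src.cols; ++x) {
            float acc = 0.f;
            for (int i = -r; i <= r; ++i)
                acc += k[i + r] * src.at(clampIdx(y + i, src.rows), x);
            dst.at(y, x) = acc;
        }
    return dst;
}

// Separable filter where an empty kernel means "skip this pass".
Image separable(const Image& src, const std::vector<float>& rowK,
                const std::vector<float>& colK) {
    if (rowK.empty() && colK.empty()) return src;  // nothing to filter: plain copy
    if (colK.empty()) return rowPass(src, rowK);   // single pass, no intermediate image
    if (rowK.empty()) return colPass(src, colK);   // single pass, no intermediate image
    return colPass(rowPass(src, rowK), colK);      // two passes through an intermediate image
}
```

In this sketch, `separable(src, k, {1.f})` and `separable(src, k, {})` produce identical pixels, but the second call runs one pass instead of two.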

In "legacy" usage (row + column kernels), there is no performance regression in cuda::SeparableLinearFilter. As soon as an empty kernel is used, performance improves substantially.

The devil is in the details: "in-place" processing is supported and might still need an intermediate buffer, but there is still no regression.
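
As a rough illustration of why in-place filtering can still need an intermediate buffer, here is a small sequential C++ sketch. It is a CPU analogy under stated assumptions, not the CUDA code of this PR: the function names are hypothetical, the kernel is fixed to 3 taps, and borders are replicated. Writing the output directly over the input lets later pixels read neighbours that were already overwritten.

```cpp
#include <cassert>
#include <vector>

// 3-tap filter of a 1D signal with replicated borders, written into dst.
// src and dst must be distinct objects.
void filter3(const std::vector<float>& src, std::vector<float>& dst, const float k[3]) {
    assert(&src != &dst);
    const int n = static_cast<int>(src.size());
    dst.resize(src.size());
    for (int x = 0; x < n; ++x) {
        const float left = src[x > 0 ? x - 1 : 0];
        const float right = src[x + 1 < n ? x + 1 : n - 1];
        dst[x] = k[0] * left + k[1] * src[x] + k[2] * right;
    }
}

// Safe in-place variant: filter into a scratch buffer, then swap it in.
void filter3InPlace(std::vector<float>& img, std::vector<float>& scratch, const float k[3]) {
    filter3(img, scratch, k);
    img.swap(scratch);
}

// Naive in-place variant, shown only to expose the problem:
// img[x - 1] has already been overwritten when img[x] is computed.
void filter3NaiveInPlace(std::vector<float>& img, const float k[3]) {
    const int n = static_cast<int>(img.size());
    for (int x = 0; x < n; ++x) {
        const float left = img[x > 0 ? x - 1 : 0];
        const float right = img[x + 1 < n ? x + 1 : n - 1];
        img[x] = k[0] * left + k[1] * img[x] + k[2] * right;
    }
}
```

With the buffered variant the result matches the out-of-place filter; the naive variant diverges from the second sample onwards.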

chacha21 commented 2 months ago
//size (2048x1024), 100x iterations
============================================
isInPlace:0     useRowKernel:0  useColKernel:0  srcType:CV_8U   dstType:CV_8U   (needsBuf : 0)
sepOrg:529.13 ms
sepNew:64.8515 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:0  srcType:CV_8U   dstType:CV_16U  (needsBuf : 0)
sepOrg:529.524 ms
sepNew:95.3996 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:0  srcType:CV_8U   dstType:CV_32F  (needsBuf : 0)
sepOrg:543.191 ms
sepNew:156.277 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:0  srcType:CV_16U  dstType:CV_8U   (needsBuf : 0)
sepOrg:541.52 ms
sepNew:101.711 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:0  srcType:CV_16U  dstType:CV_16U  (needsBuf : 0)
sepOrg:542.559 ms
sepNew:125.624 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:0  srcType:CV_16U  dstType:CV_32F  (needsBuf : 0)
sepOrg:555.81 ms
sepNew:185.418 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:0  srcType:CV_32F  dstType:CV_8U   (needsBuf : 0)
sepOrg:556.713 ms
sepNew:147.636 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:0  srcType:CV_32F  dstType:CV_16U  (needsBuf : 0)
sepOrg:557.543 ms
sepNew:179.893 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:0  srcType:CV_32F  dstType:CV_32F  (needsBuf : 0)
sepOrg:571.227 ms
sepNew:246.701 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:1  srcType:CV_8U   dstType:CV_8U   (needsBuf : 1)
sepOrg:540.822 ms
sepNew:457.014 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:1  srcType:CV_8U   dstType:CV_16U  (needsBuf : 1)
sepOrg:544.813 ms
sepNew:463.19 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:1  srcType:CV_8U   dstType:CV_32F  (needsBuf : 1)
sepOrg:554.78 ms
sepNew:476.576 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:1  srcType:CV_16U  dstType:CV_8U   (needsBuf : 1)
sepOrg:552.229 ms
sepNew:484.566 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:1  srcType:CV_16U  dstType:CV_16U  (needsBuf : 1)
sepOrg:559.432 ms
sepNew:492.91 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:1  srcType:CV_16U  dstType:CV_32F  (needsBuf : 1)
sepOrg:567.785 ms
sepNew:506.038 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:1  srcType:CV_32F  dstType:CV_8U   (needsBuf : 0)
sepOrg:568.706 ms
sepNew:307.84 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:1  srcType:CV_32F  dstType:CV_16U  (needsBuf : 0)
sepOrg:573.123 ms
sepNew:312.824 ms
============================================
isInPlace:0     useRowKernel:0  useColKernel:1  srcType:CV_32F  dstType:CV_32F  (needsBuf : 0)
sepOrg:582.609 ms
sepNew:323.791 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:0  srcType:CV_8U   dstType:CV_8U   (needsBuf : 1)
sepOrg:551.797 ms
sepNew:403.631 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:0  srcType:CV_8U   dstType:CV_16U  (needsBuf : 1)
sepOrg:552.006 ms
sepNew:435.532 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:0  srcType:CV_8U   dstType:CV_32F  (needsBuf : 0)
sepOrg:566.179 ms
sepNew:254.639 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:0  srcType:CV_16U  dstType:CV_8U   (needsBuf : 1)
sepOrg:563.438 ms
sepNew:414.402 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:0  srcType:CV_16U  dstType:CV_16U  (needsBuf : 1)
sepOrg:565.194 ms
sepNew:446.6 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:0  srcType:CV_16U  dstType:CV_32F  (needsBuf : 0)
sepOrg:578.71 ms
sepNew:265.824 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:0  srcType:CV_32F  dstType:CV_8U   (needsBuf : 1)
sepOrg:584.837 ms
sepNew:433.561 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:0  srcType:CV_32F  dstType:CV_16U  (needsBuf : 1)
sepOrg:585.415 ms
sepNew:465.748 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:0  srcType:CV_32F  dstType:CV_32F  (needsBuf : 0)
sepOrg:598.477 ms
sepNew:284.285 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:1  srcType:CV_8U   dstType:CV_8U   (needsBuf : 1)
sepOrg:563.67 ms
sepNew:563.476 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:1  srcType:CV_8U   dstType:CV_16U  (needsBuf : 1)
sepOrg:568.284 ms!!!
sepNew:568.588 ms!!!
============================================
isInPlace:0     useRowKernel:1  useColKernel:1  srcType:CV_8U   dstType:CV_32F  (needsBuf : 1)
sepOrg:577.79 ms!!!
sepNew:577.947 ms!!!
============================================
isInPlace:0     useRowKernel:1  useColKernel:1  srcType:CV_16U  dstType:CV_8U   (needsBuf : 1)
sepOrg:575.168 ms
sepNew:574.481 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:1  srcType:CV_16U  dstType:CV_16U  (needsBuf : 1)
sepOrg:581.33 ms
sepNew:579.709 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:1  srcType:CV_16U  dstType:CV_32F  (needsBuf : 1)
sepOrg:591.171 ms
sepNew:589.734 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:1  srcType:CV_32F  dstType:CV_8U   (needsBuf : 1)
sepOrg:597.099 ms
sepNew:595.435 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:1  srcType:CV_32F  dstType:CV_16U  (needsBuf : 1)
sepOrg:601.671 ms
sepNew:599.812 ms
============================================
isInPlace:0     useRowKernel:1  useColKernel:1  srcType:CV_32F  dstType:CV_32F  (needsBuf : 1)
sepOrg:611.205 ms
sepNew:609.741 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:0  srcType:CV_8U   dstType:CV_8U   (needsBuf : 0)
sepOrg:527.647 ms
sepNew:0.114411 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:0  srcType:CV_8U   dstType:CV_16U  (needsBuf : 0)
sepOrg:529.202 ms
sepNew:95.1992 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:0  srcType:CV_8U   dstType:CV_32F  (needsBuf : 0)
sepOrg:542.78 ms
sepNew:156.088 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:0  srcType:CV_16U  dstType:CV_8U   (needsBuf : 0)
sepOrg:541.307 ms
sepNew:101.901 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:0  srcType:CV_16U  dstType:CV_16U  (needsBuf : 0)
sepOrg:541.762 ms
sepNew:0.109288 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:0  srcType:CV_16U  dstType:CV_32F  (needsBuf : 0)
sepOrg:555.668 ms
sepNew:184.811 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:0  srcType:CV_32F  dstType:CV_8U   (needsBuf : 0)
sepOrg:556.58 ms
sepNew:147.596 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:0  srcType:CV_32F  dstType:CV_16U  (needsBuf : 0)
sepOrg:557.355 ms
sepNew:179.582 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:0  srcType:CV_32F  dstType:CV_32F  (needsBuf : 0)
sepOrg:570.107 ms
sepNew:0.11555 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:1  srcType:CV_8U   dstType:CV_8U   (needsBuf : 1)
sepOrg:539.966 ms
sepNew:456.471 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:1  srcType:CV_8U   dstType:CV_16U  (needsBuf : 1)
sepOrg:544.997 ms
sepNew:463.23 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:1  srcType:CV_8U   dstType:CV_32F  (needsBuf : 1)
sepOrg:554.855 ms
sepNew:476.561 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:1  srcType:CV_16U  dstType:CV_8U   (needsBuf : 1)
sepOrg:552.478 ms
sepNew:484.727 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:1  srcType:CV_16U  dstType:CV_16U  (needsBuf : 1)
sepOrg:557.832 ms
sepNew:490.721 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:1  srcType:CV_16U  dstType:CV_32F  (needsBuf : 1)
sepOrg:568.171 ms
sepNew:505.864 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:1  srcType:CV_32F  dstType:CV_8U   (needsBuf : 1)
sepOrg:568.683 ms
sepNew:307.768 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:1  srcType:CV_32F  dstType:CV_16U  (needsBuf : 1)
sepOrg:573.375 ms
sepNew:313.126 ms
============================================
isInPlace:1     useRowKernel:0  useColKernel:1  srcType:CV_32F  dstType:CV_32F  (needsBuf : 1)
sepOrg:582.594 ms
sepNew:568.17 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:0  srcType:CV_8U   dstType:CV_8U   (needsBuf : 1)
sepOrg:551.044 ms
sepNew:402.997 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:0  srcType:CV_8U   dstType:CV_16U  (needsBuf : 1)
sepOrg:552.27 ms
sepNew:435.557 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:0  srcType:CV_8U   dstType:CV_32F  (needsBuf : 1)
sepOrg:566.319 ms
sepNew:254.686 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:0  srcType:CV_16U  dstType:CV_8U   (needsBuf : 1)
sepOrg:563.839 ms
sepNew:414.404 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:0  srcType:CV_16U  dstType:CV_16U  (needsBuf : 1)
sepOrg:564.707 ms
sepNew:446.051 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:0  srcType:CV_16U  dstType:CV_32F  (needsBuf : 1)
sepOrg:578.793 ms
sepNew:265.881 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:0  srcType:CV_32F  dstType:CV_8U   (needsBuf : 1)
sepOrg:585.297 ms
sepNew:433.199 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:0  srcType:CV_32F  dstType:CV_16U  (needsBuf : 1)
sepOrg:585.63 ms
sepNew:465.641 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:0  srcType:CV_32F  dstType:CV_32F  (needsBuf : 1)
sepOrg:598.339 ms
sepNew:531.541 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:1  srcType:CV_8U   dstType:CV_8U   (needsBuf : 1)
sepOrg:562.944 ms
sepNew:562.755 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:1  srcType:CV_8U   dstType:CV_16U  (needsBuf : 1)
sepOrg:568.359 ms!!!
sepNew:568.809 ms!!!
============================================
isInPlace:1     useRowKernel:1  useColKernel:1  srcType:CV_8U   dstType:CV_32F  (needsBuf : 1)
sepOrg:578.109 ms!!!
sepNew:578.299 ms!!!
============================================
isInPlace:1     useRowKernel:1  useColKernel:1  srcType:CV_16U  dstType:CV_8U   (needsBuf : 1)
sepOrg:575.254 ms
sepNew:574.528 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:1  srcType:CV_16U  dstType:CV_16U  (needsBuf : 1)
sepOrg:580.567 ms
sepNew:579.043 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:1  srcType:CV_16U  dstType:CV_32F  (needsBuf : 1)
sepOrg:591.259 ms
sepNew:589.886 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:1  srcType:CV_32F  dstType:CV_8U   (needsBuf : 1)
sepOrg:597.409 ms
sepNew:595.706 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:1  srcType:CV_32F  dstType:CV_16U  (needsBuf : 1)
sepOrg:601.568 ms
sepNew:599.995 ms
============================================
isInPlace:1     useRowKernel:1  useColKernel:1  srcType:CV_32F  dstType:CV_32F  (needsBuf : 1)
sepOrg:610.982 ms
sepNew:609.207 ms