Open alex77g2 opened 1 year ago
@alex77g2 Please take a look at G-API.
Thanks. But there are 2 points I'm still not sure about:
I understand that not every combination of CV algorithms can be hand-optimized. But Sobel pairs are the starting point for a huge family of CV algorithms, so they should be considered separately.
Please consider these points again.
cc @TolyaTalamanov @dmatveev
The principal idea is to go from image pipelining (each operator finishes before the next one starts) to row pipelining (each row enters L1 only once). General benefits: much faster (with typical L1 cache sizes), fewer lines of code, and less temporary memory. Edge detection is a major method in CV in general, and Sobel is taken here as an example.
This is exactly what G-API does with the Fluid backend.
Does G-API pipelining also mean that temporary memory can be saved? That is, if only the final result image is needed, are no megabytes reserved or wasted for the in-between steps (because the work is done within single lines)?
Exactly. With the Fluid backend, the intermediate memory is optimized out.
From my understanding, a hand-optimized version of a many-in-one method should still improve things considerably, especially for RGB image to combined edge magnitude. Auto-pipelining should only solve the L1 cache limit. Do you believe that G-API will reach the same level of performance?
There's SobelXY in G-API: https://github.com/opencv/opencv/blob/4.x/modules/gapi/src/backends/fluid/gfluidimgproc.cpp#L1061
It is currently limited to 3x3 window size but the engine supports any window size and contributions are welcome!
Thanks for the details about G-API. There is a mathematical problem with Sobel 5x5, which I think is the reason it is seldom used. Sobel 5x5, written as a separable product, consists of a blur vector in one direction and a difference vector in the orthogonal direction. The blur is usually a rounded Gaussian, e.g. [1,2,4,2,1], but [1,1,1,1,1], [1,1,2,1,1] and others are also used. Fine so far. But in the gradient direction I'm aware of different approaches: delta vectors like [+2,+1, 0,-1,-2] and like [+1,+2, 0,-2,-1]. In most cases Blur(3x3) before Sobel(3x3) gives a reasonable and symmetric result.
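To make the last point concrete, here is a minimal sketch in plain Python of what Blur(3x3) followed by Sobel(3x3) amounts to as a single 5x5 separable kernel. It assumes the blur vector is [1,2,1] and the difference vector is [1,0,-1]; the helper names `convolve1d` and `outer` are made up for this illustration and are not OpenCV functions.

```python
def convolve1d(a, b):
    """Full discrete convolution of two 1-D integer kernels."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def outer(col, row):
    """Outer product: builds a 2-D kernel from its two separable factors."""
    return [[c * r for r in row] for c in col]


blur3 = [1, 2, 1]    # assumed 3-tap smoothing vector
diff3 = [1, 0, -1]   # assumed 3-tap central difference

# Chaining Blur(3x3) and Sobel(3x3) convolves the 1-D factors per axis.
smooth5 = convolve1d(blur3, blur3)   # smoothing direction
deriv5 = convolve1d(blur3, diff3)    # gradient direction

# Effective 5x5 kernel for the x-derivative.
kernel5x5 = outer(smooth5, deriv5)

print(smooth5)   # [1, 4, 6, 4, 1]
print(deriv5)    # [1, 2, 0, -2, -1]
```

Under these assumptions the gradient-direction vector comes out as the [+1,+2, 0,-2,-1] variant, and the smoothing vector is the binomial [1,4,6,4,1].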
Describe the feature and motivation
Often both Sobel X and Y are needed together (the use case considered here). This applies to all architectures (e.g. ARM, x86, x64, OpenCL, CUDA, ...) whenever the image size exceeds the L1 cache. The principal idea is to go from image pipelining (each operator finishes before the next one starts) to row pipelining (each row enters L1 only once). General benefits: much faster (with typical L1 cache sizes), fewer lines of code, and less temporary memory. Edge detection is a major method in CV in general, and Sobel is taken here as an example.
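The row-pipelining idea can be sketched in plain Python as follows. This is only an illustration of the data flow, not OpenCV or G-API code: the names `sobel_xy_rows`, `_horizontal` and `_combine` are hypothetical, the image is a list of rows of integers, and a replicate border is assumed for simplicity. Each input row is read once, its horizontal pass is computed once, and only three rows of intermediate results are kept at any time.

```python
from collections import deque


def _horizontal(row):
    """Single pass over one row: horizontal difference and smoothing."""
    w = len(row)
    diff, smooth = [0] * w, [0] * w
    for x in range(w):
        left = row[max(x - 1, 0)]            # replicate border (assumption)
        right = row[min(x + 1, w - 1)]
        diff[x] = right - left               # taps [-1, 0, +1]
        smooth[x] = left + 2 * row[x] + right  # taps [1, 2, 1]
    return diff, smooth


def _combine(window):
    """Vertical pass over the three buffered rows: yields gx and gy."""
    (d0, s0), (d1, _), (d2, s2) = window
    gx = [a + 2 * b + c for a, b, c in zip(d0, d1, d2)]  # smooth vertically
    gy = [c - a for a, c in zip(s0, s2)]                 # differentiate vertically
    return gx, gy


def sobel_xy_rows(image):
    """Yield (gx_row, gy_row) for every row, touching each input row once."""
    rows = iter(image)
    window = deque(maxlen=3)       # horizontal results for rows y-1, y, y+1
    first = _horizontal(next(rows))
    window.append(first)           # stands in for row -1 (replicated)
    window.append(first)           # row 0
    for row in rows:
        window.append(_horizontal(row))
        yield _combine(window)
    window.append(window[-1])      # stands in for the row below the image
    yield _combine(window)


# Usage: a vertical step edge produces a response in gx only.
for gx, gy in sobel_xy_rows([[0, 0, 10, 10]] * 3):
    print(gx, gy)
```

Because the function is a generator, a consumer (for example a magnitude or threshold step) can process each output row immediately, so no full-size gx or gy image has to be allocated.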
Additional context
The issue applies to both the Python and the C interface. This interface code should only explain the idea; it is not intended to be final.
1. the grayscale case (only 1 channel) -- [RGB/BGR see below]
2. grayscale cases, but combinations of Sobel X+Y
3. Color images, RGB/BGR (3 channel)
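As an illustration of cases 2 and 3, the sketch below combines Sobel X and Y into one edge magnitude for either a single grayscale channel or three color channels. It is not the proposed interface: the function names, the replicate border, and the rule "take the maximum magnitude over the channels" are all assumptions chosen for this example, and the per-pixel loop is written for clarity, not speed.

```python
import math


def sobel_at(channel, x, y):
    """3x3 Sobel (gx, gy) at one pixel of a 2-D list, replicate border."""
    h, w = len(channel), len(channel[0])

    def p(dx, dy):
        # Clamp coordinates so border pixels are replicated (assumption).
        return channel[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]

    gx = (p(1, -1) + 2 * p(1, 0) + p(1, 1)) - (p(-1, -1) + 2 * p(-1, 0) + p(-1, 1))
    gy = (p(-1, 1) + 2 * p(0, 1) + p(1, 1)) - (p(-1, -1) + 2 * p(0, -1) + p(1, -1))
    return gx, gy


def edge_magnitude(channels, exact=False):
    """Combined edge magnitude over one (gray) or three (RGB/BGR) channels.

    exact=False uses |gx| + |gy|; exact=True uses sqrt(gx^2 + gy^2).
    Channels are merged by taking the per-pixel maximum (assumed rule).
    """
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            best = 0
            for ch in channels:
                gx, gy = sobel_at(ch, x, y)
                mag = math.hypot(gx, gy) if exact else abs(gx) + abs(gy)
                best = max(best, mag)
            out[y][x] = best
    return out


# Usage: grayscale is a list with one channel, color a list with three.
gray = [[0, 0, 10, 10]] * 3
print(edge_magnitude([gray]))
```

A fused implementation would compute this magnitude row by row inside the Sobel loop, so that neither the per-channel gradients nor the per-channel magnitudes are ever stored as full images.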
Thanks for reading.