Can SSE-optimized Sobel and GaussianBlur replace the Sobel and GaussianBlur in this project to speed it up?

xinsuinizhuan commented 4 years ago

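I found some SSE-optimized Sobel and Gaussian blur implementations. Can they replace the Sobel and GaussianBlur calls in this project to speed it up? alg_speed.txt I tried swapping them in: with a grayscale image it crashes, and with a color image, where I replaced only the Gaussian blur, nothing is detected any more.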

static void quantizedOrientations(const Mat &src, Mat &magnitude, Mat &angle, float threshold) {
//Mat smoothed;
////Compute horizontal and vertical image derivatives on all color channels separately
//static const int KERNEL_SIZE = 7;
////For some reason cvSmooth / cv::GaussianBlur, cvSobel / cv::Sobel have different defaults for border handling...
//GaussianBlur(src, smoothed, Size(KERNEL_SIZE, KERNEL_SIZE), 0, 0, BORDER_REPLICATE);

int Height = src.rows;
int Width = src.cols;
unsigned char* Src = src.data;
unsigned char* Dest = new unsigned char[Height * Width * 3];
int Stride = Width * 3;
static const int KERNEL_SIZE = 7;
IM_GaussBlur_SSE(Src, Dest, Width, Height, Stride, KERNEL_SIZE);
Mat smoothed(Height, Width, CV_8UC3, Dest);

if(src.channels() == 1){
    Mat sobel_dx, sobel_dy, sobel_ag;
    Sobel(smoothed, sobel_dx, CV_32F, 1, 0, 3, 1.0, 0.0, BORDER_REPLICATE);
    Sobel(smoothed, sobel_dy, CV_32F, 0, 1, 3, 1.0, 0.0, BORDER_REPLICATE);
    magnitude = sobel_dx.mul(sobel_dx) + sobel_dy.mul(sobel_dy);
    phase(sobel_dx, sobel_dy, sobel_ag, true);
    hysteresisGradient(magnitude, angle, sobel_ag, threshold * threshold);

}else{

    magnitude.create(src.size(), CV_32F);

    // Allocate temporary buffers
    Size size = src.size();
    Mat sobel_3dx;              // per-channel horizontal derivative
    Mat sobel_3dy;              // per-channel vertical derivative
    Mat sobel_dx(size, CV_32F); // maximum horizontal derivative
    Mat sobel_dy(size, CV_32F); // maximum vertical derivative
    Mat sobel_ag;               // final gradient orientation (unquantized)

    Sobel(smoothed, sobel_3dx, CV_16S, 1, 0, 3, 1.0, 0.0, BORDER_REPLICATE);
    Sobel(smoothed, sobel_3dy, CV_16S, 0, 1, 3, 1.0, 0.0, BORDER_REPLICATE);

    short *ptrx = (short *)sobel_3dx.data;
    short *ptry = (short *)sobel_3dy.data;
    float *ptr0x = (float *)sobel_dx.data;
    float *ptr0y = (float *)sobel_dy.data;
    float *ptrmg = (float *)magnitude.data;

    const int length1 = static_cast<const int>(sobel_3dx.step1());
    const int length2 = static_cast<const int>(sobel_3dy.step1());
    const int length3 = static_cast<const int>(sobel_dx.step1());
    const int length4 = static_cast<const int>(sobel_dy.step1());
    const int length5 = static_cast<const int>(magnitude.step1());
    const int length0 = sobel_3dy.cols * 3;

    for (int r = 0; r < sobel_3dy.rows; ++r)
    {
        int ind = 0;

        for (int i = 0; i < length0; i += 3)
        {
            // Use the gradient orientation of the channel whose magnitude is largest
            int mag1 = ptrx[i + 0] * ptrx[i + 0] + ptry[i + 0] * ptry[i + 0];
            int mag2 = ptrx[i + 1] * ptrx[i + 1] + ptry[i + 1] * ptry[i + 1];
            int mag3 = ptrx[i + 2] * ptrx[i + 2] + ptry[i + 2] * ptry[i + 2];

            if (mag1 >= mag2 && mag1 >= mag3)
            {
                ptr0x[ind] = ptrx[i];
                ptr0y[ind] = ptry[i];
                ptrmg[ind] = (float)mag1;
            }
            else if (mag2 >= mag1 && mag2 >= mag3)
            {
                ptr0x[ind] = ptrx[i + 1];
                ptr0y[ind] = ptry[i + 1];
                ptrmg[ind] = (float)mag2;
            }
            else
            {
                ptr0x[ind] = ptrx[i + 2];
                ptr0y[ind] = ptry[i + 2];
                ptrmg[ind] = (float)mag3;
            }
            ++ind;
        }
        ptrx += length1;
        ptry += length2;
        ptr0x += length3;
        ptr0y += length4;
        ptrmg += length5;
    }

    // Calculate the final gradient orientations
    phase(sobel_dx, sobel_dy, sobel_ag, true);
    hysteresisGradient(magnitude, angle, sobel_ag, threshold * threshold);
}

}

meiqua commented 4 years ago

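In theory, swapping in a different implementation should work. For grayscale input, the buffer and Mat declared at the top should not be hard-coded as three-channel. The buffer is allocated with new and never released, which may leak memory. For color images the data layout may not match, so check whether the multi-channel layout is the same on both sides.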
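The three points above can be illustrated without OpenCV. The following is a minimal sketch, assuming 8-bit interleaved pixels; `PackedImage` and `packRows` are hypothetical names introduced here, not part of shape_based_matching or of the SSE code in alg_speed.txt. It sizes the buffer from the actual channel count instead of a fixed 3, lets a `std::vector` own the memory so nothing has to be deleted by hand, and copies row by row so that any padding at the end of the source rows is not mistaken for pixel data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper type for illustration only.
struct PackedImage {
    int width = 0;
    int height = 0;
    int channels = 0;
    std::vector<unsigned char> pixels;  // owns the memory, released automatically

    // Bytes per tightly packed row; depends on the real channel count.
    int stride() const { return width * channels; }
};

// Copy an interleaved 8-bit image whose rows may be padded
// (srcStride >= width * channels) into a tightly packed buffer.
inline PackedImage packRows(const unsigned char* src, int width, int height,
                            int channels, std::size_t srcStride) {
    PackedImage out;
    out.width = width;
    out.height = height;
    out.channels = channels;
    out.pixels.resize(static_cast<std::size_t>(width) * height * channels);
    const std::size_t rowBytes = static_cast<std::size_t>(out.stride());
    if (rowBytes == 0) return out;  // nothing to copy for an empty image
    for (int y = 0; y < height; ++y) {
        std::memcpy(out.pixels.data() + y * rowBytes, src + y * srcStride, rowBytes);
    }
    return out;
}
```

In the posted function, `Stride = Width * 3` and `CV_8UC3` are used even when `src.channels() == 1`, so the blur would read roughly three times more bytes than a grayscale `src` holds, which is consistent with the reported crash. A `cv::Mat` constructed on top of an external pointer does not take ownership of that memory, so the `new[]` buffer would still need to be freed by the caller.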

xinsuinizhuan commented 4 years ago

How do I replace these two OpenCV Sobel calls with the SSE version of Sobel?

meiqua commented 4 years ago

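If the inputs and outputs match, you can simply swap it in. OpenCV itself is already SSE-accelerated, so how much faster is the SSE version you are referring to?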
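To make "inputs and outputs match" concrete: the two `Sobel` calls in the posted function pass derivative orders `1, 0` and `0, 1`, so one produces the horizontal derivative and the other the vertical derivative, and both are later fed to `phase`. The sketch below is a plain, unoptimized reference for that contract, assuming 8-bit interleaved input, a 3x3 kernel, replicated borders and signed 16-bit output; `Gradients` and `sobel3x3` are hypothetical names, and the sign convention (right minus left, bottom minus top) is an assumption that should be checked against the actual output of `cv::Sobel`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference type: signed derivatives in the same
// interleaved layout as the input image.
struct Gradients {
    std::vector<short> dx;  // horizontal derivative
    std::vector<short> dy;  // vertical derivative
};

inline Gradients sobel3x3(const std::vector<unsigned char>& src,
                          int width, int height, int channels) {
    Gradients g;
    g.dx.assign(src.size(), 0);
    g.dy.assign(src.size(), 0);

    // Pixel fetch with replicated borders (coordinates clamped to the image).
    auto at = [&](int x, int y, int c) -> int {
        x = std::max(0, std::min(x, width - 1));
        y = std::max(0, std::min(y, height - 1));
        return src[(static_cast<std::size_t>(y) * width + x) * channels + c];
    };

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            for (int c = 0; c < channels; ++c) {
                // Right column minus left column, weighted 1-2-1.
                int gx = (at(x + 1, y - 1, c) + 2 * at(x + 1, y, c) + at(x + 1, y + 1, c))
                       - (at(x - 1, y - 1, c) + 2 * at(x - 1, y, c) + at(x - 1, y + 1, c));
                // Bottom row minus top row, weighted 1-2-1.
                int gy = (at(x - 1, y + 1, c) + 2 * at(x, y + 1, c) + at(x + 1, y + 1, c))
                       - (at(x - 1, y - 1, c) + 2 * at(x, y - 1, c) + at(x + 1, y - 1, c));
                std::size_t idx = (static_cast<std::size_t>(y) * width + x) * channels + c;
                g.dx[idx] = static_cast<short>(gx);  // |gx| <= 4 * 255, fits in short
                g.dy[idx] = static_cast<short>(gy);
            }
        }
    }
    return g;
}
```

A replacement that returns only a single gradient-magnitude image would not be a drop-in substitute here, because the later `phase` call needs the signed horizontal and vertical derivatives separately.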

xinsuinizhuan commented 4 years ago

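What is the purpose of running Sobel twice here? Both give a speedup of roughly an order of magnitude. For the SSE-accelerated Sobel, see: https://www.cnblogs.com/Imageshop/p/7285564.html For the SSE-accelerated Gaussian blur, see: https://www.cnblogs.com/Imageshop/p/6376028.html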

xinsuinizhuan commented 4 years ago

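However, after replacing the Gaussian blur, nothing is detected any more. Here are the Sobel and Gaussian blur algorithms I have put together: alg_speed.txt Please try them and see what is going wrong, and whether they can be swapped in at all.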

meiqua commented 4 years ago

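I took a quick look. Those speedups are measured against the most naive implementation. Compared with OpenCV the gain is not necessarily large, because OpenCV also uses SSE and additionally uses separable filters. If you are not using fusion, parallelization is still the better way to speed things up. As for whether the replacement works, first run both versions on a simple image: the outputs should look the same.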
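Both ideas in this reply can be tried on a toy case. The sketch below is an illustration under simplifying assumptions, not the OpenCV or SSE code: it uses an unnormalized 3x3 binomial kernel instead of the 7x7 Gaussian in the project, a single channel, and replicated borders. `blurDirect`, `blurSeparable` and `maxAbsDiff` are hypothetical helper names. It applies the same kernel once as a direct 2D pass and once as two 1D passes (the separable form), then compares the two outputs pixel by pixel, which is the same kind of check that can be run between `cv::GaussianBlur` and a candidate replacement.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

using Image = std::vector<int>;  // single channel, row-major (hypothetical alias)

// Clamp an index into [0, hi], which replicates the border pixels.
inline int clampIndex(int v, int hi) { return std::max(0, std::min(v, hi)); }

// Direct 2D pass with the unnormalized 3x3 binomial kernel [1 2 1]^T * [1 2 1].
inline Image blurDirect(const Image& src, int w, int h) {
    static const int k[3] = {1, 2, 1};
    Image dst(src.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int sum = 0;
            for (int j = -1; j <= 1; ++j)
                for (int i = -1; i <= 1; ++i)
                    sum += k[j + 1] * k[i + 1] *
                           src[clampIndex(y + j, h - 1) * w + clampIndex(x + i, w - 1)];
            dst[y * w + x] = sum;
        }
    return dst;
}

// Same kernel as two 1D passes: horizontal first, then vertical.
inline Image blurSeparable(const Image& src, int w, int h) {
    Image tmp(src.size()), dst(src.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            tmp[y * w + x] = src[y * w + clampIndex(x - 1, w - 1)] + 2 * src[y * w + x] +
                             src[y * w + clampIndex(x + 1, w - 1)];
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            dst[y * w + x] = tmp[clampIndex(y - 1, h - 1) * w + x] + 2 * tmp[y * w + x] +
                             tmp[clampIndex(y + 1, h - 1) * w + x];
    return dst;
}

// Largest per-pixel absolute difference, or -1 if the sizes differ.
inline int maxAbsDiff(const Image& a, const Image& b) {
    if (a.size() != b.size()) return -1;
    int worst = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        worst = std::max(worst, std::abs(a[i] - b[i]));
    return worst;
}
```

With integer arithmetic the two variants should agree exactly. When comparing a real replacement against OpenCV, small differences from rounding are plausible, while large differences usually point to a mismatch in kernel, border handling or data layout.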

meiqua / shape_based_matching

Can SSE-optimized Sobel and GaussianBlur replace the Sobel and GaussianBlur in this project to speed it up? #63