meiqua / shape_based_matching

An attempt to implement HALCON's shape-based matching; see Machine Vision Algorithms and Applications, page 317, section 3.11.5, written by HALCON engineers
BSD 2-Clause "Simplified" License

How to speed up the creation of responsemap? #21

Open Dyson-Ido opened 5 years ago

Dyson-Ido commented 5 years ago

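Hi Meiqua, in my tests I found that most of the total time is spent creating the response map. Is there a way to speed this step up? For example, for a 1.3-megapixel image it takes 150 ms to create. Can this be made faster? Thanks!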

meiqua commented 5 years ago

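Yes, it can still be sped up by an order of magnitude. The current bottleneck is OpenCV's filters. I looked into this before: the acceleration techniques OpenCV uses are separable filters and SIMD. Techniques that are still available are parallelism and kernel fusion (Sobel, phase, and pyrDown can mostly be fused together). Writing this so that it is both readable and as fast as possible is actually quite hard. However, a group of experts designed a DSL (domain-specific language) specifically for this: see halide talk and halide github. It is very good, and even if you don't use it directly, its ideas are worth borrowing.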

DennisLiu-elogic commented 4 years ago

@meiqua, what do you think of the G-API in OpenCV 4.0? This is the official description: **G-API is a separate OpenCV module so its header files have to be included explicitly. The first four lines of main() create and initialize OpenCV's standard video capture object, which fetches video frames from either an attached camera or a specified file.

G-API pipeline is constructed next. In fact, it is a series of G-API operation calls on cv::GMat data. The important aspect of G-API is that this code block is just a declaration of actions, but not the actions themselves. No processing happens at this point, G-API only tracks which operations form pipeline and how it is connected. G-API Data objects (here it is cv::GMat) are used to connect operations each other. in is an empty cv::GMat signalling that it is a beginning of computation.**

I think the concept is very similar to Halide. I wrote a small test program:

    #include <ctime>
    #include <iostream>
    #include <opencv2/opencv.hpp>
    #include <opencv2/gapi.hpp>
    #include <opencv2/gapi/core.hpp>
    #include <opencv2/gapi/imgproc.hpp>

    using namespace cv;
    using namespace std;

    int main()
    {
        // Raw string literal, so the backslashes in the Windows path are not parsed as escapes.
        Mat matSrc = imread (R"(C:\Users\User\source\repos\ShapeBasedMatching\ShapeBasedMatching\test\MyCase0\cross.bmp)");
        Mat matSmoothed, matSobelX, matSobelY, matMag, matAg;

        // G-API graph declaration: nothing is computed until apply() is called.
        GMat gmatSrc;
        GMat gmatSmoothed = cv::gapi::gaussianBlur (gmatSrc, Size (7, 7), 0, 0, BORDER_REPLICATE);
        GMat gmatSobelX = cv::gapi::Sobel (gmatSmoothed, CV_32F, 1, 0, 3, 1.0, 0.0, BORDER_REPLICATE);
        GMat gmatSobelY = cv::gapi::Sobel (gmatSmoothed, CV_32F, 0, 1, 3, 1.0, 0.0, BORDER_REPLICATE);
        GMat gmatMag = cv::gapi::add (cv::gapi::mul (gmatSobelX, gmatSobelX), cv::gapi::mul (gmatSobelY, gmatSobelY));
        //GMat gmatAg = cv::gapi::phase (gmatSobelX, gmatSobelY, true);
        GComputation gcomputeMag (gmatSrc, gmatMag);
        //GComputation gcomputeAg (gmatSrc, gmatAg);

        for (int i = 0 ; i < 4 ; i++)
        {
            clock_t d1 = clock ();

            // Plain OpenCV: blur, gradients, squared magnitude and phase.
            GaussianBlur (matSrc, matSmoothed, Size (7, 7), 0, 0, BORDER_REPLICATE);
            Sobel (matSmoothed, matSobelX, CV_32F, 1, 0, 3, 1.0, 0.0, BORDER_REPLICATE);
            Sobel (matSmoothed, matSobelY, CV_32F, 0, 1, 3, 1.0, 0.0, BORDER_REPLICATE);
            add (matSobelX.mul (matSobelX), matSobelY.mul (matSobelY), matMag);
            phase (matSobelX, matSobelY, matAg, true);
            clock_t d2 = clock ();

            // G-API: squared magnitude only.
            gcomputeMag.apply (matSrc, matMag);
            //gcomputeAg.apply (matSrc, matAg);
            clock_t d3 = clock ();

            // clock() counts ticks; convert to milliseconds before printing.
            cout << (d2 - d1) * 1000.0 / CLOCKS_PER_SEC << "ms, "
                 << (d3 - d2) * 1000.0 / CLOCKS_PER_SEC << "ms" << endl;
        }

        return 0;
    }

Computing only the magnitude, it is about 2x faster.

meiqua commented 4 years ago

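@DennisLiu-elogic Very nice. It is essentially a simplified Halide: there is no interface for writing your own schedule, but it is perfectly adequate for general use.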

meiqua commented 4 years ago

I recently put together a simple implementation of the whole fusion process. If you are interested, feel free to run it and see.

aemior commented 4 years ago

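Recently I have been trying to use Halide to speed up the construction of the response map. So far I have only tested whether the quantizedOrientations function can be accelerated, but I have run into something I don't understand. For a 2048x2048 RGB image, when quantizedOrientations runs as part of the full matching pipeline, the gradient-orientation quantization takes only about 120 ms. When quantizedOrientations is run on its own, it takes 450 ms. By "on its own" I mean the following: to compare against my Halide-accelerated quantizedOrientations, I copied quantizedOrientations together with hysteresisGradient straight out of line2Dup.cpp, compiled them into a standalone executable, and ran it on a single image. Does anyone know what could cause this? I am completely stumped. The OpenCV version is 3.4.5 and the gcc version is 7.5; I wonder whether it is a compilation issue.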

aemior commented 4 years ago

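I have been heads-down running into pitfalls this whole time, and only just noticed that @meiqua has already made a hand-written fusion version. That was fast!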

meiqua commented 4 years ago

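@aemior Did you turn compiler optimization on?
Haha, I got to the hand-written fusion by running into pitfalls myself. If you have Halide set up, it would be worth comparing the two.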

aemior commented 4 years ago

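@meiqua I have just finished writing the pipeline in Halide syntax. It only goes as far as orientation quantization, with no spreading yet. The only optimization so far is some parallelization and vectorization, and I have not tried the auto-scheduler yet. I am still stuck on the standalone test I mentioned: the Halide version speeds the 450 ms flow up to 160 ms, but when placed in the full matching pipeline it is actually slower than the OpenCV version. I am a bit baffled and still investigating.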

aemior commented 4 years ago

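Also, Halide does not feel as easy to use as I had imagined, mainly because the syntax for expressing image operations is somewhat different. Installing it and setting up LLVM is not easy either; I have run into an exhausting number of pitfalls.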

meiqua commented 4 years ago

@aemior Haha, keep it up!

tingcao-njust commented 4 years ago

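Hi, I have a question about the G-API test program above: the d1-to-d2 interval computes both the gradient magnitude and the gradient orientation, while the d2-to-d3 interval computes only the gradient magnitude, so the timing comparison does not seem fair. If the d2-to-d3 interval computes both magnitude and orientation, it takes longer than the d1-to-d2 interval does for the same work (I suspect because SobelX and SobelY are computed twice).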

zhirui-gao commented 3 years ago

Hi meiqua! I would like to ask: when matching in multiple ROIs of an image, are there any offline steps that could reduce the total matching time?

meiqua commented 3 years ago

@zhirui-gao If the ROIs are all fixed and overlap little, you can crop them into separate small patches and match each patch.