liuliu / ccv

C-based/Cached/Core Computer Vision Library, A Modern Computer Vision Library
http://libccv.org

ccv_sobel() speed improvements #32

Closed mledford closed 11 years ago

mledford commented 12 years ago

While profiling some of the built-in functions, I noticed that ccv_sobel() appears to consume a significant amount of time, particularly on iOS devices. I assume the best way to improve performance would be to use SIMD instructions, such as NEON on ARM or MMX on Intel. However, given the way the current function is implemented, that looks like it could become a messy, non-trivial task. Is this as good as it gets, or is there a clean way to get performance up?

liuliu commented 12 years ago

I would start by implementing a _ccv_sobel_8u and dispatching to that function when certain data type criteria are met. Which utilities do you use? ccv's design depends heavily on the compiler's optimization techniques; could you elaborate on where you encountered this performance hit? I would love to implement some SIMD improvements after finishing what I have in hand (the tracking algorithm and text detection improvements), but I would expect only a minor performance improvement (1.2~1.5x). Have you enabled the application-wide cache? If you use ccv_swt, that should give a noticeable performance improvement: ccv_swt runs twice (from dark to light and from light to dark) and will run the same ccv_sobel twice.
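To illustrate the dispatch idea, here is a minimal standalone sketch, not ccv code. It assumes a hypothetical `pix_type_t` tag and hypothetical function names (`sobel_dx`, `sobel_dx_8u`, `sobel_dx_generic`), handles only the horizontal 3x3 Sobel on a single-channel image, and clamps at the borders. The 8-bit path keeps its inner loop in integer arithmetic, which is the kind of loop a compiler or hand-written SIMD can speed up most easily.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical element-type tag; these are NOT ccv's real type constants. */
typedef enum { PIX_8U, PIX_32F } pix_type_t;

/* Generic path: every pixel load branches on the element type. */
static float load_pixel(const void *src, pix_type_t type, int i)
{
    return type == PIX_8U ? (float)((const uint8_t *)src)[i]
                          : ((const float *)src)[i];
}

static void sobel_dx_generic(const void *src, pix_type_t type,
                             int rows, int cols, float *dst)
{
    for (int y = 0; y < rows; y++) {
        int u = (y > 0 ? y - 1 : y) * cols;        /* row above, clamped */
        int m = y * cols;                          /* current row */
        int d = (y < rows - 1 ? y + 1 : y) * cols; /* row below, clamped */
        for (int x = 0; x < cols; x++) {
            int l = x > 0 ? x - 1 : x;             /* left column, clamped */
            int r = x < cols - 1 ? x + 1 : x;      /* right column, clamped */
            dst[m + x] =
                (load_pixel(src, type, u + r) - load_pixel(src, type, u + l)) +
                2.0f * (load_pixel(src, type, m + r) - load_pixel(src, type, m + l)) +
                (load_pixel(src, type, d + r) - load_pixel(src, type, d + l));
        }
    }
}

/* Specialized path: 8-bit input, integer-only inner loop. */
static void sobel_dx_8u(const uint8_t *src, int rows, int cols, float *dst)
{
    for (int y = 0; y < rows; y++) {
        const uint8_t *u = src + (y > 0 ? y - 1 : y) * cols;
        const uint8_t *m = src + y * cols;
        const uint8_t *d = src + (y < rows - 1 ? y + 1 : y) * cols;
        for (int x = 0; x < cols; x++) {
            int l = x > 0 ? x - 1 : x;
            int r = x < cols - 1 ? x + 1 : x;
            int g = (u[r] - u[l]) + 2 * (m[r] - m[l]) + (d[r] - d[l]);
            dst[y * cols + x] = (float)g;
        }
    }
}

/* Dispatcher: returns 1 if the 8-bit fast path ran, 0 for the generic path. */
int sobel_dx(const void *src, pix_type_t type, int rows, int cols, float *dst)
{
    assert(src && dst && rows > 0 && cols > 0);
    if (type == PIX_8U) {
        sobel_dx_8u((const uint8_t *)src, rows, cols, dst);
        return 1;
    }
    sobel_dx_generic(src, type, rows, cols, dst);
    return 0;
}
```

Both paths produce the same output for the same pixel values, so a specialized kernel can be added behind the dispatcher without changing callers.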

mledford commented 12 years ago

Indeed, I was using ccv_swt(). I have profiled with and without the cache enabled, and performance of the same path is nearly identical for a single pass either way. Could this be an indication that the cache is broken? I'll do some more testing to see what I find.

mledford commented 12 years ago

After further debugging, I found that the cache system breaks with Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn) when any optimization flag other than -O0 is used. This is the compiler included with Xcode 4.5.1. What appears to happen is that items are not found in the cache, so new items are allocated.

liuliu commented 12 years ago

If you call ccv_swt_detect_words, it will run ccv_swt in two directions (two passes), and the second pass will reuse the sobel/canny result. If you run ccv_swt directly, it is only one pass and won't retrieve any cached result.

liuliu commented 12 years ago

Can you run the test suite with said compiler? (make test in ccv/test/)

mledford commented 12 years ago

I have run the test suite. The caveat is that it was run on a machine with an Intel processor. Why do I mention this? I forgot to add a bit of information in my comment a few days ago: the optimization flag breakage appears to occur only when running on an iOS device. Currently all Apple iOS devices, as I'm sure you are aware, are ARM based.

To complicate things further, when I run the test suite on Intel I get inconsistent results. Let me show you what I mean:

Run 1:

...
all test case(s) passed, congratulations!
all test case(s) passed, congratulations!
/bin/sh: line 1: 75394 Segmentation fault: 11 ./"$test"
all test case(s) passed, congratulations!
all test case(s) passed, congratulations!
/bin/sh: line 1: 75398 Segmentation fault: 11 ./"$test"
make[1]: *** [test] Error 139
...
all test case(s) passed, congratulations!
...

Run 2:

...
/bin/sh: line 1: 75608 Segmentation fault: 11 ./"$test"
all test case(s) passed, congratulations!
/bin/sh: line 1: 75616 Segmentation fault: 11 ./"$test"
all test case(s) passed, congratulations!
all test case(s) passed, congratulations!
all test case(s) passed, congratulations!
...
/bin/sh: line 1: 75659 Segmentation fault: 11 ./"$test"
make[1]: *** [test] Error 139
...

liuliu commented 12 years ago

@mledford, thanks, I will investigate the iOS issue this week. I tried running the tests on an ARM device (namely, a Raspberry Pi), and they seem fine, but that device runs Linux; things may be wildly different on a *BSD-based system.

mledford commented 11 years ago

@liuliu I still don't have a firm grasp of the caching system. All the inline functions really obscure its workings for me. I have a question regarding ccv_swt_detect_words() and caching. ccv_sobel() is called eight times. If caching is enabled, it seems to me that it should be called eight times but should only run in its entirety two times. I am seeing it called eight times and running six times instead of two. There is definitely something strange going on with the compiler optimizations, because when I placed a printf at the top of ccv_sobel() to print how many times it is called, I see the expected eight calls and then see it running a total of four times.

Am I wrong in my expectations? Shouldn't I expect to only see ccv_sobel() do its computations twice when running ccv_swt_detect_words()?

liuliu commented 11 years ago

You are right about the expectation: it is called 8 times (2 times in ccv_canny, 2 times in ccv_swt, and then the same again in the opposite direction) and runs in its entirety 2 times. Sorry for the delay, I am going to test ccv on iOS but got delayed by the 0.3 release. Also, the tests don't run on Mac OSX because they use a Linux-specific hack (cat /proc/self/maps). Also, depending on how large you set the cache, previous results may be trimmed from it (the default size is 64M, which should be enough for a normal image, unless you are running with a full-size image on iOS).
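The "called 8 times, ran entirely 2 times" expectation can be reproduced with a small standalone sketch of signature-keyed memoization. This is not ccv's cache implementation: the slot table, the FNV-1a signature, the stand-in `expensive_filter`, and the helper `one_direction` are all hypothetical names chosen for illustration, and the call order inside `one_direction` is an assumption based on the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 16

/* One memoized result, keyed by input signature plus filter parameters. */
typedef struct {
    int used;
    uint64_t sig;
    int dx, dy;
    long value;
} cache_slot_t;

static cache_slot_t cache[CACHE_SLOTS];
static int next_slot = 0;

int filter_calls = 0;     /* how often the cached entry point was called */
int filter_full_runs = 0; /* how often the real computation actually ran */

/* FNV-1a hash of the pixel bytes, used as the input signature. */
static uint64_t signature(const uint8_t *img, size_t n)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= img[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Stand-in for an expensive filter; the arithmetic itself is arbitrary. */
static long expensive_filter(const uint8_t *img, size_t n, int dx, int dy)
{
    long acc = 0;
    filter_full_runs++;
    for (size_t i = 0; i + 1 < n; i++)
        acc += (long)(dx + 2 * dy) * ((int)img[i + 1] - (int)img[i]);
    return acc;
}

long cached_filter(const uint8_t *img, size_t n, int dx, int dy)
{
    uint64_t sig = signature(img, n);
    filter_calls++;
    /* Hit: same input signature and same parameters seen before. */
    for (int i = 0; i < CACHE_SLOTS; i++) {
        const cache_slot_t *s = &cache[i];
        if (s->used && s->sig == sig && s->dx == dx && s->dy == dy)
            return s->value;
    }
    /* Miss: compute, then store with round-robin replacement. */
    cache_slot_t *s = &cache[next_slot];
    next_slot = (next_slot + 1) % CACHE_SLOTS;
    s->used = 1;
    s->sig = sig;
    s->dx = dx;
    s->dy = dy;
    s->value = expensive_filter(img, n, dx, dy);
    return s->value;
}

/* One detection direction: two calls from the edge stage, two from the
 * stroke stage (assumed call pattern, for illustration only). */
void one_direction(const uint8_t *img, size_t n)
{
    cached_filter(img, n, 1, 0);
    cached_filter(img, n, 0, 1);
    cached_filter(img, n, 1, 0);
    cached_filter(img, n, 0, 1);
}
```

Calling `one_direction` twice on the same image leaves `filter_calls` at 8 and `filter_full_runs` at 2. If lookups never match, which is the symptom reported for the optimized iOS build, every call becomes a full run.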

mledford commented 11 years ago

No worries on not testing yet. I just decided to look at the issue again to see if I could determine what is happening. No luck yet.

The size of the cache shouldn't be an issue. I am currently using the default cache size, which as you mentioned is 64MB. The images I'm throwing at it are 480x360 single-component images, which should each be a few hundred kilobytes.
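As a rough check of that estimate, the arithmetic can be sketched as follows. The helper names are hypothetical, "64M" is taken to mean 64 MiB, and the 4-byte element size for a derived buffer is an assumption rather than something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Raw buffer size for a dense image: rows * cols * channels * element size. */
size_t image_bytes(size_t rows, size_t cols, size_t channels,
                   size_t bytes_per_element)
{
    assert(rows > 0 && cols > 0 && channels > 0 && bytes_per_element > 0);
    return rows * cols * channels * bytes_per_element;
}

/* How many whole buffers of a given size fit in a cache of cache_bytes. */
size_t buffers_that_fit(size_t cache_bytes, size_t buffer_bytes)
{
    assert(buffer_bytes > 0);
    return cache_bytes / buffer_bytes;
}
```

For a 480x360 single-component 8-bit image, `image_bytes(360, 480, 1, 1)` is 172,800 bytes (about 169 KiB). Under the assumed 4-byte element size, a derived buffer is 691,200 bytes, and 97 of those fit in 64 MiB, so cache trimming looks unlikely to explain the extra runs.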

Also, when you try it on iOS you might run a static analysis. A good number of problems crop up. Most of them appear to be edge cases, so I'm cautious about trying to fix them.