Channel parallelization

krautz commented 5 years ago

Good morning, Liu Liu. I am a student and researcher at the University of Campinas (UNICAMP) in Campinas, Brazil.

In my latest project we worked with your ccv lib. The main goal of the project was to reduce execution time. Our main 'weapon' in the study was the parallel routines provided by OpenMP.

With that said, we found 2 hotspots for optimizing your code:

1 - Parallelize the execution of ccv_swt (called in ccv_detect_words) for each channel (dark_to_bright and bright_to_dark). This resulted in a good speedup!

2 - The second optimization was based on the Canny filter. We noticed that ccv_canny was called inside ccv_swt and that ccv_canny called ccv_sobel inside it. But ccv_sobel was also calculated in ccv_swt! So we moved the calculation of ccv_sobel in ccv_swt to before ccv_canny and created a new function based on ccv_canny, called ccv_canny_no_sobel, which is the same as ccv_canny but receives the result of a Sobel execution as input and doesn't calculate it again! This also produced a good speedup!
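
A minimal sketch of the first idea, running the two channel passes in parallel with OpenMP sections. It does not use the real ccv API: stroke_pass is a hypothetical stand-in for one ccv_swt call, and the buffers and the detect_both_channels name are assumptions made for illustration. The point is only that the two passes write to separate outputs, so they can run concurrently without locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one stroke-width pass; NOT the real ccv_swt.
 * direction is +1 (dark to bright) or -1 (bright to dark). */
static void stroke_pass(const unsigned char *img, size_t n, int direction, int *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = direction * (int)img[i];
}

/* Run both directions concurrently. Each section only reads the shared
 * input and writes to its own output buffer, so the sections need no
 * synchronization beyond the implicit barrier at the end of the region. */
void detect_both_channels(const unsigned char *img, size_t n,
                          int *dark_to_bright, int *bright_to_dark)
{
    assert(img != NULL && dark_to_bright != NULL && bright_to_dark != NULL);

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        stroke_pass(img, n, 1, dark_to_bright);

        #pragma omp section
        stroke_pass(img, n, -1, bright_to_dark);
    }
}
```

Compiled without OpenMP support (for example, without -fopenmp on GCC), the pragmas are ignored and the two passes simply run one after the other with the same result.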

These two optimizations can be found in a fork I made of your repository: https://github.com/krautz/ccv

If you like our research project, I would be glad to send you a merge request with these optimizations!

Thank you! Caio Krauthamer

liuliu commented 5 years ago

Thanks! The OpenMP enabled SWT part looks cool.

Regarding the Sobel operator:

It is intentional. See: http://libccv.org/doc/doc-cache/

In short, ccv relies on an internal cache to dedup internal operations. Thus, it is alright (you can try it by calling ccv_enable_cache() first in your main function): the ccv_sobel operation (as long as the parameters are the same) should be applied only once. The next call in ccv_swt will just reuse the result from the previous call.
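
The deduplication idea can be illustrated with a small memoization sketch. This is not ccv's cache implementation: the slot table, the FNV-1a signature, and the names cached_filter and expensive_filter are all assumptions made for this example. It only shows the general pattern of keying a result on the input plus the parameters, so that a repeated call with the same arguments skips the computation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 16

typedef struct {
    int used;
    uint64_t key;  /* signature of (input bytes, parameters) */
    int result;    /* stand-in for a cached output matrix */
} cache_slot_t;

static cache_slot_t cache[CACHE_SLOTS];
static int compute_calls; /* counts how often the real work runs */

/* FNV-1a hash over the input bytes and both parameters. */
static uint64_t signature(const unsigned char *data, size_t n, int dx, int dy)
{
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < n; i++) { h ^= data[i]; h *= 0x100000001b3ULL; }
    h ^= (uint64_t)(unsigned)dx; h *= 0x100000001b3ULL;
    h ^= (uint64_t)(unsigned)dy; h *= 0x100000001b3ULL;
    return h;
}

/* Hypothetical expensive operation (a placeholder, not a Sobel filter). */
static int expensive_filter(const unsigned char *data, size_t n, int dx, int dy)
{
    compute_calls++;
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i] * (dx + dy);
    return acc;
}

/* Return the cached result when the same input and parameters were seen
 * before; otherwise compute, store, and return it. */
int cached_filter(const unsigned char *data, size_t n, int dx, int dy)
{
    assert(data != NULL);
    uint64_t key = signature(data, n, dx, dy);
    cache_slot_t *slot = &cache[key % CACHE_SLOTS];
    if (slot->used && slot->key == key)
        return slot->result;
    slot->used = 1;
    slot->key = key;
    slot->result = expensive_filter(data, n, dx, dy);
    return slot->result;
}
```

Calling cached_filter twice with the same buffer and parameters runs expensive_filter once; changing a parameter changes the signature and forces a new computation.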

krautz commented 5 years ago

Interesting!

I didn't know about this function. As a result, I will leave only the channel parallelization part in this pull request.

To give a little more depth on the parallelization study: we were working with SRBR (Samsung Research Brazil) on technologies to detect text with a phone camera. We needed an open source lib to do it, and we chose yours as well as OpenCV. For the recognition part we used Tesseract.

On optimizations, we focused on do-all loops, that is, loops in which iteration i does not need any value computed in a previous iteration. We also looked at structures that could run in parallel because they share no variables that one writes and another reads. One important thing to note is that if a loop needs little computation, parallelizing it may not pay off, because there is a time overhead for creating, syncing, and destroying the threads.
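
As a rough illustration of the distinction above (not code from ccv or from the fork), the sketch below contrasts a do-all loop with a loop that carries a dependency between iterations. The function names and the PARALLEL_MIN_ITEMS threshold are hypothetical; the threshold stands in for the overhead concern and would have to be measured on the target machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-off below which threading is skipped; tune by measuring. */
#define PARALLEL_MIN_ITEMS 10000

/* Do-all loop: iteration i reads in[i] and writes out[i] only, so the
 * iterations can run in any order. The if() clause keeps small loops
 * serial to avoid paying the thread start-up and sync cost. */
void scale_all(const float *in, float *out, size_t n, float k)
{
    assert(in != NULL && out != NULL);
    long i;
    #pragma omp parallel for if(n >= PARALLEL_MIN_ITEMS)
    for (i = 0; i < (long)n; i++)
        out[i] = in[i] * k;
}

/* Not a do-all loop: iteration i needs out[i - 1] from the previous
 * iteration, so a plain "parallel for" here would give wrong results. */
void prefix_sum(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    if (n == 0)
        return;
    out[0] = in[0];
    for (size_t i = 1; i < n; i++)
        out[i] = out[i - 1] + in[i];
}
```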

We tested parallelizing the application of the Sobel filter, which did not bring a good speedup (and could not be combined with the ccv_swt parallelization, since that would require 6 threads).
We parallelized a loop at the end of ccv_detect_words which iterates over a variable called textline. This parallelization neither helped nor hurt the execution time. Since only a small number of boxes were detected in our images, we decided not to parallelize this loop (NOTE: if the number of boxes generated is very large, then this parallelization may bring some benefit).
Finally, we parallelized the call to ccv_swt, which brought the biggest improvement in execution time!

liuliu commented 5 years ago

Thank you for the explanation! Have you guys tried something more modern like https://github.com/princewang1994/TextSnake.pytorch? It seems to me TextSnake would give you better results (accuracy-wise) than SWT.

krautz commented 5 years ago

So, we had one major constraint: as the project was funded by Samsung, they had one goal for it which is, unfortunately, confidential. We had a hardware limitation because of that, so GPU routines could not be used and power consumption had to be low. That is why we focused the study on libs that take advantage of CPU power. Additionally, they preferred libs that were open source. With all these restrictions, we centered our study on libccv and OpenCV.

liuliu commented 5 years ago

Thanks. That makes a lot of sense. libccv is in the process of getting neural-fied, so it is interesting to hear alternative opinions. I will try to look into this PR in the next few days (I am upgrading the build / work machine in the coming days).

krautz commented 5 years ago

Good evening, Liu Liu. As our project is approaching its end in February, do you have any update on this pull request?

liuliu commented 5 years ago

Did a rewrite of the proposed solution, and committed it in https://github.com/liuliu/ccv/commit/bcc0d0d94fc960971037bed1b7dc4a6e71cf4af0

liuliu / ccv

Channel parallelization #216