ldenoue closed this issue 5 years ago
Dear ldenoue,
it may depend on the environment in which you plan to run the CC algorithm. You can take a look at this paper to see some results. On pages 9 and 15 there are tables summarizing the performance of different algorithms in different environments and on different datasets; XDOCS, Tobacco800 and Hamlet are text document datasets. In general, BBDT and DRAG (which was released after the publication of the paper I linked) are the best-performing algorithms under Windows on document images.
Under Linux, CTB and BBDT are usually the best-performing algorithms, and they have very similar performance. If you are able to compile DRAG with -O1 optimization, it will be the best-performing algorithm under Linux.
The BBDT algorithm has been implemented in the OpenCV library since version 3.2.
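In case it helps to see what all of these labeling algorithms compute, here is a minimal pure-Python sketch. Please note that it is not BBDT, DRAG or CTB: it is only a plain two-pass labeling with union-find and 8-connectivity, with none of the optimizations that make those algorithms fast. The function name `label_components` and the list-of-rows image format are my own assumptions for illustration and do not come from YACCLAB or OpenCV.

```python
def label_components(image):
    """Label the 8-connected foreground components of a binary image.

    `image` is a list of rows; any nonzero value counts as foreground.
    Returns (number_of_components, label_rows); background stays 0.
    Illustrative two-pass union-find sketch, not BBDT/DRAG/CTB.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    parent = [0]  # parent[i] is the union-find parent of provisional label i

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller provisional label as the root.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    # First pass: assign provisional labels and record equivalences.
    for y in range(height):
        for x in range(width):
            if not image[y][x]:
                continue
            neighbors = []
            if x > 0 and labels[y][x - 1]:
                neighbors.append(labels[y][x - 1])
            if y > 0:
                for nx in (x - 1, x, x + 1):
                    if 0 <= nx < width and labels[y - 1][nx]:
                        neighbors.append(labels[y - 1][nx])
            if not neighbors:
                parent.append(len(parent))  # new provisional label
                labels[y][x] = len(parent) - 1
            else:
                smallest = min(neighbors)
                labels[y][x] = smallest
                for n in neighbors:
                    union(smallest, n)

    # Second pass: replace provisional labels with consecutive final ones.
    final = {}
    for y in range(height):
        for x in range(width):
            if labels[y][x]:
                root = find(labels[y][x])
                labels[y][x] = final.setdefault(root, len(final) + 1)
    return len(final), labels
```

Calling `label_components` on a binarized page stored as a list of 0/1 rows returns the number of components and a label image of the same size. A loop like this will be far slower than the optimized C++ implementations, so it should only be read as a reference for the expected output.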
Thanks for your explanation. I’ll start with the OpenCV implementation that you referenced. For now I’m using this method, implemented in JavaScript. I’m guessing it’s slower than any of the methods you mentioned in your answer, correct? https://github.com/bramp/Connected-component-labelling
Yes, correct. The contour tracing technique is quite old and slow.
Thank you 🙏
Please keep in mind that you can test the performance of the technique you linked with YACCLAB, since it is implemented as the CT algorithm in the file labeling_fchang_2003.h.
If you are going to use JavaScript, the performance figures may be very different from those in C++, because JavaScript is not compiled ahead of time to native code.
Do you know the fastest method for computing the CCs on a binary image of text (like a binarized PDF page of a research paper)?