Closed DamonsJ closed 5 years ago
There is currently no text/image classification in ocropus, although this was discussed before and implemented in a previous version, see #38. The different columns in a table might be detected (especially if there are black separators between them). There is nothing for detecting mathematical formulas.
That is not part of ocropus. ocropus does line detection with a few heuristics and knobs to turn, to avoid lines bleeding across columns, lines being inadvertently merged, etc.
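To illustrate the kind of heuristic meant by "black separators", here is a minimal sketch of finding vertical rules with a column projection. This is not ocropus code: the function names, the `min_fill` threshold, and the list-of-lists page representation (1 = black, 0 = white) are all assumptions made up for this example.

```python
def find_vertical_separators(page, min_fill=0.9, min_width=1):
    """Return (start, end) column ranges that look like black vertical rules.

    page: list of equal-length rows, each a list of 0 (white) / 1 (black).
    min_fill: hypothetical threshold, the fraction of rows that must be
              black for a column to count as part of a rule.
    The returned end index is exclusive.
    """
    if not page or not page[0]:
        return []
    height, width = len(page), len(page[0])
    # Vertical projection: fraction of black pixels in every column.
    fill = [sum(row[x] for row in page) / height for x in range(width)]
    separators, start = [], None
    # Iterate one step past the last column so a trailing run gets closed.
    for x in range(width + 1):
        is_rule = x < width and fill[x] >= min_fill
        if is_rule and start is None:
            start = x
        elif not is_rule and start is not None:
            if x - start >= min_width:
                separators.append((start, x))
            start = None
    return separators


def split_columns(page, separators):
    """Return the (start, end) column ranges lying between the separators."""
    width = len(page[0])
    columns, left = [], 0
    for start, end in separators:
        if start > left:
            columns.append((left, start))
        left = end
    if left < width:
        columns.append((left, width))
    return columns


# Tiny synthetic "page": column 3 is a solid black rule.
page = [
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0, 0],
    [1, 1, 0, 1, 0, 0, 1],
    [0, 0, 0, 1, 0, 1, 1],
]
rules = find_vertical_separators(page)
print(rules)                       # [(3, 4)]
print(split_columns(page, rules))  # [(0, 3), (4, 7)]
```

On real scans the rules are rarely perfectly vertical or unbroken, so a threshold like this only works after deskewing, and whitespace-only column gaps need a different test.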
Have a look at dhSegment. For a semi-automatic solution, check out LAREX (they are working on a trainable pixel classifier as well, IIRC) or the Leptonica toolset, which tesseract uses.
For completeness' sake, or for the curious: https://github.com/tmbdev/ocropy/wiki/OCRopus-File-Formats#physical-layout
Suppose a document page contains text, math equations, and images. Can ocropy identify which block is text, which block is a math equation, and which block is an image?
There may also be a table in the document page. Is there any solution?
Thanks