easyeda2021 closed this issue 9 months ago
There are multiple intersecting reasons why these particular images perform poorly. However, all of them are issues with the Tesseract OCR engine rather than Tesseract.js, so fixing them is outside the scope of this repo.
For context, Tesseract.js is the JavaScript/WebAssembly port of Tesseract. We do not make any edits to the recognition engine, so accuracy issues that originate in the engine cannot be addressed here. If you would like to pursue this further, you should consult the documentation and discussions for the main Tesseract project. You may find configuration settings there that help achieve better results.
If you do not find settings that improve recognition, and you believe this constitutes a (previously unreported) bug, then you should replicate the issue using the main Tesseract command-line program and raise the issue with that project.
Edit: My first bullet point was partially incorrect. When run in PSM mode AUTO (3), Tesseract can create multiple blocks per page, and text orientation is detected at the block level. Therefore, it is theoretically possible for horizontal and vertical text to be detected on the same page in this mode. However, in my experience, enabling PSM AUTO does not work particularly well and often results in words being categorized as noise and deleted. Therefore, I doubt that changing to PSM AUTO will solve this particular issue.
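For anyone who still wants to test this, here is a minimal sketch of how the page segmentation mode could be set with the Tesseract.js v5 worker API. The helper name `recognizeWithPsm` and the image path in the usage comment are hypothetical, and the `Tesseract` module is passed in as an argument purely to keep the sketch self-contained. Treat it as a starting point for experimentation, not a recommended fix.

```javascript
// Hypothetical helper: run OCR with a chosen page segmentation mode (PSM).
// `Tesseract` is the imported tesseract.js module (v5 API assumed).
async function recognizeWithPsm(Tesseract, image, psm = Tesseract.PSM.AUTO) {
  // Create a worker for English; other languages would need their own code.
  const worker = await Tesseract.createWorker('eng');
  try {
    // PSM.AUTO corresponds to Tesseract's page segmentation mode 3.
    await worker.setParameters({ tessedit_pageseg_mode: psm });
    const { data } = await worker.recognize(image);
    return data.text;
  } finally {
    // Always release the worker, even if recognition throws.
    await worker.terminate();
  }
}

// Usage (assumes `npm install tesseract.js` and a local file 'page1.png'):
//   const Tesseract = require('tesseract.js');
//   recognizeWithPsm(Tesseract, 'page1.png').then(console.log);
```

Comparing the output of this call against a run with the default settings should show whether PSM AUTO drops words on a given image.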
Hi Balearica, thank you for your reply. Understood. We will check whether the Tesseract project has the same issue. Thanks.
Tesseract.js version (version number for npm/GitHub release, or specific commit for repo)
v5.0.3
Describe the bug
Some text is missed during recognition. For example, see page 1 of https://atta.szlcsc.com/upload/public/pdf/source/20151029/1457707509740.pdf
Missed text:
To Reproduce
Steps to reproduce the behavior: take a screenshot, then import it into Tesseract.
Please attach any input image required to replicate this behavior.
Expected behavior
Text in all four directions should be recognized correctly.
Device Version:
Additional context
None.
Thank you for the nice work.