tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

Differences in image contrast, brightness, and sharpness can lead to a different OCR reading order #4265

Closed intelligence66 closed 3 weeks ago

intelligence66 commented 3 weeks ago

Your Feature Request

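roi_packages1_8_process roi_packages1_8_process.txt roi_packages1_8_process2.txt roi_packages1_8_processe2 These two images differ in contrast and brightness. Each txt file contains the OCR output for the image with the same name. The order of the extracted text is completely different between the two. What causes this, and how can it be avoided so that the OCR text extraction is better? (Translated from the original Chinese.)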

zdenop commented 3 weeks ago

Please use English. Make sure you read the documentation.

stweil commented 3 weeks ago

Translation: "The contrast and brightness of these two images are different. Each txt file holds the OCR output for the image with the same name, and the order of the extracted text is completely different. What is the reason for this, and how can we avoid the problem and get better OCR text extraction?"

stweil commented 3 weeks ago

This is a well-known fact of OCR. Preprocessing of images can be crucial for good recognition results. But GitHub issues are neither a help forum nor a support hotline. Please use the Tesseract user forum for questions.
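
To illustrate the preprocessing point, here is a minimal sketch in pure Python, not taken from Tesseract or from this thread. It assumes a grayscale image is already loaded as a list of rows of integers in 0..255; the function names (`stretch_contrast`, `otsu_threshold`, `preprocess`) and the tiny sample images are hypothetical. It stretches the contrast to the full range and then binarizes with Otsu's method, so two copies of the same content that differ only by a linear brightness/contrast change end up as the same black-and-white image. Real photos also differ in blur, noise, and lighting gradients, which this sketch does not address.

```python
def stretch_contrast(pixels):
    """Linearly rescale grayscale values so they span the full 0..255 range."""
    flat = [v for row in pixels for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Flat image: nothing to stretch, map everything to 0.
        return [[0 for _ in row] for row in pixels]
    # Integer arithmetic keeps the result deterministic.
    return [[(v - lo) * 255 // (hi - lo) for v in row] for row in pixels]


def otsu_threshold(pixels):
    """Return the threshold t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for row in pixels:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0      # number of pixels in the dark class
    sum0 = 0    # sum of gray values in the dark class
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_score, best_t = score, t
    return best_t


def preprocess(pixels):
    """Stretch contrast, then binarize: 0 for dark (ink), 255 for light."""
    stretched = stretch_contrast(pixels)
    t = otsu_threshold(stretched)
    return [[255 if v > t else 0 for v in row] for row in stretched]


# Hypothetical sample: the same 3x4 pattern at two brightness/contrast levels.
dark_image = [
    [120, 40, 40, 120],
    [120, 80, 40, 120],
    [120, 120, 120, 120],
]
bright_image = [
    [230, 90, 90, 230],
    [230, 160, 90, 230],
    [230, 230, 230, 230],
]

same_result = preprocess(dark_image) == preprocess(bright_image)
print("identical after preprocessing:", same_result)
```

In practice the pixel data would come from an image library and the cleaned image would then be passed to Tesseract; both steps are left out here to keep the sketch self-contained. Whether such normalization also stabilizes the reading order for the two attached images is not something this example can confirm.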