madmaze / pytesseract

A Python wrapper for Google Tesseract
Apache License 2.0

PyTesseract cannot read my number #538

Closed matheustecchio closed 8 months ago

matheustecchio commented 8 months ago

[image: PROCESSED-2024-02-27]

I'm trying to read this number with pytesseract, but it fails to recognize the digit 1 in 351.

How can I fix this? All the values I want to read are numbers.

stefan6419846 commented 8 months ago

Please provide the corresponding pytesseract code you are using. You might want to play with tessedit_char_whitelist and the PSMs (page segmentation modes).
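As a rough illustration of that suggestion, here is a minimal sketch of passing a digit whitelist and a PSM through pytesseract's `config` argument. The helper names `build_digit_config` and `read_digits` are hypothetical, and PSM 7 (treat the image as a single text line) is only an assumed starting point; other modes such as 8 or 13 may suit a given image better.

```python
def build_digit_config(psm: int = 7, whitelist: str = "0123456789") -> str:
    """Build a Tesseract config string that restricts output to the whitelist."""
    # Tesseract defines page segmentation modes 0 through 13.
    if not 0 <= psm <= 13:
        raise ValueError("Tesseract page segmentation modes range from 0 to 13")
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def read_digits(image_path: str, psm: int = 7) -> str:
    """Run OCR on an image file and return the recognized text, stripped."""
    # Imported lazily so the config helper works without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(img, config=build_digit_config(psm))
    return text.strip()
```

Calling `read_digits("number.png", psm=8)` (with a hypothetical file name) would try the single-word mode instead; it is worth looping over a few PSM values and comparing the results.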

Nevertheless, pytesseract will never have any direct influence on this, as it is just a wrapper around the Tesseract CLI itself. If Tesseract performs badly, pytesseract cannot do much about it. For this reason, your request seems to be out of scope here. You might want to try further examples or some different preprocessing which might perform better (or re-train a Tesseract model yourself).
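One possible form of "different preprocessing" is sketched below: convert to grayscale, upscale, and binarize before handing the image to Tesseract. The thread does not prescribe any particular method; Otsu thresholding, the 3x scale factor, and the function names `otsu_threshold` and `preprocess_for_ocr` are all assumptions chosen for illustration, and there is no guarantee this helps with the image in question.

```python
def otsu_threshold(histogram):
    """Return the gray level that maximizes between-class variance (Otsu's method)."""
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    best_level, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for level, count in enumerate(histogram):
        weight_bg += count
        if weight_bg == 0:
            continue  # no pixels at or below this level yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background class
        sum_bg += level * count
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def preprocess_for_ocr(image_path, scale=3):
    """Return an upscaled, binarized copy of the image (assumed settings)."""
    # Imported lazily so otsu_threshold can be used without Pillow installed.
    from PIL import Image

    with Image.open(image_path) as img:
        gray = img.convert("L")
    gray = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    threshold = otsu_threshold(gray.histogram())
    # Pixels brighter than the threshold become white, the rest black.
    return gray.point(lambda value: 255 if value > threshold else 0)
```

The returned image can be passed directly to `pytesseract.image_to_string`. If the digits are light on a dark background, inverting the result may be needed, since Tesseract generally does better with dark text on a light background.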

matheustecchio commented 8 months ago

Hi Stefan, alright, thank you for your reply. The whitelist and PSMs didn't work, so I found out that I will probably need to re-train my Tesseract model, as the problem is the model itself. I'm closing this issue as it is out of scope.