why can't it read the single character on the picture?

sirfz / tesserocr

A Python wrapper for the tesseract-ocr API

MIT License

why can't it read the single character on the picture? #273

Open umutozgur opened 2 years ago

umutozgur commented 2 years ago

Why can't it read the single character in the picture?

ivanstepanovftw commented 9 months ago

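Try this (here `pixmap` is assumed to be a PyMuPDF `fitz.Pixmap`, whose `samples`, `w`, `h`, `stride` and `n` attributes match the call below):

import fitz  # PyMuPDF; assumed source of `pixmap`
import tesserocr
from tesserocr import PyTessBaseAPI

TESSDATA_PREFIX = "/usr/share/tesseract/tessdata"
tesserocr_languages = ["eng", "ara"]

pixmap = fitz.Pixmap("char.png")  # hypothetical input image path
bpp = pixmap.n  # bytes per pixel = number of 8-bit channels in the pixmap

# The context manager calls api.End() on exit
with PyTessBaseAPI(path=TESSDATA_PREFIX, lang="+".join(tesserocr_languages)) as api:
    api.SetImageBytes(
        imagedata=pixmap.samples,
        width=pixmap.w,
        height=pixmap.h,
        bytes_per_pixel=bpp,
        bytes_per_line=pixmap.stride,
    )
    api.SetPageSegMode(tesserocr.PSM.SINGLE_CHAR)  # <- important
    api.Recognize()
    ocr_text = api.GetUTF8Text()

print(ocr_text)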
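`SetImageBytes` takes a raw pixel buffer, so if `bytes_per_pixel` or `bytes_per_line` does not match the buffer, Tesseract may end up reading a scrambled image, and a lone character is easily lost. A minimal sanity check you could run first is sketched below. `check_image_bytes` is a hypothetical helper, not part of tesserocr, and the accepted `bytes_per_pixel` values (1 = grey, 3 = RGB, 4 = RGBA) are an assumption based on Tesseract's `SetImage` documentation; 1-bit binary input (`bytes_per_pixel=0`) is deliberately not handled.

```python
def check_image_bytes(
    imagedata: bytes,
    width: int,
    height: int,
    bytes_per_pixel: int,
    bytes_per_line: int,
) -> None:
    """Raise ValueError if a raw buffer cannot describe a width x height image.

    Hypothetical helper: call it with the same arguments you pass to
    api.SetImageBytes(...).
    """
    # Assumed supported layouts: 8-bit grey, 24-bit RGB, 32-bit RGBA.
    if bytes_per_pixel not in (1, 3, 4):
        raise ValueError(
            f"bytes_per_pixel={bytes_per_pixel}; convert to grey, RGB or RGBA first"
        )
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # The stride may include row padding, but never fewer bytes than one row of pixels.
    if bytes_per_line < width * bytes_per_pixel:
        raise ValueError("bytes_per_line is shorter than one row of pixels")
    # The buffer must hold `height` full rows of `bytes_per_line` bytes each.
    if len(imagedata) < bytes_per_line * height:
        raise ValueError("imagedata is too small for the given height and stride")
```

With a PyMuPDF pixmap this would be `check_image_bytes(pixmap.samples, pixmap.w, pixmap.h, pixmap.n, pixmap.stride)`; a grey+alpha pixmap (`n == 2`) would be rejected by this check, which is the kind of mismatch worth catching before OCR.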

zdenop commented 9 months ago

Without providing an input image, you are on your own with this problem...

ivanstepanovftw commented 5 months ago

Sure!

I think this issue should be closed as resolved.

If the image contains a single character, read this:

Tesseract works best on images which have a DPI of at least 300 dpi, so it may be beneficial to resize images.
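One way to follow that advice for a small single-character image is to upscale it before OCR. The sketch below computes the target size from the image's DPI and then resizes with Pillow. `upscaled_size` and `upscale_for_ocr` are hypothetical helpers, and falling back to 72 DPI when the file carries no DPI metadata is an assumption, not something Tesseract prescribes.

```python
from typing import Tuple

TARGET_DPI = 300  # the "at least 300 dpi" recommendation quoted above


def upscaled_size(
    width: int, height: int, source_dpi: float, target_dpi: float = TARGET_DPI
) -> Tuple[int, int]:
    """Return the pixel size needed to bring an image from source_dpi up to target_dpi.

    Only upscales: images already at or above target_dpi keep their size.
    """
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    factor = max(1.0, target_dpi / source_dpi)
    # round() rather than math.ceil() so floating-point noise cannot add an extra pixel
    return round(width * factor), round(height * factor)


def upscale_for_ocr(img, fallback_dpi: float = 72.0):
    """Upscale a Pillow Image to TARGET_DPI; return it unchanged if already large enough.

    Assumption: images without usable DPI metadata are treated as fallback_dpi.
    """
    dpi = float(img.info.get("dpi", (fallback_dpi, fallback_dpi))[0])
    if dpi <= 0:
        dpi = fallback_dpi
    new_size = upscaled_size(img.width, img.height, dpi)
    if new_size == img.size:
        return img
    # Image.resize() defaults to bicubic resampling since Pillow 7.0 (for most modes)
    return img.resize(new_size)
```

For example, `upscale_for_ocr(Image.open("char.png"))` (hypothetical path) on a 72 DPI image scales each dimension by about 4.2 before the result is passed to `api.SetImage(...)`.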

Here is also a plot for you: