sirfz / tesserocr

A Python wrapper for the tesseract-ocr API
MIT License

Simple Question #355

Open NG-Corp opened 2 weeks ago

NG-Corp commented 2 weeks ago

Just a quick question: does tesserocr require a separate Tesseract installation? I was thinking of using it in my API as an alternative to pytesseract, because pytesseract apparently does. The README doesn't really give clear directions on that part...

metouitude commented 1 week ago

It really depends on the use case.

In my experience, I discovered this library while searching for a way to detect whether a piece of text is bold or italic and to get its font name. I found that this is possible, but only through Tesseract's C++ API.

That is where tesserocr comes in: it is an API wrapper that exposes the advanced Tesseract C++ functions that are not available in pytesseract.

So, back to your question: if pytesseract is sufficient for your use case, I would recommend using it directly, as you don't need a wrapper around the C++ API and the library is written in pure Python. It would also help you understand the Tesseract engine better, and with practice you could handle pretty much any OCR use case. tesserocr is less an alternative to pytesseract than a way to use Tesseract's C++ library from Python.

I hope that gives a clear answer for your case.
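For illustration, the bold/italic/font-name use case described above could look roughly like the sketch below. It assumes tesserocr's result iterator exposes `WordFontAttributes()` returning a dict with keys such as `font_name`, `bold`, and `italic` (or `None` when no font information is available); `describe_style` and `word_styles` are hypothetical helper names, not part of the library. Whether font attributes are actually populated may depend on the engine mode and the trained data in use.

```python
def describe_style(attrs):
    """Summarize a font-attribute dict as a short string (hypothetical helper)."""
    if not attrs:
        # Assumption: the iterator may return None when no font info is available.
        return "unknown"
    flags = [name for name in ("bold", "italic") if attrs.get(name)]
    style = " ".join(flags) or "regular"
    return f"{attrs.get('font_name') or 'unknown font'} ({style})"


def word_styles(image_path):
    """Yield (word, style) pairs for each recognized word in the image."""
    # Imported lazily so describe_style() stays usable without tesserocr installed.
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        api.Recognize()
        for word in iterate_level(api.GetIterator(), RIL.WORD):
            text = word.GetUTF8Text(RIL.WORD)
            yield text, describe_style(word.WordFontAttributes())
```

Calling `list(word_styles("page.png"))` (with a placeholder image path) would then give one pair per recognized word.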

sirfz commented 1 week ago

If you're installing a binary wheel (the most likely scenario on Linux/macOS), it comes bundled with tesseract and all required libs, so there is no need for any extra dependencies. If you build it yourself (e.g. pip install --no-binary tesserocr tesserocr), then you need to make sure you have libtesseract and all of its requirements installed in your environment (as per the README).
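A quick way to confirm that an install works, whether from a wheel or a source build, is to import the package and ask which Tesseract it is linked against. This is a minimal sketch; `check_tesserocr` is a hypothetical helper name, and it relies only on `tesserocr.tesseract_version()`.

```python
import importlib


def check_tesserocr():
    """Report whether tesserocr imports and which Tesseract it is linked against."""
    try:
        tesserocr = importlib.import_module("tesserocr")
    except ImportError as exc:
        # Typical for a missing package or a source build without libtesseract.
        return {"installed": False, "detail": str(exc)}
    # tesseract_version() returns the linked Tesseract library version string.
    return {"installed": True, "detail": tesserocr.tesseract_version()}


if __name__ == "__main__":
    print(check_tesserocr())
```

If the import fails after a source build, the error text in `detail` usually points at the missing shared library.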