Open NG-Corp opened 2 weeks ago
It really depends on the use case.
I discovered this library while looking for a way to detect whether a piece of text is bold or italic and to get its font name. I found out that this is possible, but only through Tesseract's C++ API.
That is where tesserocr comes in: it is an API wrapper that exposes the advanced Tesseract C++ functions that are not available in pytesseract.
So, back to your question: if pytesseract is sufficient for your use case, I would recommend using it directly, since you don't need a wrapper around the C++ API and the library is written in pure Python. It would also help you understand the Tesseract engine better, and with practice you could handle pretty much any OCR use case. tesserocr is not so much an alternative to pytesseract as a way to use Tesseract's C++ library from Python.
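To make the bold/italic/font-name case concrete, here is a minimal sketch of how it might look with tesserocr. It assumes that font attributes are only populated by the legacy engine (hence `OEM.TESSERACT_ONLY`) and that the installed traineddata includes the legacy model. The file name `sample.png` and the helper `describe_style` are hypothetical, made up for illustration.

```python
def describe_style(attrs):
    """Turn a font-attribute dict into a short label (hypothetical helper)."""
    if not attrs:
        # WordFontAttributes() may return None when nothing is known
        return "unknown"
    flags = [name for name in ("bold", "italic") if attrs.get(name)]
    label = attrs.get("font_name") or "unknown font"
    return f"{label} ({', '.join(flags)})" if flags else label


def word_styles(image_path, lang="eng"):
    """Yield (word, font attributes) pairs for every word found in the image."""
    # Imported here so the helper above can be used without tesserocr installed
    from tesserocr import PyTessBaseAPI, RIL, OEM, iterate_level

    # Assumption: the legacy engine is needed for font attributes
    with PyTessBaseAPI(lang=lang, oem=OEM.TESSERACT_ONLY) as api:
        api.SetImageFile(image_path)
        api.Recognize()
        iterator = api.GetIterator()
        for word in iterate_level(iterator, RIL.WORD):
            yield word.GetUTF8Text(RIL.WORD), word.WordFontAttributes()


if __name__ == "__main__":
    # "sample.png" is a placeholder path
    for text, attrs in word_styles("sample.png"):
        print(text, "->", describe_style(attrs))
```

If plain text is all you need, `pytesseract.image_to_string` is usually enough and none of the above is necessary.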
I hope that gives a clear answer for your case.
If you're installing a binary wheel (the most likely scenario on Linux/macOS), it comes bundled with Tesseract and all required libs, so there is no need for any extra dependencies. If you build it yourself (e.g. pip install --no-binary tesserocr tesserocr), then you need to make sure you have libtesseract and all requirements installed in your environment (as per the README).
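A quick way to check what an installed tesserocr is actually linked against is to ask it directly. This is a rough sketch; the function name `tesseract_status` is made up here, and it assumes only that `tesserocr.tesseract_version()` and `tesserocr.get_languages()` are available.

```python
def tesseract_status():
    """Report which Tesseract build and languages tesserocr can see."""
    try:
        import tesserocr
    except ImportError as exc:
        return f"tesserocr not importable: {exc}"
    # get_languages() returns the tessdata path it used and the languages found there
    tessdata_path, languages = tesserocr.get_languages()
    return (
        f"{tesserocr.tesseract_version()}\n"
        f"tessdata: {tessdata_path}\n"
        f"languages: {languages}"
    )


if __name__ == "__main__":
    print(tesseract_status())
```

An empty language list would suggest that no traineddata files were found at the reported path.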
Just a quick question: does tesserocr require a separate Tesseract download? I was thinking of using it in my API as an alternative to pytesseract, because apparently pytesseract does. The README doesn't really give clear directions on that part...