Open jflesch opened 7 years ago
I'm getting occasional segfaults when using the pyocr.libtesseract tool. I can't pinpoint an exact repeatable cause. I'll update if I find a pattern that triggers the segfault.
The other segfault occurs when there is no language data. This one is consistent.
If you find a pattern, that would be awesome :-)
Noted for the no-language crash. I'll have a look ASAP (probably this weekend, I hope).
BTW, can you tell me which version of Tesseract you use, please?
no-language crash:
Tesseract version is 3.04.01 from Ubuntu's 3.04.01-4build1
Thanks for the fix.
We lowered Mayan EDMS (http://www.mayan-edms.com) memory footprint by switching to pyocr's libtesseract, thanks for that too :)
You're welcome :)
Hm, maybe the crashes were due to a hack: TessBaseAPIDetectOS() was actually a C++ function. I was using ctypes to access it, and let's just say ctypes is not designed for C++, so it is/was a bit hacky. It may have been the cause of crashes on some systems.
Tesseract 3.05.00 included a new replacement function, TessBaseAPIDetectOrientationScript(), that is pure C. @aszlig added support for this new function.
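For illustration only, here is a rough sketch of how a ctypes-based binding could check whether the loaded library exports the pure C function before using it. This is not pyocr's actual code: the helper names load_libtesseract() and pick_orientation_function() are made up for this example, and the library lookup through ctypes.util.find_library("tesseract") is an assumption about how the shared library is named on the system.

```python
import ctypes
import ctypes.util


def load_libtesseract():
    """Try to load the Tesseract shared library; return None if unavailable.

    Hypothetical helper: the library name "tesseract" is an assumption.
    """
    path = ctypes.util.find_library("tesseract")
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


def pick_orientation_function(lib):
    """Return the name of the pure C orientation function if lib exports it.

    ctypes raises AttributeError when a symbol is missing, so hasattr()
    is enough to probe for it. Returns None when the symbol is absent,
    in which case the caller should skip orientation detection rather
    than fall back to the C++ TessBaseAPIDetectOS().
    """
    name = "TessBaseAPIDetectOrientationScript"
    if hasattr(lib, name):
        return name
    return None


if __name__ == "__main__":
    lib = load_libtesseract()
    if lib is None:
        print("libtesseract not found")
    else:
        print("orientation function:", pick_orientation_function(lib))
```

With a Tesseract older than 3.05.00 the probe should return None, so the caller can disable orientation detection instead of calling into C++ through ctypes.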
I think I will try to switch libtesseract back as default once Tesseract 4 is out.
Someone has been reporting crashes of Paperwork when running the OCR. They are using Tesseract 3.04.01, so there may be something wrong with the libtesseract binding.
(Note: the preference order has currently been changed so that pyocr uses tesseract-sh if possible.)
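If you want to enforce the same preference in your own code regardless of the order pyocr returns, something like the following should work. It is only a sketch: pick_tool() and PREFERRED are names made up for this example, and it assumes the tool modules returned by pyocr.get_available_tools() can be told apart by their module names (pyocr.tesseract for the shell wrapper, pyocr.libtesseract for the C API binding).

```python
# Preference order assumed for this example: shell wrapper first,
# C API binding second.
PREFERRED = ("pyocr.tesseract", "pyocr.libtesseract")


def pick_tool(tools, preferred=PREFERRED):
    """Return the first tool whose module name matches the preference order.

    `tools` is any iterable of module-like objects with a __name__,
    e.g. the list returned by pyocr.get_available_tools().
    Returns None when nothing in `preferred` is available.
    """
    by_name = {getattr(t, "__name__", None): t for t in tools}
    for name in preferred:
        if name in by_name:
            return by_name[name]
    return None


if __name__ == "__main__":
    try:
        import pyocr
    except ImportError:
        print("pyocr is not installed")
    else:
        tool = pick_tool(pyocr.get_available_tools())
        print("selected tool:", tool.__name__ if tool else None)
```

Swapping the two entries in PREFERRED would make the libtesseract binding the first choice again.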