Closed ddddavidmartin closed 6 years ago
Not so simple, unfortunately.
With Tesseract 3.03:
$ tesseract --psm 9
Tesseract Open Source OCR Engine v3.03 with Leptonica
Cannot open input file: --psm
AFAIK, Tesseract 4 is still not stable (and not available in Debian stable anyway). Therefore support for all versions of Tesseract > 3 must be maintained.
The best way to work around that problem would be to call get_version() and decide what to do based on the version of Tesseract. Similar fixes already exist in can_detect_orientation() and detect_orientation().
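A minimal sketch of that version check, for illustration only. It assumes the version is available as a tuple of integers such as (3, 3, 0), and that only major versions above 3 require the double-dash form; the exact cut-off should be checked against the Tesseract commit linked in the pull request description. The build_command() helper and its arguments are hypothetical and not part of PyOCR.

```python
def psm_parameter(version):
    """Return the page segmentation flag for a Tesseract version tuple.

    Assumption: `version` is a tuple of ints like (major, minor, update),
    and releases with major version <= 3 only understand "-psm".
    """
    if version[0] <= 3:
        return "-psm"
    return "--psm"


def build_command(version, input_file, output_base, psm=3):
    """Hypothetical helper: assemble a tesseract command line as a list."""
    return [
        "tesseract",
        input_file,
        output_base,
        psm_parameter(version),
        str(psm),
    ]
```

With Tesseract 3.03, build_command((3, 3, 0), "scan.png", "out", 9) would pass -psm 9, which avoids the "Cannot open input file: --psm" error shown above, while a 4.x version tuple would get --psm 9.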
Ah yes, that makes sense. I shall have a look and update the pull request. Thanks for checking!
Thanks :)
Also, the Python 2 tests are broken because of a dependency loop in the modules (tesseract tries to import builder, which tries to import tesseract.psm_parameter) ...
(I should have run the tests before merging ...)
You can run the tests and the checks with make test (requires tox) and make check (requires pyflake8).
Actually, Python 3 tests are broken too.
Oh no! Can you revert the merge on master and I'll update the pull request again?
I had not actually tried out the latest commit yet, as I'm developing on a different machine from the one my document scans run on.
Reverted: 7189c6980ba8bcbb0249ccb03495f0664b709b23
By the way, PyOCR tests are unfortunately unreliable: OCR output differs too much from one system to another. You can (and must) run the tests to make sure that PyOCR seems to work, but you will always have some failed tests. You will have to look at the error messages: if PyOCR works, they will show that the tests failed only because of the exact content returned by the OCR.
That's something I'll have to fix later. :/
Actually, the code style errors returned by make check are my mistake. I'm fixing them right now.
This recently changed in the official tesseract engine [0]. -psm is not allowed as an option anymore and --psm has to be used instead.
This fixes https://github.com/openpaperwork/pyocr/issues/99.
Thanks!
[0] https://github.com/tesseract-ocr/tesseract/commit/ee201e1f4fa277a4b2ecd751a45d3bf1eba6dfdb