openpaperwork / pyocr

A Python wrapper for Tesseract and Cuneiform (moved to GNOME's GitLab)
https://gitlab.gnome.org/World/OpenPaperwork/pyocr

Use '--psm' instead of '-psm' as the option was deprecated. #100

Closed: ddddavidmartin closed this 6 years ago

ddddavidmartin commented 6 years ago

This recently changed in the official Tesseract engine [0]: -psm is no longer accepted as an option, and --psm has to be used instead.

This fixes https://github.com/openpaperwork/pyocr/issues/99.

Thanks!

[0] https://github.com/tesseract-ocr/tesseract/commit/ee201e1f4fa277a4b2ecd751a45d3bf1eba6dfdb

jflesch commented 6 years ago

Not so simple, unfortunately.

With Tesseract 3.03:

$ tesseract --psm 9
Tesseract Open Source OCR Engine v3.03 with Leptonica
Cannot open input file: --psm

AFAIK, Tesseract 4 is still not stable (and it is not available in Debian stable anyway), so support for all versions of Tesseract > 3 must be maintained. The best way to work around the problem would be to call get_version() and choose the option based on the Tesseract version. Similar fixes already exist in can_detect_orientation() and detect_orientation().
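The version-based approach suggested above could look roughly like this. It is a hedged sketch, not PyOCR's actual code: psm_option() and build_command() are hypothetical helpers, the version is assumed to be a (major, minor, patch) tuple such as get_version() might return, and the cutoff (major version 4 and later use --psm, older versions use -psm) is an assumption based on the two versions discussed in this thread; the exact release where the switch happened should be checked against the Tesseract changelog.

```python
def psm_option(version):
    """Return the page-segmentation-mode flag for a Tesseract version.

    `version` is assumed to be a tuple like (3, 3, 0) or (4, 0, 0).
    Assumption: Tesseract 4+ wants "--psm"; older releases want "-psm".
    """
    if version >= (4, 0):
        return "--psm"
    return "-psm"


def build_command(image_path, output_base, psm, version):
    """Build a tesseract argument list (hypothetical helper).

    Options are placed after the output base, which is where the
    Tesseract 3.x usage line expects them.
    """
    return ["tesseract", image_path, output_base, psm_option(version), str(psm)]


print(build_command("scan.png", "out", 9, (3, 3, 0)))
print(build_command("scan.png", "out", 9, (4, 0, 0)))
```

Choosing the flag from the version tuple means the only extra tesseract call is the one that reads the version, which could be cached.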

ddddavidmartin commented 6 years ago

Ah yes, that makes sense. I shall have a look and update the pull request. Thanks for checking!

jflesch commented 6 years ago

Thanks :)

jflesch commented 6 years ago

https://origami.openpaper.work/#/builders/2/builds/459

jflesch commented 6 years ago

Also, the Python 2 tests are broken because of a circular import between the modules (tesseract tries to import builder, which in turn tries to import tesseract.psm_parameter) ...
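The import loop described above can be reproduced in miniature. This is a hedged illustration only: the module names tess and builder, the function psm_args(), and the choice of a deferred (function-level) import as the fix are assumptions, not the change that was actually made in PyOCR. The sketch writes two small modules to a temporary directory and imports them twice: once with a module-level import in builder, which fails because tess is only partially initialized at that point, and once with the import moved inside the function.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical stand-ins for the two modules: "tess" imports "builder"
# at module level, and "builder" needs a name defined in "tess".
TESS_SRC = (
    "import builder\n"
    "\n"
    "def psm_parameter():\n"
    "    return '--psm'\n"
)

# Broken: importing the name at module level closes the loop while
# "tess" has not yet defined psm_parameter.
BROKEN_BUILDER_SRC = "from tess import psm_parameter\n"

# One common fix: defer the import to call time, when "tess" is complete.
FIXED_BUILDER_SRC = (
    "def psm_args(mode):\n"
    "    from tess import psm_parameter\n"
    "    return [psm_parameter(), str(mode)]\n"
)

MODULES = ("tess", "builder")


def try_import(builder_src):
    """Write both modules to a temp dir, import "tess", call builder.

    Returns the argument list on success, or the ImportError raised.
    """
    with tempfile.TemporaryDirectory() as tmp:
        for name, src in (("tess", TESS_SRC), ("builder", builder_src)):
            with open(os.path.join(tmp, name + ".py"), "w") as f:
                f.write(src)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            tess = importlib.import_module("tess")
            return tess.builder.psm_args(9)
        except ImportError as exc:
            return exc
        finally:
            sys.path.remove(tmp)
            # Forget both modules so each attempt starts from scratch.
            for name in MODULES:
                sys.modules.pop(name, None)


print(repr(try_import(BROKEN_BUILDER_SRC)))
print(try_import(FIXED_BUILDER_SRC))
```

Moving the shared helper into a third module that both import would break the loop as well; the deferred import is simply the smallest change.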

jflesch commented 6 years ago

(I should have run the tests before merging ...)

jflesch commented 6 years ago

You can run the tests with make test (requires tox) and the checks with make check (requires flake8).

jflesch commented 6 years ago

Actually, Python 3 tests are broken too.

ddddavidmartin commented 6 years ago

Oh no! Can you revert the merge on master and I'll update the pull request again?

I had not actually tried out the latest commit yet, as I'm developing on a different machine from the one that runs my document scans.

jflesch commented 6 years ago

Reverted: 7189c6980ba8bcbb0249ccb03495f0664b709b23

jflesch commented 6 years ago

By the way, PyOCR's tests are unfortunately unreliable: OCR output differs too much from one system to another. You can (and must) run the tests to make sure that PyOCR seems to work, but you will always get some failed tests. You will have to look at the error messages: if PyOCR works, they will show that the tests failed only because of the exact content returned by the OCR.
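One possible way to make such tests less dependent on the exact OCR content is to compare against the expected text with a similarity ratio instead of requiring an exact match. This is a hedged sketch, not something PyOCR does: ocr_similar() is a hypothetical helper, it uses difflib.SequenceMatcher from the Python standard library, and the 0.9 threshold is an arbitrary assumption that would need tuning against real outputs.

```python
import difflib


def normalize(text):
    """Collapse all runs of whitespace so layout differences are ignored."""
    return " ".join(text.split())


def ocr_similar(expected, actual, threshold=0.9):
    """Return True if two OCR outputs are close enough to count as equal.

    The 0.9 default is an illustrative assumption, not a PyOCR value.
    """
    ratio = difflib.SequenceMatcher(
        None, normalize(expected), normalize(actual)
    ).ratio()
    return ratio >= threshold


# One misread character and different whitespace still pass.
print(ocr_similar("The quick brown fox", "The qu1ck  brown\nfox"))  # → True
```

A tolerant comparison like this would let a test fail only when the output is substantially wrong, rather than on every small system-dependent difference.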

That's something I'll have to fix later. :/

jflesch commented 6 years ago

Actually, the code style errors returned by make check are my mistake. I'm fixing them right now.