openpaperwork / pyocr

A Python wrapper for Tesseract and Cuneiform -- Moved to Gnome's Gitlab
https://gitlab.gnome.org/World/OpenPaperwork/pyocr

Tesseract 4.0 alpha support #60

Closed (wanghaisheng closed 7 years ago)

wanghaisheng commented 7 years ago

Is there any plan to support Tesseract 4.0 alpha?

jflesch commented 7 years ago

I haven't checked Tesseract 4 yet. Are there any notable differences from Tesseract 3 from a usage point of view? oO

ddddavidmartin commented 7 years ago

AFAIK it has 2+ years of additional development and I hope this results in better OCR results.

I have opened a pull request, https://github.com/openpaperwork/pyocr/pull/66, that at least allows pyocr to run with it without throwing an exception. It seems to be working fine with paperless from what I can tell.
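For readers wondering how a new Tesseract release can make a wrapper throw at all: one plausible cause (an assumption here, not a description of what #66 actually changes) is parsing a version string that carries a suffix such as "alpha". Below is a minimal sketch of a tolerant parser. The function name `parse_tesseract_version` and the sample strings are hypothetical and only meant as an illustration.

```python
import re


def parse_tesseract_version(banner):
    """Return (major, minor, patch) from version text such as "tesseract 4.00.00alpha".

    Hypothetical helper, not part of pyocr. Only the first line is inspected,
    trailing suffixes like "alpha" are ignored, and a missing patch number
    defaults to 0. Raises ValueError if no version number is found.
    """
    stripped = banner.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", first_line)
    if match is None:
        raise ValueError("no version number in %r" % first_line)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


if __name__ == "__main__":
    # Illustrative inputs only; real output of the tesseract CLI may differ.
    print(parse_tesseract_version("tesseract 4.00.00alpha"))
    print(parse_tesseract_version("tesseract 3.05.00"))
```

Returning a tuple of integers keeps later comparisons (for example, "is this at least version 4?") simple and avoids comparing strings.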

jflesch commented 7 years ago

#66 merged.

ddddavidmartin commented 7 years ago

Hi @jflesch, I don't think this issue should be closed yet. #66 lets pyocr run with Tesseract 4.0, but I, at least, did not check whether there are any compatibility issues.

jflesch commented 7 years ago

Yep, I just tested: with all the languages installed and Tesseract 4, the libtesseract support segfaults.

jflesch commented 7 years ago

Never mind. I was still working with libtesseract 3. I'm adding support for libtesseract 4.
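A quick way to avoid this kind of mix-up is to ask the shared library itself which version it is. The sketch below uses `ctypes` and the `TessVersion()` function from Tesseract's C API. The helper name `loaded_libtesseract_version` is hypothetical, and the lookup assumes that `ctypes.util.find_library` can locate the library on your system, which is not guaranteed on every platform.

```python
import ctypes
import ctypes.util


def loaded_libtesseract_version(lib_name="tesseract"):
    """Return the version string reported by the shared library, or None.

    Hypothetical helper, not part of pyocr. None means the library could not
    be found or loaded, or does not export TessVersion.
    """
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
        version_fn = lib.TessVersion
    except (OSError, AttributeError):
        return None
    # TessVersion() takes no arguments and returns a C string.
    version_fn.restype = ctypes.c_char_p
    version_fn.argtypes = []
    raw = version_fn()
    return raw.decode("ascii", "replace") if raw else None


if __name__ == "__main__":
    print(loaded_libtesseract_version())
```

If this prints a 3.x version while the `tesseract` command reports 4.x, the library and the command-line tool come from different installations.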

jflesch commented 7 years ago

291624d464e56048ac77e41312fc0bc3265bdb31

jflesch commented 7 years ago

Included in Pyocr 0.4.7

ddddavidmartin commented 7 years ago

Thumbs up @jflesch! That was very quick indeed. :)

I have to correct myself though:

"AFAIK it has 2+ years of additional development and I hope this results in better OCR results."

Only the version in the package manager on my oldish Ubuntu is years old. The last release was only in February [0], so the difference may not be that big. Still, it is good to be able to compile from the master branch and use Tesseract with pyocr.

[0] The latest stable version is 3.05.00, released in February 2017.