tleyden / open-ocr

Run your own OCR-as-a-Service using Tesseract and Docker
Apache License 2.0

engine_args is ignored when a preprocessor is specified #51

Open yazin opened 8 years ago

yazin commented 8 years ago

I am trying to use preprocessors. I posted a request like the following:

curl -X POST -H "Content-Type: application/json" -d '{"img_url":"http://foo.bar/baz.jpg", "engine":"tesseract", "preprocessors": ["stroke-width-transform"], "engine_args":{"lang":"jpn"}}' http://10.0.2.15:8080/ocr

But the result I got is invalid. It looks like the English traineddata was used for Japanese text.

win u.-

63 ll}? "

2013ffin238 4:32

u n.on
HI ‘ ID“-
~' In v11. 50
E TC 5 V250
‘é‘fi ¥11. 400}?
I 3 "IL Yucca
I '  «12732
M515. :hbun‘cac'a‘w "a.
as-‘n'imagurx

Bea-fifiw
\I

Rm

909 uy9

s oa-asna-oaoo

v

‘ t

"(mi

The server log shows:

 00:56:05.939277 OCR_TESSERACT: cmdArgs: [/tmp/4ff50eb5-c8db-47b8-651f-293becc5641e /tmp/4ff50eb5-c8db-47b8-651f-293becc5641e]

There is no -l jpn argument in cmdArgs, so the language setting never reaches Tesseract.
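The check against the log can be sketched in Python. This is only an illustration: the helper names parse_cmd_args and has_lang_flag are made up here, and it assumes the cmdArgs list is logged as space-separated items inside square brackets, as in the line above (paths containing spaces would break it). The -l option itself is Tesseract's standard language flag.

```python
import re


def parse_cmd_args(log_line):
    """Extract the space-separated items inside 'cmdArgs: [...]' (hypothetical helper)."""
    match = re.search(r"cmdArgs: \[(.*?)\]", log_line)
    if match is None:
        return []
    return match.group(1).split()


def has_lang_flag(cmd_args, lang):
    """Return True if '-l' is immediately followed by the given language code."""
    return any(
        flag == "-l" and value == lang
        for flag, value in zip(cmd_args, cmd_args[1:])
    )


# The log line from the request that used the preprocessor.
log_line = (
    "00:56:05.939277 OCR_TESSERACT: cmdArgs: "
    "[/tmp/4ff50eb5-c8db-47b8-651f-293becc5641e "
    "/tmp/4ff50eb5-c8db-47b8-651f-293becc5641e]"
)

args = parse_cmd_args(log_line)
print(has_lang_flag(args, "jpn"))  # prints False: only input and output paths are passed
```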

Then I posted the same request without the preprocessors field:

curl -X POST -H "Content-Type: application/json" -d '{"img_url":"http://foo.bar/baz.jpg", "engine":"tesseract", "engine_args":{"lang":"jpn"}}' http://10.0.2.15:8080/ocr

This time the result looks good:

*g image 口 因

領収 書

犯ー3 彙m 月 惣日 捌
メー夕丶遣賃 \ーL伽 円
遠距離 創引 一 \2ー0 円
固定迎車料金 + \伽 円

遣賃料金計 \ーー'ー印 円
ETc 料金 + \剛円

合計 \ーL 400円

現 金 支払 \ーー,伽円

車輌番号 m2732

毎盧こ`乗車ぁりがとうごさいます〟
ぉ忘れ物は当杜へ

日興 動車鎚爛

ご要望は当社又は

(財凍京タクシ一セン夕一

I think engine_args is ignored whenever a preprocessor is specified.
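For anyone who wants to reproduce the comparison, here is a minimal Python sketch that sends the same request with and without the preprocessor. The function names build_payload, post_ocr and compare are hypothetical; the JSON fields and the /ocr endpoint are the ones used in the curl commands above, and the timeout value is an arbitrary assumption.

```python
import json
import urllib.request


def build_payload(img_url, lang, preprocessors=None):
    """Build the /ocr request body; 'preprocessors' is included only when given."""
    payload = {
        "img_url": img_url,
        "engine": "tesseract",
        "engine_args": {"lang": lang},
    }
    if preprocessors:
        payload["preprocessors"] = list(preprocessors)
    return payload


def post_ocr(endpoint, payload, timeout=60):
    """POST the payload as JSON and return the response body as text."""
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def compare(endpoint, img_url, lang="jpn"):
    """Return the OCR text without and with the stroke-width-transform preprocessor."""
    plain = post_ocr(endpoint, build_payload(img_url, lang))
    preprocessed = post_ocr(
        endpoint, build_payload(img_url, lang, ["stroke-width-transform"])
    )
    return plain, preprocessed


# Example call (needs a running OpenOCR server, so it is left commented out):
# plain, preprocessed = compare("http://10.0.2.15:8080/ocr", "http://foo.bar/baz.jpg")
```

If the two outputs differ the way the results above do, and the server log for the second request again lacks -l jpn, that points at engine_args being dropped on the preprocessor path rather than at the image itself.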

I am running OpenOCR with Docker Compose on Ubuntu 14.04.

tleyden commented 8 years ago

Sorry for the delay, and thanks for posting this! I think it's probably a valid bug.