Support non-English languages

tleyden / open-ocr

Run your own OCR-as-a-Service using Tesseract and Docker

Apache License 2.0


Support non-English languages #13

Closed: tleyden closed this issue 10 years ago

tleyden commented 10 years ago

Update OpenOCR to support non-English languages.

tleyden commented 10 years ago

The code is done and pushed to GitHub; I'm still waiting for the Docker images to update.

Example using Japanese:

curl -X POST -H "Content-Type: application/json" -d '{"img_url":"http://cl.ly/image/0U2Y461J2p2K/Screen%20Shot%202014-09-06%20at%2010.36.40%20AM.png","engine":"tesseract", "engine_args":{"lang":"jpn"} }' http://yourserver.com/ocr
どうしてる? (roughly "How are you doing?" in English)
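The same request can be made from a script. Below is a minimal Python sketch of the curl call above using only the standard library. It assumes, as the example output suggests, that the server answers POST /ocr with the recognized text as a plain UTF-8 body; http://yourserver.com/ocr is the placeholder from the example, and the function names are hypothetical.

```python
import json
import urllib.request

# Placeholder endpoint copied from the curl example; replace with your server.
OCR_URL = "http://yourserver.com/ocr"


def build_ocr_request(img_url: str, lang: str = "eng") -> bytes:
    """Build the same JSON body as the curl example, with a Tesseract language code."""
    body = {
        "img_url": img_url,
        "engine": "tesseract",
        "engine_args": {"lang": lang},
    }
    return json.dumps(body).encode("utf-8")


def ocr(img_url: str, lang: str = "eng", url: str = OCR_URL) -> str:
    """POST the request and return the response body, assumed to be UTF-8 text."""
    req = urllib.request.Request(
        url,
        data=build_ocr_request(img_url, lang),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Calling ocr(image_url, lang="jpn") with the screenshot URL from the curl example should behave like the command above, assuming your server is reachable at OCR_URL.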

tleyden commented 10 years ago

The list of supported languages is documented in the JSON schema here: http://docs.openocr.apiary.io/

"lang": {
  "description": "The language to use.  If omitted, will use English",
  "enum": [
    "eng",
    "ara",
    "bel",
    "ben",
    "bul",
    "ces",
    "dan",
    "deu",
    "ell",
    "fin",
    "fra",
    "heb",
    "hin",
    "ind",
    "isl",
    "ita",
    "jpn",
    "kor",
    "nld",
    "nor",
    "pol",
    "por",
    "ron",
    "rus",
    "spa",
    "swe",
    "tha",
    "tur",
    "ukr",
    "vie",
    "chi-sim",
    "chi-tra"
  ]
}

I arbitrarily picked the most common languages, but if yours isn't listed, it can be added fairly easily.
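Because the schema restricts lang to a fixed enum, a client can reject an unlisted code before sending anything. This is a minimal sketch: SUPPORTED_LANGS is a hypothetical client-side copy of the list above that has to be kept in sync with the schema by hand, and returning an empty engine_args object for "no language" assumes the server treats a missing lang key as English, as the schema description says.

```python
from typing import Optional

# Hypothetical client-side copy of the "lang" enum from the OpenOCR JSON schema.
SUPPORTED_LANGS = frozenset({
    "eng", "ara", "bel", "ben", "bul", "ces", "dan", "deu", "ell", "fin",
    "fra", "heb", "hin", "ind", "isl", "ita", "jpn", "kor", "nld", "nor",
    "pol", "por", "ron", "rus", "spa", "swe", "tha", "tur", "ukr", "vie",
    "chi-sim", "chi-tra",
})


def engine_args_for(lang: Optional[str] = None) -> dict:
    """Return the engine_args object for a request.

    Omitting lang returns an empty dict so the server can apply its English
    default; a code not in the enum raises ValueError instead of being sent.
    """
    if lang is None:
        return {}
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported lang {lang!r}; see the schema enum")
    return {"lang": lang}
```

Adding a language then means updating both the server's schema and this set, which is the trade-off of validating on the client.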

tleyden commented 10 years ago

The Docker image has been rebuilt, so this is ready to use.

Run docker pull tleyden5iwx/open-ocr to get the latest image, and then restart your services.

itchat commented 5 years ago

It seems like "chi-sim" and "chi-tra" don't work; the request returned "Error processing image url: . Error: exit status 1".

tleyden commented 5 years ago

@itchat can you open a separate ticket with steps to reproduce?
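For a separate ticket, a per-language pass/fail list on the same image makes a compact reproduction. The sketch below is hypothetical tooling, not part of OpenOCR: it assumes a failure shows up as a response body starting with the "Error processing image url" text quoted above, and it takes the request function as a parameter so any HTTP client (curl wrapper, urllib, requests) can be plugged in. Keeping "eng" and "jpn" in the list as controls helps separate a language-specific failure from a general one. Tesseract's own data files for Chinese are named chi_sim and chi_tra (with underscores), so a report could also try those spellings through the langs parameter; this thread does not establish whether the spelling matters.

```python
from typing import Callable, Dict, Iterable

# Prefix of the failure message quoted in this thread.
ERROR_PREFIX = "Error processing image url"


def is_ocr_error(body: str) -> bool:
    """True if a response body looks like the failure reported above."""
    return body.lstrip().startswith(ERROR_PREFIX)


def check_langs(
    send: Callable[[str], str],
    langs: Iterable[str] = ("eng", "jpn", "chi-sim", "chi-tra"),
) -> Dict[str, bool]:
    """Call send(lang) for each language and map lang -> True if it succeeded.

    send is any function that posts the same image with the given lang and
    returns the response text. One line per language is printed for the ticket.
    """
    results = {}
    for lang in langs:
        body = send(lang)
        ok = not is_ocr_error(body)
        print(f"{lang}: {'ok' if ok else 'FAIL'}: {body.strip()[:80]}")
        results[lang] = ok
    return results
```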