tleyden / open-ocr

Run your own OCR-as-a-Service using Tesseract and Docker
Apache License 2.0

Img URL not parsed from request body #32

Closed alexproca closed 9 years ago

alexproca commented 9 years ago

Hello,

I created a fig file that does the same as setup.sh, but when I call the REST service it seems the img_url is not passed correctly.

Fig file:

rabbitmq:
  image: tutum/rabbitmq
  dns: 8.8.8.8
  # ports:
  #     - "5672:5672"
  #     - "15672:15672"
  environment:
    - "RABBITMQ_PASS=1234"

openocr:
  image: tleyden5iwx/open-ocr
  dns: 8.8.8.8
  links:
    - rabbitmq
  ports:
    - "9292:9292"
  command: open-ocr-httpd -amqp_uri "amqp://admin:1234@rabbitmq/" -http_port 9292

openocrworker:
  image: tleyden5iwx/open-ocr
  dns: 8.8.8.8
  links:
    - rabbitmq
  command: open-ocr-worker -amqp_uri "amqp://admin:1234@rabbitmq/"

Curl request:

curl -v -X POST -H "Content-Type: application/json" -d '{"img_url":"http://bit.ly/ocrimage","engine":"tesseract"}' http://192.168.59.103:9292/ocr
* Hostname was NOT found in DNS cache
*   Trying 192.168.59.103...
* Connected to 192.168.59.103 (192.168.59.103) port 9292 (#0)
> POST /ocr HTTP/1.1
> User-Agent: curl/7.37.1
> Host: 192.168.59.103:9292
> Accept: */*
> Content-Type: application/json
> Content-Length: 57
>
* upload completely sent off: 57 out of 57 bytes
< HTTP/1.1 500 Internal Server Error
< Content-Type: text/plain; charset=utf-8
< Date: Sun, 25 Jan 2015 22:08:02 GMT
< Content-Length: 71
<
Unable to perform OCR decode.  Error: Timeout waiting for RPC response
* Connection #0 to host 192.168.59.103 left intact

Open OCR output:

Creating restocr_rabbitmq_1...
Creating restocr_openocr_1...
Creating restocr_openocrworker_1...
Attaching to restocr_rabbitmq_1, restocr_openocr_1, restocr_openocrworker_1
openocr_1       | 22:04:17.441712 OCR_HTTP: Starting listener on :9292
rabbitmq_1      | => Securing RabbitMQ with a preset password
rabbitmq_1      | => Done!
rabbitmq_1      | ========================================================================
rabbitmq_1      | You can now connect to this RabbitMQ server using, for example:
rabbitmq_1      |
rabbitmq_1      |     curl --user admin:1234 http://<host>:<port>/api/vhosts
rabbitmq_1      |
rabbitmq_1      | Please remember to change the above password as soon as possible!
rabbitmq_1      | ========================================================================
openocrworker_1 | 22:04:17.809227 OCR_WORKER: Creating new OCR Worker
openocrworker_1 | 22:04:17.809743 OCR_WORKER: Run() called...
openocrworker_1 | 22:04:17.809750 OCR_WORKER: dialing "amqp://admin:1234@rabbitmq/"
rabbitmq_1      |
rabbitmq_1      |               RabbitMQ 3.4.0. Copyright (C) 2007-2014 GoPivotal, Inc.
rabbitmq_1      |   ##  ##      Licensed under the MPL.  See http://www.rabbitmq.com/
rabbitmq_1      |   ##  ##
rabbitmq_1      |   ##########  Logs: /var/log/rabbitmq/rabbit@fa88f02be1ce.log
rabbitmq_1      |   ######  ##        /var/log/rabbitmq/rabbit@fa88f02be1ce-sasl.log
rabbitmq_1      |   ##########
rabbitmq_1      |               Starting broker... completed with 6 plugins.
openocr_1       | 22:06:01.212527 OCR_HTTP: serveHttp called
openocr_1       | 22:06:01.212827 OCR_CLIENT: dialing "amqp://admin:1234@rabbitmq/"
openocr_1       | 22:06:01.225323 OCR_CLIENT: callbackQueue name: amq.gen-tvP8DR0dvqGbTdVHnLenaA
openocr_1       | 22:06:01.225860 OCR_CLIENT: looping over deliveries..
openocr_1       | 22:06:02.480430 OCR_CLIENT: ocrRequest before: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
openocr_1       | 22:06:02.480525 OCR_CLIENT: publishing with routing key "decode-ocr"
openocr_1       | 22:06:02.480538 OCR_CLIENT: ocrRequest after: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
openocr_1       | 22:08:02.485886 ERROR: Timeout waiting for RPC response -- open-ocr.HandleOcrRequest() at ocr_http_handler.go:80
openocr_1       | 22:08:02.485944 ERROR: Unable to perform OCR decode.  Error: Timeout waiting for RPC response -- open-ocr.(*OcrHttpHandler).ServeHTTP() at ocr_http_handler.go:40
openocr_1       | 22:08:05.672308 OCR_HTTP: serveHttp called
openocr_1       | 22:08:05.672368 ERROR: EOF -- open-ocr.(*OcrHttpHandler).ServeHTTP() at ocr_http_handler.go:30
openocr_1       | 22:08:45.817174 OCR_HTTP: serveHttp called
openocr_1       | 22:08:45.817242 OCR_CLIENT: dialing "amqp://admin:1234@rabbitmq/"
openocr_1       | 22:08:45.826096 OCR_CLIENT: callbackQueue name: amq.gen-CS1047W2EPQv2it79aEgeA
openocr_1       | 22:08:45.826630 OCR_CLIENT: looping over deliveries..
openocr_1       | 22:08:46.501806 OCR_CLIENT: ocrRequest before: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
openocr_1       | 22:08:46.501902 OCR_CLIENT: publishing with routing key "decode-ocr"
openocr_1       | 22:08:46.501905 OCR_CLIENT: ocrRequest after: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
openocr_1       | 22:09:12.505743 OCR_HTTP: serveHttp called
openocr_1       | 22:09:12.505833 OCR_CLIENT: dialing "amqp://admin:1234@rabbitmq/"
openocr_1       | 22:09:12.511872 OCR_CLIENT: callbackQueue name: amq.gen-0xthgpXMHuCWnt9YKEPhKA
openocr_1       | 22:09:12.512488 OCR_CLIENT: looping over deliveries..
openocr_1       | 22:09:13.255830 OCR_CLIENT: ocrRequest before: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
openocr_1       | 22:09:13.255890 OCR_CLIENT: publishing with routing key "decode-ocr"
openocr_1       | 22:09:13.255899 OCR_CLIENT: ocrRequest after: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
openocr_1       | 22:10:46.511147 ERROR: Timeout waiting for RPC response -- open-ocr.HandleOcrRequest() at ocr_http_handler.go:80
openocr_1       | 22:10:46.511213 ERROR: Unable to perform OCR decode.  Error: Timeout waiting for RPC response -- open-ocr.(*OcrHttpHandler).ServeHTTP() at ocr_http_handler.go:40
openocr_1       | 22:11:13.261620 ERROR: Timeout waiting for RPC response -- open-ocr.HandleOcrRequest() at ocr_http_handler.go:80
openocr_1       | 22:11:13.261686 ERROR: Unable to perform OCR decode.  Error: Timeout waiting for RPC response -- open-ocr.(*OcrHttpHandler).ServeHTTP() at ocr_http_handler.go:40
openocr_1       | 22:13:35.549198 OCR_HTTP: serveHttp called
openocr_1       | 22:13:35.549294 OCR_CLIENT: dialing "amqp://admin:1234@rabbitmq/"
openocr_1       | 22:13:35.561996 OCR_CLIENT: callbackQueue name: amq.gen-UYpRwcD9xbPfGVnbEhhjcw
openocr_1       | 22:13:35.562443 OCR_CLIENT: looping over deliveries..
openocr_1       | 22:13:37.276163 OCR_CLIENT: ocrRequest before: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
openocr_1       | 22:13:37.276358 OCR_CLIENT: publishing with routing key "decode-ocr"
openocr_1       | 22:13:37.276376 OCR_CLIENT: ocrRequest after: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
openocr_1       | 22:15:37.283421 ERROR: Timeout waiting for RPC response -- open-ocr.HandleOcrRequest() at ocr_http_handler.go:80
openocr_1       | 22:15:37.283486 ERROR: Unable to perform OCR decode.  Error: Timeout waiting for RPC response -- open-ocr.(*OcrHttpHandler).ServeHTTP() at ocr_http_handler.go:40

It seems like the img_url is not parsed from the request body:

openocr_1 | 22:09:13.255899 OCR_CLIENT: ocrRequest after: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []

tleyden commented 9 years ago

You should see something like the following output in your OCR_WORKER logs:

23:58:02.389856 OCR_WORKER: Creating new OCR Worker
23:58:02.390277 OCR_WORKER: Run() called...
23:58:02.390316 OCR_WORKER: dialing "amqp://xxx:yyy@162.222.178.49/"
23:58:17.427673 OCR_WORKER: got Connection, getting Channel
23:58:17.455394 OCR_WORKER: binding to: decode-ocr

and the fact that you are not seeing this tells me that the OCR_WORKER is either connecting to the wrong RabbitMQ URI, or is trying to connect before RabbitMQ has launched.

To test a workaround, can you add a sleep in your fig config for the OCR_WORKER? Make it sleep for 30s before trying to start up. I guess the long-term fix might be to have a timeout and a retry loop in the Go code when it connects to RabbitMQ.

I think the log message "OCR_CLIENT: ocrRequest before: ImgUrl: ," is misleading; let's ignore that for now.

tleyden commented 9 years ago

Once we get this working, I'll add the fig file to the repo/docs. Thanks for contributing that!

tleyden commented 9 years ago

and the fact that you are not seeing this tells me

I mean, the fact that you are not seeing these two lines:

23:58:17.427673 OCR_WORKER: got Connection, getting Channel
23:58:17.455394 OCR_WORKER: binding to: decode-ocr

alexproca commented 9 years ago

I can add a sleep to the worker command: sleep 30; open-ocr-worker -amqp_uri ...

alexproca commented 9 years ago

The 30-second delay fixed the problem. I figured it out and made pull request #33 with the fig configuration.

tleyden commented 9 years ago

Thanks!

tleyden commented 9 years ago

I added a ticket to address the retry issue: https://github.com/tleyden/open-ocr/issues/34, at which point we can remove the ugly workaround.

Also, do you have any instructions on how to use the fig files? (I've never used fig, but have heard a lot about it).

alexproca commented 9 years ago

I will write up in the README how to get started with open-ocr using fig, but first I would like to add the stroke width preprocessor to the fig configuration.

tleyden commented 9 years ago

Ok great