tleyden / open-ocr

Run your own OCR-as-a-Service using Tesseract and Docker
Apache License 2.0

strokewidthtransform container crashes using PDF and "stroke-width-transform" #127

Open cbleek opened 4 years ago

cbleek commented 4 years ago

convert-pdf still does not work.

I started the container via run.sh on Ubuntu 18.04.

Executing the following request crashes the strokewidthtransform_1 container:

cbleek@xenon:~$ curl -X POST -H "Content-Type: application/json" -d '{"img_url":"https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf","engine":"tesseract", "preprocessors":["stroke-width-transform"]}' http://ocr.cross-solution.de:9292/ocr
curl: (52) Empty reply from server
rabbitmq_1              | accepting AMQP connection <0.1887.0> (172.18.0.5:37308 -> 172.18.0.2:5672)
openocr_1               | 14:06:17.204094 OCR_CLIENT: callbackQueue name: amq.gen-OEXi9KFH3RfAbxlzIh10hg
openocr_1               | 14:06:17.204595 OCR_CLIENT: looping over deliveries..
openocr_1               | 14:06:17.761540 OCR_CLIENT: ocrRequest before: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: [stroke-width-transform]
openocr_1               | 14:06:17.761604 OCR_CLIENT: publishing with routing key "stroke-width-transform"
openocr_1               | 14:06:17.761622 OCR_CLIENT: ocrRequest after: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
strokewidthtransform_1  | 14:06:17.762808 PREPROCESSOR_WORKER: got 17830 byte delivery: [1]. Routing key: stroke-width-transform Reply to: amq.gen-OEXi9KFH3RfAbxlzIh10hg
strokewidthtransform_1  | 14:06:17.763042 PREPROCESSOR_WORKER: ocrRequest before: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
strokewidthtransform_1  | 14:06:17.763050 PREPROCESSOR_WORKER: publishing with routing key "decode-ocr"
strokewidthtransform_1  | 14:06:17.763053 PREPROCESSOR_WORKER: ocrRequest after: ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: []
strokewidthtransform_1  | 14:06:17.763062 PREPROCESSOR_WORKER: Preproces ImgUrl: , EngineType: ENGINE_TESSERACT, Preprocessors: [] via stroke-width-transform
strokewidthtransform_1  | 14:06:17.763231 PREPROCESSOR_WORKER: extract dark on light param
strokewidthtransform_1  | 14:06:17.763236 PREPROCESSOR_WORKER: return val: 1
strokewidthtransform_1  | 14:06:17.763239 PREPROCESSOR_WORKER: DetectText on /tmp/ffee701d-2ef6-430b-564b-329c3d9405ec.png -> /tmp/7ea05542-c04a-46d2-7da8-99e9819a92b5.png with 1
strokewidthtransform_1  | 14:06:17.800145 FATAL: Error running command: exit status 255.  out: couldn't load query image
strokewidthtransform_1  |  -- open-ocr.StrokeWidthTransformer.preprocess() at stroke_width_transform.go:54
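
One possible reading of the log, which it does not confirm: the request points at a PDF, the worker writes the payload to a temporary file with a .png name, and DetectText then fails with "couldn't load query image" because the bytes are not an image. A minimal sketch for checking that guess against a captured payload follows. The helper name `sniff_format` is hypothetical and not part of open-ocr; it relies only on the well-known leading signatures of the PDF and PNG formats.

```python
# Hypothetical helper, not part of open-ocr: guess a payload's format
# from its leading bytes (file signatures).
PDF_MAGIC = b"%PDF-"               # PDF files start with this header
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"   # 8-byte PNG signature


def sniff_format(data: bytes) -> str:
    """Return "pdf", "png", or "unknown" based on the first bytes of data."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(PNG_MAGIC):
        return "png"
    return "unknown"
```

Reading the first 8 bytes of the temporary input file in binary mode and passing them to `sniff_format` would show whether the file named .png actually holds PDF data. If it does, the PDF would need to be converted to an image before the stroke-width-transform step.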