Open serhii-eleks opened 5 years ago
I found a config argument, "tessedit_create_hocr": "1", that makes it return data in hOCR format.
Do the docs need to be improved?
Yes, it would be great to improve the docs on the available output formats. But the main issue is that the system cannot execute tesseract with the "--psm" parameter.
Hello @tleyden
Do you have any updates?
Thanks.
Hello! We're having the exact same problem. We would like to launch tesseract with the psm: 3 parameter, but it fails with tesseract 4.0.
The problem seems to be in this line
https://github.com/tleyden/open-ocr/blob/1cd43c1659c42dd65487559e9f055436c25b0e06/tesseract_engine.go#L87
We managed to fix it, for tesseract 4.0 only, by changing that line to
result = append(result, "--psm")
The code probably needs to switch between the two cases to make the change backward compatible.
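A minimal sketch of what such a switch could look like, assuming the worker can read the first line of `tesseract --version` output (for example "tesseract 4.0.0-beta.1"). The helper names `parseMajorVersion` and `psmFlag` are hypothetical and are not part of open-ocr; the version cutoff follows what is reported in this thread ("-psm" for 3.*, "--psm" for 4.0).

```go
package main

import (
	"regexp"
	"strconv"
)

// versionRe captures the major version from output such as
// "tesseract 4.0.0-beta.1" or "tesseract 3.04.01".
var versionRe = regexp.MustCompile(`tesseract\s+v?(\d+)`)

// parseMajorVersion (hypothetical helper) extracts the major version
// number from the text printed by `tesseract --version`.
// The second return value is false when no version could be found.
func parseMajorVersion(versionOutput string) (int, bool) {
	m := versionRe.FindStringSubmatch(versionOutput)
	if m == nil {
		return 0, false
	}
	major, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	return major, true
}

// psmFlag (hypothetical helper) returns the spelling of the page
// segmentation flag for the given major version, per this thread:
// a single dash for 3.x, a double dash from 4.0 on.
func psmFlag(major int) string {
	if major >= 4 {
		return "--psm"
	}
	return "-psm"
}
```

With something like this in place, the line in tesseract_engine.go could become `result = append(result, psmFlag(major))`, where `major` is detected once when the worker starts.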
we managed to fix it only for tesseract 4.0 by changing it in
result = append(result, "--psm")
Could you please say where exactly I need to replace it? I went into the Docker containers of the worker and httpd, located the mentioned file, changed it, and restarted both containers. The error is still the same.
There is a workaround for this issue.
Get into the OCR worker container
You should be able to list your running Docker containers with the command below:
docker ps
The list should contain dockercompose_openocr_1, dockercompose_openocrworker_1, dockercompose_strokewidthtransform_1 and dockercompose_rabbitmq_1 in the NAMES column, which correspond to the HTTP handler, OCR worker, pre-processor and RabbitMQ.
Use the command below to get into the OCR worker container:
docker exec -it <container_id> /bin/bash
Edit the source code
cd /opt/go/src/github.com/tleyden/open-ocr/
vim tesseract_engine.go
Around line 87:
Change result = append(result, "-psm")
into result = append(result, "--psm")
Recompile the executable
cd /opt/go/src/github.com/tleyden/open-ocr/cli-worker && go build -v -o open-ocr-worker && cp open-ocr-worker /usr/bin
If you encounter the message below:
cp: cannot create regular file '/usr/bin/open-ocr-worker': Text file busy
try restarting the container and running cp again.
Restart the Docker container
Exit the Docker container and restart it using the command below:
docker restart <container_id>
Hello,
I'm trying to launch your environment with the tleyden5iwx/open-ocr-2 image. This image should contain Tesseract 4.0. It looks like decoding an image/PDF using the psm argument doesn't work.
Request Body: { "img_url": "http://bit.ly/ocrimage", "engine": "tesseract", "engine_args": { "config_vars": { "tessedit_char_whitelist": "0123456789" }, "psm": "3" } }
Response: Error processing image url: . Error: exit status 1
In Tesseract 3.*, the psm argument takes a single dash ("-psm"); in Tesseract 4.0 it takes two ("--psm"). I think this is the main issue.
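For anyone who wants to reproduce this from code, here is a rough Go sketch that builds the same request body. The JSON field names are taken from the request above; the struct and function names are hypothetical, and the endpoint passed to `postOCR` is an assumption that depends on how your open-ocr HTTP handler is deployed.

```go
package main

import (
	"bytes"
	"encoding/json"
	"net/http"
)

// EngineArgs mirrors the "engine_args" object in the request above.
type EngineArgs struct {
	ConfigVars map[string]string `json:"config_vars,omitempty"`
	PSM        string            `json:"psm,omitempty"`
}

// OCRRequest mirrors the top-level request body shown above.
// The type itself is illustrative, not taken from the open-ocr source.
type OCRRequest struct {
	ImgURL     string     `json:"img_url"`
	Engine     string     `json:"engine"`
	EngineArgs EngineArgs `json:"engine_args"`
}

// buildRequestBody serializes a tesseract request with the given
// page segmentation mode and config variables.
func buildRequestBody(imgURL, psm string, configVars map[string]string) ([]byte, error) {
	return json.Marshal(OCRRequest{
		ImgURL: imgURL,
		Engine: "tesseract",
		EngineArgs: EngineArgs{
			ConfigVars: configVars,
			PSM:        psm,
		},
	})
}

// postOCR sends the body to the given endpoint. The endpoint URL is an
// assumption supplied by the caller; the caller must close resp.Body.
func postOCR(endpoint string, body []byte) (*http.Response, error) {
	return http.Post(endpoint, "application/json", bytes.NewReader(body))
}
```

Calling `buildRequestBody("http://bit.ly/ocrimage", "3", map[string]string{"tessedit_char_whitelist": "0123456789"})` should produce the body from the failing request. The same map could carry "tessedit_create_hocr": "1" to ask for hOCR output.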
By the way, could you add an argument that lets me control the output format, so it is not limited to raw text? I would like to receive text in *.hocr format too, and in any other supported format. I would really appreciate this feature!
Thanks!