Open minyk opened 7 years ago
Hello,
I first thought about using it. However, if you look at the package details: https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr/+packages
You can see that the unofficial package is not building on the listed versions (xenial, trusty and so on). In addition, I was not able to make it work with ppa:alex-p/tesseract-ocr.
Did you give it a try?
I am following it closely, and if you have a proposal it is welcome :) because it would reduce the build time significantly.
Hi, @speedfl
Actually, I built my own tesseract4 images in March with this configuration: https://github.com/minyk/open-ocr/blob/feature/tesseract4.00alpha/docker-compose/open-ocr/Dockerfile No problem occurred during docker build at that time.
I rebuilt the image today and tesseract4 was installed as tesseract - 4.00~git1851-10e04ff-1ppa1~xenial1. Maybe the build of the currently published version was broken, so apt-get installed an older one.
OK, thanks for your help. I will give it a try this evening with your Dockerfile, and if it works I will create a PR.
Hello @minyk
It seems to work.
So you can proceed to a PR :)
A recap of the Dockerfile
FROM ubuntu
ENV GOPATH /opt/go
# Get git golang and gcc packages
RUN apt-get update && apt-get install -y \
software-properties-common \
git \
golang \
gcc
RUN add-apt-repository ppa:alex-p/tesseract-ocr && apt-get update
# Get tesseract-ocr packages
RUN apt-get install -y \
libleptonica-dev \
libtesseract4 \
libtesseract-dev \
tesseract-ocr
# Get language data.
RUN apt-get install -y \
tesseract-ocr-ara \
tesseract-ocr-bel \
tesseract-ocr-ben \
tesseract-ocr-bul \
tesseract-ocr-ces \
tesseract-ocr-dan \
tesseract-ocr-deu \
tesseract-ocr-ell \
tesseract-ocr-fin \
tesseract-ocr-fra \
tesseract-ocr-heb \
tesseract-ocr-hin \
tesseract-ocr-ind \
tesseract-ocr-isl \
tesseract-ocr-ita \
tesseract-ocr-jpn \
tesseract-ocr-kor \
tesseract-ocr-nld \
tesseract-ocr-nor \
tesseract-ocr-pol \
tesseract-ocr-por \
tesseract-ocr-ron \
tesseract-ocr-rus \
tesseract-ocr-spa \
tesseract-ocr-swe \
tesseract-ocr-tha \
tesseract-ocr-tur \
tesseract-ocr-ukr \
tesseract-ocr-vie \
tesseract-ocr-chi-sim \
tesseract-ocr-chi-tra \
tesseract-ocr-eng
RUN mkdir -p $GOPATH
# go get open-ocr
RUN go get -u -v -t github.com/tleyden/open-ocr
# build open-ocr-httpd binary and copy it to /usr/bin
RUN cd $GOPATH/src/github.com/tleyden/open-ocr/cli-httpd && go build -v -o open-ocr-httpd && cp open-ocr-httpd /usr/bin
# build open-ocr-worker binary and copy it to /usr/bin
RUN cd $GOPATH/src/github.com/tleyden/open-ocr/cli-worker && go build -v -o open-ocr-worker && cp open-ocr-worker /usr/bin
Hi, @tleyden @speedfl
Thanks for the Tesseract4 version of Open-OCR! I have a suggestion for the Tesseract4 Dockerfile. As mentioned in this issue's title, what do you think about using an unofficial PPA for Tesseract4? The PPA is https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr. We would enable the PPA and just install the packages, the same way as for Tesseract3.
This way, we don't install dev packages on the Docker image. Thanks.
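A minimal sketch of what the PPA-based, runtime-only install could look like. The PPA name and the tesseract-ocr and tesseract-ocr-eng package names come from this thread; the ubuntu:16.04 base is an assumption (the build reported in the thread was a ~xenial1 package), and whether open-ocr builds and runs without the dev packages is not verified here.

```dockerfile
# Assumption: xenial base, matching the ~xenial1 PPA build reported in this thread
FROM ubuntu:16.04

# software-properties-common provides add-apt-repository
RUN apt-get update && apt-get install -y software-properties-common

# Enable the unofficial PPA and refresh the package index
RUN add-apt-repository -y ppa:alex-p/tesseract-ocr && apt-get update

# Runtime packages only: no libtesseract-dev or libleptonica-dev
# (language list shortened to eng for illustration)
RUN apt-get install -y \
    tesseract-ocr \
    tesseract-ocr-eng
```

The other tesseract-ocr-* language packages can be appended to the last RUN step in the same way.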