Open avpicov opened 5 years ago
You're correct that this has been dormant for a while. Part of this is due to the work I put into getting a version of the Latin-specific OCR training into Tesseract core:
But since that uses the one-size-fits-all Tesseract training process, some things were lost from this Latin-specific process. So I'll try to take a look at fixing this build.
First, thank you for making this code available. I am having problems with the Docker build, and I suspect it has not been used in a while. My first change was updating the base image from ubuntu:wily to ubuntu:xenial.
#FROM ubuntu:wily
FROM ubuntu:xenial
I then made the following change:
`RUN locale-gen en_US.UTF-8`
`RUN apt-get clean && apt-get update && apt-get install -y locales && locale-gen en_US.UTF-8`
The latest issue I have run into is the following:
`Step 22/28 : RUN wget -O tesseract-3.04.00/training/tesstrain_utils.sh 'https://raw.githubusercontent.com/tesseract-ocr/tesseract/master/training/tesstrain_utils.sh'
 ---> Running in 6368cfd8f443
--2019-04-19 19:57:08-- https://raw.githubusercontent.com/tesseract-ocr/tesseract/master/training/tesstrain_utils.sh
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-04-19 19:57:08 ERROR 404: Not Found.
The command '/bin/sh -c wget -O tesseract-3.04.00/training/tesstrain_utils.sh 'https://raw.githubusercontent.com/tesseract-ocr/tesseract/master/training/tesstrain_utils.sh'' returned a non-zero code: 8`

I'm not sure how active the project is at this point, but I wanted to reach out to see if you might know what the issue is here.
Thanks -Will
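
For anyone who hits the same 404: the log shows that the `master` branch no longer serves a file at `training/tesstrain_utils.sh`, so any URL that tracks `master` can break whenever the upstream layout changes. One possible workaround, sketched below, is to pin the download to a fixed ref rather than a moving branch. This is only a sketch under assumptions: the helper names `raw_url` and `fetch_pinned` are hypothetical, and whether the ref `3.04.00` actually contains the file at that path (and whether that version of the script suits this build) has not been verified here.

```shell
#!/bin/sh
# Hypothetical helper: build a raw.githubusercontent.com URL for the
# tesseract-ocr/tesseract repository, pinned to a specific ref
# (tag, branch, or commit hash) instead of the moving `master` branch.
raw_url() {
    _ref="$1"    # e.g. a release tag; assumed to exist upstream
    _path="$2"   # path of the file inside the repository at that ref
    printf 'https://raw.githubusercontent.com/tesseract-ocr/tesseract/%s/%s\n' "$_ref" "$_path"
}

# Hypothetical helper: download the pinned file to a destination path.
# wget exits non-zero (8 on an HTTP error response such as 404), which is
# what makes the Docker RUN step fail in the log above.
fetch_pinned() {
    _dest="$1"
    wget -O "$_dest" "$(raw_url "$2" "$3")"
}
```

In a Dockerfile the same idea reduces to replacing `master` in the `wget` URL with the chosen ref, for example `fetch_pinned tesseract-3.04.00/training/tesstrain_utils.sh 3.04.00 training/tesstrain_utils.sh` in script form. Pinning trades automatic upstream fixes for a build that does not change underneath you.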