ryanfb / latinocr-lat

'lat' repository, forked from https://github.com/ryanfb/ancientgreekocr-grc. The final training process for lat.traineddata
https://ryanfb.github.io/latinocr/
Apache License 2.0

Docker build #7

Open avpicov opened 5 years ago

avpicov commented 5 years ago

First, thank you for making this code available. I am having problems with the Docker build; I suspect it has not been used in a while. My first change was updating the base image from ubuntu:wily to ubuntu:xenial.

`#FROM ubuntu:wily`
`FROM ubuntu:xenial`

I then made the following change:

`
RUN locale-gen en_US.UTF-8

RUN apt-get clean && apt-get update && apt-get install -y locales && locale-gen en_US.UTF-8
`

The latest issue I have run into is the following:

`
Step 22/28 : RUN wget -O tesseract-3.04.00/training/tesstrain_utils.sh 'https://raw.githubusercontent.com/tesseract-ocr/tesseract/master/training/tesstrain_utils.sh'
 ---> Running in 6368cfd8f443
--2019-04-19 19:57:08--  https://raw.githubusercontent.com/tesseract-ocr/tesseract/master/training/tesstrain_utils.sh
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2019-04-19 19:57:08 ERROR 404: Not Found.

The command '/bin/sh -c wget -O tesseract-3.04.00/training/tesstrain_utils.sh 'https://raw.githubusercontent.com/tesseract-ocr/tesseract/master/training/tesstrain_utils.sh'' returned a non-zero code: 8
`

I'm not sure how active the project is at this point, but I wanted to reach out to see if you might know what the issue is here.

Thanks -Will

ryanfb commented 5 years ago

You're correct that this has been dormant for a while. Part of this is due to the work I put into getting a version of the Latin-specific OCR training into Tesseract core:

But since that uses the one-size-fits-all Tesseract training process, some things were lost from this Latin-specific process. So I'll try to take a look at fixing this build.
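
The 404 in the log above means that `training/tesstrain_utils.sh` could not be found at that path on the `master` branch when the build ran, so the `wget` step depends on a ref that moves over time. As a rough diagnostic sketch (not the fix that was applied to this repository), the helpers below build a raw-file URL for a given ref and report its HTTP status. The function names `raw_url` and `http_status` are hypothetical, and which ref actually contains the file is left open here.

```shell
#!/bin/sh
# Hypothetical helpers for checking whether a file exists at a given ref
# (branch, tag, or commit SHA) before hard-coding the URL in a Dockerfile.

# Build a raw.githubusercontent.com URL.
#   $1 = owner/repo, $2 = ref, $3 = path inside the repository
raw_url() {
    printf 'https://raw.githubusercontent.com/%s/%s/%s\n' "$1" "$2" "$3"
}

# Print the HTTP status code for a URL without downloading the body.
# Needs curl and network access.
http_status() {
    curl -s -o /dev/null -I -w '%{http_code}' "$1"
}

# Example invocation (uncomment to run; REF is a placeholder to substitute):
# REF=master
# url=$(raw_url tesseract-ocr/tesseract "$REF" training/tesstrain_utils.sh)
# echo "$url -> $(http_status "$url")"
```

Pointing the Dockerfile at a ref that does not move (a tag or commit SHA confirmed to return 200) would keep the step from breaking again when upstream reorganizes its tree.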