tberg12 / ocular

Ocular is a state-of-the-art historical OCR system.
GNU General Public License v3.0

Requirements / Performance #8

Closed: scmmmh closed this issue 6 years ago

scmmmh commented 6 years ago

Hi,

For training the font model, what requirements are there on the system used to do the training and the images themselves? What resolution is recommended? Any pre-processing required?

The reason I ask is that I have been trying to train a model for 19th-century English-language newspapers. The process runs, but after more than 48 hours it was still on the first iteration of the first image, at the "Extracting text line images" step.

I can provide the image if that is any help.

Best Regards, Mark

LeviDobbins commented 6 years ago

Hi Mark,

When I ran the code I did not need to pre-process the image. The only thing I did was darken it (increase the contrast) to improve the quality of the output text, and even that was not needed for the code to work the first time.

When I run the code, it takes my computer (a MacBook Pro) approximately 5-10 minutes per training iteration. Running the code should definitely not take 48 hours as you described.

If you can provide your code and the image you used I can run the code on my machine and let you know how it goes, and I might be able to help you out. But I suspect there may be an issue with the code or file path to your files.

Regards, Levi

scmmmh commented 6 years ago

Hi Levi,

Thank you for the fast response. I thought it should not take quite that long. Here are the texts that I trained the original language-model on, the sample (and some test) images (I only trained on the sample images), and the code used to run the stand-alone JAR (which is just the code from the docs with filenames changed, as far as I can see).

texts.zip sample_images.zip pipeline.zip

Best Regards, Mark H

LeviDobbins commented 6 years ago

Hi Mark,

I can see a couple of issues with your code that I myself worked through when I was trying to get the code to run. See the following tips for getting your code to work:

  1. If you use image files (such as PNG or JPG) the code will still run, but the transcription will be blank. So you need to convert your images to PDF and use the PDF file instead of the image file.
  2. Instead of typing the file path to your files, I drag and drop the file into the terminal window, and the computer automatically pastes the full file path into the terminal (in place of where you would normally type it).
  3. The code will not work if the PDF has fewer than 15 or so lines of text. (I'm not sure why this is, but you need to use PDFs with more than 15 lines for the code to work properly.)

See an example of the working code I used to generate a fairly accurate output transcription from an input pdf, after only 3 iterations. Hopefully you can use this to format your code properly and get proper results.

https://github.com/tberg12/ocular/files/1694757/Working.Ocular.Code.zip
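The first two tips above can be turned into a quick pre-flight check to run before starting a long training job. The following is only a minimal sketch, not part of Ocular: the function name `preflight` and the list of image extensions are assumptions made for illustration, and the "images give blank output" behaviour is taken from the experience described in this thread rather than from the Ocular documentation. It lists the PDFs in an input directory with their full absolute paths (tip 2) and flags any image files that would still need converting (tip 1).

```python
from pathlib import Path

# Assumption: the image types mentioned in tip 1 (PNG and JPG).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def preflight(input_dir):
    """Return (pdf_paths, warnings) for the files directly inside input_dir.

    pdf_paths are absolute paths, so they can be pasted into a command
    line without worrying about the current working directory.
    """
    root = Path(input_dir).expanduser().resolve()
    if not root.is_dir():
        return [], [f"input directory not found: {root}"]

    pdfs, warnings = [], []
    for path in sorted(root.iterdir()):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix == ".pdf":
            pdfs.append(path)
        elif suffix in IMAGE_SUFFIXES:
            warnings.append(f"{path.name}: image file, convert to PDF first")

    if not pdfs:
        warnings.append(f"no PDF files found in {root}")
    return pdfs, warnings


if __name__ == "__main__":
    import sys

    found, problems = preflight(sys.argv[1] if len(sys.argv) > 1 else ".")
    for pdf in found:
        print("ok  ", pdf)
    for problem in problems:
        print("warn", problem)
```

The sketch does not check tip 3 (the minimum of roughly 15 lines per page), since that would require reading the page contents; that still has to be checked by eye.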

Regards, Levi

scmmmh commented 6 years ago

Hi Levi,

Thank you very much for your time and in particular the detailed tips on how to get it working.

Thank you, Mark

LeviDobbins commented 6 years ago

No problem, Mark, glad I could help.

Regards, Levi