Closed by scmmmh 6 years ago
Hi Mark,
When I ran the code I did not need to pre-process the image. The only thing I did was darken it (increase the contrast) to improve the quality of the output text, and even that was not needed for it to work the first time.
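For anyone who wants to try the darkening step, here is a minimal sketch in plain Python. It is not part of Ocular and not necessarily how the image was darkened here; it assumes the page has already been loaded as rows of 8-bit grayscale values, and `increase_contrast`, `factor`, and `midpoint` are illustrative names only.

```python
def increase_contrast(pixels, factor=1.5, midpoint=128):
    """Push each grayscale value away from the midpoint, clamped to 0..255.

    pixels: list of rows, each row a list of ints in 0..255 (assumed layout).
    factor: values above 1.0 increase contrast; 1.0 leaves the image unchanged.
    """
    result = []
    for row in pixels:
        new_row = []
        for value in row:
            scaled = midpoint + factor * (value - midpoint)
            # Clamp so the result is still a valid 8-bit grayscale value.
            new_row.append(max(0, min(255, int(round(scaled)))))
        result.append(new_row)
    return result


# Tiny hypothetical "image": dark ink gets darker, light paper gets lighter.
page = [[200, 120, 60], [255, 128, 0]]
print(increase_contrast(page))  # [[236, 116, 26], [255, 128, 0]]
```

In practice an image library or editor would do the loading and saving; the point is only that values below the midpoint (the ink) are pushed toward black.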
When I run the code, it takes my computer (a MacBook Pro) approximately 5-10 minutes per training iteration. It should definitely not take 48 hours as you described.
If you can provide your code and the image you used, I can run it on my machine and let you know how it goes, and I might be able to help you out. I suspect there may be an issue with the code or with the file paths to your files.
Regards, Levi
Hi Levi,
Thank you for the fast response. I thought it should not take quite that long. Here are the texts that I trained the original language model on, the sample images (plus some test images; I only trained on the sample images), and the code used to run the stand-alone JAR (which, as far as I can see, is just the code from the docs with the filenames changed).
texts.zip sample_images.zip pipeline.zip
Best Regards, Mark H
Hi Mark,
I can see a couple of issues with your code that I myself worked through when I was trying to get the code to run. See the following tips for getting your code to work:
Here is an example of the working code I used to generate a fairly accurate transcription from an input PDF after only 3 iterations. Hopefully you can use it to format your code properly and get good results.
https://github.com/tberg12/ocular/files/1694757/Working.Ocular.Code.zip
Regards, Levi
Hi Levi,
Thank you very much for your time and in particular the detailed tips on how to get it working.
Thank you, Mark
No problem, Mark, glad I could help.
Regards, Levi
Hi,
For training the font model, what requirements are there on the system used to do the training and the images themselves? What resolution is recommended? Any pre-processing required?
The reason I ask is that I have been trying to train a model for 19th-century English-language newspapers. It runs and appears to be processing, but after more than 48 hours it was still on the first iteration of the first image, at "Extracting text line images".
I can provide the image if that would help.
Best Regards, Mark