Closed: Terry-Kusunoki-Martin closed this issue 5 years ago.
Yep, you're right. This project just doesn't solve the slowness and quality problems you're describing with large amounts of input text.
You may want to look at Renard Wellnitz's open-source Text Fairy project, which takes a number of steps to address these problems.
Hi, I've been taking a foray into OCR through this app, and I've noticed that while accuracy and speed are very good on smaller bodies of text (a paragraph of around four sentences takes a few seconds to process), both metrics are significantly worse on larger bodies of text. The document below took a full 2:34 to process, and the text mostly came out garbled.

I'm not sure if this counts as a proper issue, but I was wondering what other steps I can take to improve the speed and quality of OCR on larger amounts of text. Are there extra preprocessing steps I can take aside from thresholding? Are there parameters that are helpful to set when handling larger documents? Any advice would be much appreciated. Thank you very much.
Here is the original image:
Here is the resultant OCR text:
Here is the marked up bitmap from the OCRResult:
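Since the question mentions thresholding as a preprocessing step, here is a minimal, dependency-free sketch of one common refinement: picking the binarization threshold automatically with Otsu's method instead of a fixed cutoff. This is illustrative only and not the app's actual code; a real pipeline would run this (or an adaptive, per-region variant) on the grayscale bitmap before handing it to the OCR engine, and would typically also deskew and rescale the page.

```python
def otsu_threshold(pixels):
    """Pick the 8-bit grayscale threshold that maximizes between-class
    variance (Otsu's method) over a flat list of pixel values 0..255."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    sum_bg = 0.0      # running sum of intensities in the background class
    weight_bg = 0     # running pixel count in the background class
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance: large when the two classes are well separated.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(pixels, threshold):
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]
```

For a clean scan this behaves much like a well-chosen fixed threshold, but it adapts to each page's lighting, which often matters more on large, unevenly lit documents like the one above.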