pienkowskip closed this issue 8 years ago
Want to have even more fun? Shuffle the paths.
From empirical evidence, it's Diving Board that breaks the subsequent extractions.
Also, to "fix" the problem and degrade the performance into nothingness, you can create a new engine for every OCR call.
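A minimal sketch of that workaround, for illustration only. The helper name `ocr_with_fresh_engines` and the `engine_factory` parameter are hypothetical; the only thing assumed about an engine is that it responds to `text_for(path)`, which is how this gem's `Tesseract::Engine` is used in the thread.

```ruby
# Runs OCR over each path with a brand-new engine per file, so no state
# can carry over from one extraction to the next.
# `engine_factory` is any callable that returns an object responding to
# `text_for(path)` (hypothetical parameter, not part of the gem).
def ocr_with_fresh_engines(paths, engine_factory)
  paths.map do |path|
    engine = engine_factory.call # one engine per file: slow but isolated
    [path, engine.text_for(path).strip]
  end.to_h
end

# With the gem, the factory might look like this (assumed usage, untested
# here; the "samples" directory is a placeholder):
#   factory = -> { Tesseract::Engine.new { |e| e.language = :eng } }
#   ocr_with_fresh_engines(Dir["samples/*.png"].sort, factory)
```

Building an engine per file trades speed for isolation, which is exactly the performance cost mentioned above.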
Ok, this seems funny & unfixable. Closing issue.
I suppose this is more of a tesseract engine issue than an issue with this gem, but I hope you can help me explain how this is happening. And it's funny. ;)
So the problem is that when I reuse the tesseract engine in a script, I get different text for this image (especially the white part). The attached script walks through the sample files, a few of which are duplicates.
At the beginning, for the file `201500000000000.png`, I get `Hpisafi: Collect Call`, which is something like the proper solution. But later on, for the same image under the filename `20151012T010407.png`, I get `Hpisafi: Dniiect Dali`, and the same for the last one, `20151012T011003.png`. If I reverse the order (uncomment line 12), things get more complicated, because I get `Hpisafi: Collect Dali`. Isn't it funny?
Of course, if I create a new instance of the tesseract engine for each file, I get proper results: `Hpisafi: Collect Call`. And if I switch to Polish, the results are the same for the same image, but not so good :(
So do you have any idea how this can be happening? Can you reproduce it on your machine? I thought that OCR results should be "repeatable".
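The repeatability expectation can be checked mechanically. The sketch below is not the attached script; it is a hypothetical helper (`unrepeatable_results`) that groups files by content hash and reports any group of byte-identical images whose OCR text differs. The `ocr` argument is any callable from path to text, for example a lambda wrapping a single reused engine's `text_for`.

```ruby
require "digest"

# Groups OCR output by image content and returns only the groups where
# identical images produced different text.
# `ocr`  - callable taking a path and returning the recognized text
# `read` - callable returning a file's bytes (injectable for testing)
def unrepeatable_results(paths, ocr, read: File.method(:binread))
  by_content = Hash.new { |hash, key| hash[key] = [] }
  paths.each do |path|
    digest = Digest::SHA256.hexdigest(read.call(path))
    by_content[digest] << [path, ocr.call(path).strip]
  end
  # Keep groups with more than one distinct text for the same bytes.
  by_content.values.select { |entries| entries.map(&:last).uniq.size > 1 }
end
```

Calling it on the same list in sorted, reversed, and shuffled order would show whether the mismatches depend on which file the engine processed earlier.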
Attachments
Samples + script
https://drive.google.com/file/d/0BxJIH-bPcJwFZnJuaEZoaGEyVGc/view
Script
Results over files in alphabetical order
Results over files in reversed alphabetical order