Closed CanadianHusky closed 4 years ago
This could be related to the changed handling of the alpha channel in PNG images: the latest Tesseract code replaces the alpha channel by white.
@CanadianHusky, could you please try both versions with the same image in other formats (for example JPEG or TIFF) or with a PNG without alpha channel?
Hello,
@stweil I have tested the RC3, RC4 and final 4.0.0 (20181030) builds, using BMP and JPG versions of the same image. All of them suffer from the same problem and fail to detect the orientation correctly, which used to work in RC1. The problem must have been introduced somewhere between RC1 and RC3. Thank you.
Hello, I see a new pre-compiled release at https://digi.bib.uni-mannheim.de/tesseract/ for tesseract-ocr-w64-setup-v4.1.0.20190314.exe and tested that release against the issue mentioned above.
The result on the input image is still incorrect. I am unsure whether the binary release I used is really a 4.1.0 release or an intermediate build.
thank you
That binary is based on latest Tesseract sources (Git master).
@CanadianHusky: you can copy terminal output by selecting it with the left mouse button and then right-clicking in the terminal, which puts the selection on the clipboard. That is more useful than screenshots.
I ran a test with the latest code (5.0.0-alpha-50-g3f4dc) and the best tessdata:
> tesseract i2062.png - --dpi 175 -c min_characters_to_try=10 --psm 0 -l eng
Warning, detects only orientation with -l eng
Warning. Invalid resolution 0 dpi. Using 70 instead.
Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 14.00
Script: Latin
Script confidence: -nan(ind)
But if I skip the language specification (eng should be used anyway), I get a different result:
> tesseract i2062.png - --dpi 175 -c min_characters_to_try=10 --psm 0
Warning. Invalid resolution 0 dpi. Using 70 instead.
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 0.28
Script: Greek
Script confidence: 4.36
The detected orientation is correct, but the script is wrong. It is quite strange that specifying the eng language causes a different result...
And using tessdata (i.e. neither fast nor best) gives the correct result:
tesseract i2062.png - --psm 0 --tessdata-dir tessdata -c min_characters_to_try=10 -l eng
Warning, detects only orientation with -l eng
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 174
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 0.54
Script: Latin
Script confidence: 33.33
It seems the LSTM model cannot detect the orientation correctly on this kind of image (too few characters), but the legacy engine works fine:
pi@raspberrypi:/usr/src/test $ tesseract i2062.png - --psm 0 --tessdata-dir tessdata --oem 0 --dpi 175 -c min_characters_to_try=10 -l eng
Warning, detects only orientation with -l eng
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 0.54
Script: Latin
Script confidence: 33.33
pi@raspberrypi:/usr/src/test $ tesseract i2062.png - --psm 0 --tessdata-dir tessdata --oem 1 --dpi 175 -c min_characters_to_try=10 -l eng
Warning, detects only orientation with -l eng
Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 14.00
Script: Latin
Script confidence: nan
pi@raspberrypi:/usr/src/test $ tesseract i2062.png - --psm 0 --tessdata-dir tessdata --oem 2 --dpi 175 -c min_characters_to_try=10 -l eng
Warning, detects only orientation with -l eng
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 0.54
Script: Latin
Script confidence: 33.33
pi@raspberrypi:/usr/src/test $ tesseract i2062.png - --psm 0 --tessdata-dir tessdata --oem 3 --dpi 175 -c min_characters_to_try=10 -l eng
Warning, detects only orientation with -l eng
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 0.54
Script: Latin
Script confidence: 33.33
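For anyone scripting comparisons like the ones above, a small parser for the OSD output may help. This is a minimal Python sketch, not part of Tesseract: `parse_osd` is a hypothetical helper name, and it assumes the output keeps the `Key: value` layout seen in this thread. Note that the Windows build prints NaN as `-nan(ind)`, which Python's `float()` does not accept, so the sketch maps any NaN spelling explicitly.

```python
import math

def parse_osd(text):
    """Parse the key/value lines printed by `tesseract ... --psm 0` into a dict.

    Hypothetical helper: lines with unknown keys (warnings, prompts) are ignored.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # e.g. "Warning, detects only orientation with -l eng"
        key, value = key.strip(), value.strip()
        if key in ("Page number", "Orientation in degrees", "Rotate"):
            fields[key] = int(value)
        elif key in ("Orientation confidence", "Script confidence"):
            # "nan" (Linux) and "-nan(ind)" (Windows) both become math.nan.
            fields[key] = math.nan if "nan" in value.lower() else float(value)
        elif key == "Script":
            fields[key] = value
    return fields
```

A caller can then check `math.isnan(result["Script confidence"])` before relying on the script result.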
More details that may shed some light on how it works:
If no language is specified, only osd.traineddata is used (according to the strace report). That is the reason why script detection is not correct.
When a language is specified with -l eng, then:
eng.traineddata is opened
osd.traineddata is opened...
I am not sure if we can/want to do something about this.
As soon as I see a stable binary release that I can test, I will try the suggested command line options. If using the --oem option with the correct value detects the correct orientation with a reasonable confidence value, that is sufficient. It does not matter to me personally whether the detection is done with LSTM or legacy code. Of course it is very desirable that this sort of orientation detection works as fast as possible. I appreciate the provided information. Thank you @zdenop
If my observation is correct, you do not need to wait for a stable release: just use the tessdata repository for OSD.
@zdenop, it is normal that only osd.traineddata is used if no explicit language was given. That file includes a selection of more than 1700 unicode characters from different scripts which are used to detect the right script. It is only available for the legacy OCR engine. Therefore it won't work if you use --oem 1 or compile Tesseract without that engine.
My tests with latest Tesseract code all give the right orientation as long as I do not add --oem 1.
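A sketch of how a script could build the OSD command while guarding against `--oem 1`, following the explanation above. `osd_command` and its parameter names are hypothetical; the flags themselves are only the ones already used in this thread, and whether a given `--oem` value works still depends on the traineddata file in use.

```python
def osd_command(image, tessdata_dir=None, oem=None, min_chars=10):
    """Build an argv list for orientation/script detection (--psm 0).

    Hypothetical helper: it refuses --oem 1 because, per the comment above,
    osd.traineddata is only usable by the legacy engine.
    """
    if oem == 1:
        raise ValueError("--oem 1 (LSTM only) cannot use osd.traineddata")
    cmd = ["tesseract", image, "stdout", "--psm", "0"]
    if tessdata_dir is not None:
        cmd += ["--tessdata-dir", tessdata_dir]
    if oem is not None:
        cmd += ["--oem", str(oem)]
    cmd += ["-c", f"min_characters_to_try={min_chars}"]
    return cmd
```

The returned list can be passed to `subprocess.run` as is, which avoids quoting problems with paths such as `C:\Program Files\Tesseract-OCR\tessdata`.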
So what is the status of this issue? Can it be closed?
@CanadianHusky, do you still have that problem?
Orientation detection still has problems for me. Here are my test results, after having adjusted the command line as recommended by @stweil
Test environment : clean install from https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v5.0.0.20190623.exe
All 3 input images are at 0 degrees, but are detected incorrectly. I admit that the input 3 image is of poor quality, and a higher preprocessing resolution does find the correct result. However, inputs 2 and 4 are as good as images are going to get, with clean and large enough letters, so I would have liked to see a correct result.
Am I still doing something wrong in the command line?
input2 image :
input 3 image :
input 4 image :
Also worth noting: adding -l eng (or -l deu) changes the orientation detection result, still to an incorrect result, but with very high confidence.
It might be related to this OSD related issue.
Reading @zdenop's and @stweil's comments, it seems that there is no regression in newer versions with the first image in this issue.
Nobody commented about the other images. It is not clear if the OP claims that there is a regression here too, or just complains about the wrong result.
I tested the input2 image.
I got correct result with:
tesseract input2.png input2 --psm 0 -l eng --tessdata-dir $testadadir/tessdata -c min_characters_to_try=10
console:
Warning, detects only orientation with -l eng
Tesseract Open Source OCR Engine v5.0.0-alpha-580-g87841 with Leptonica
Warning: Invalid resolution 0 dpi. Using 70 instead.
Estimating resolution as 225
Warning. Invalid resolution 0 dpi. Using 70 instead.
input2.osd
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 1.36
Script: Latin
Script confidence: 29.17
I'm not going to bother testing more images.
Thank you for revisiting this issue. In the meantime I have discovered the source of the inconsistency. The issue is not a regression in the code itself but depends on which traineddata file is used. When I do a clean install from https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v5.0.0.20190623.exe or any recent release...
This data file is installed
Now observe these tests; only the -l value changes. The expected result is 0 degrees and a meaningful confidence value.
C:\Program Files\Tesseract-OCR>tesseract --version
tesseract v5.0.0-alpha.20191030
leptonica-1.78.0
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX
Found SSE
Found libarchive 3.3.2 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5
C:\Program Files\Tesseract-OCR>tesseract --tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata" --psm 0 -l eng -c min_characters_to_try=10 "input2.png" stdout
Warning, detects only orientation with -l eng
Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 50.00
Script: Latin
Script confidence: 2.00
WRONG
C:\Program Files\Tesseract-OCR>tesseract --tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata" --psm 0 -l eng_15040 -c min_characters_to_try=10 "input2.png" stdout
Warning, detects only orientation with -l eng_15040
Page number: 0
Orientation in degrees: 270
Rotate: 90
Orientation confidence: 50.00
Script: Latin
Script confidence: 2.00
WRONG
C:\Program Files\Tesseract-OCR>tesseract --tessdata-dir "C:\Program Files\Tesseract-OCR\tessdata" --psm 0 -l eng_22917 -c min_characters_to_try=10 "input2.png" stdout
Warning, detects only orientation with -l eng_22917
Page number: 0
Orientation in degrees: 0
Rotate: 0
Orientation confidence: 1.38
Script: Latin
Script confidence: 30.00
CORRECT!
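The three runs above could be automated roughly as follows. This is only a sketch: `run_osd`, `detected_orientation` and `compare_models` are hypothetical names, the command mirrors the one used above, and it assumes `tesseract` is on the PATH. The runner is passed in as a parameter so the comparison logic can be tried without Tesseract installed.

```python
import subprocess

def run_osd(image, lang, tessdata_dir):
    """Run tesseract in OSD mode (--psm 0) and return what it prints to stdout."""
    cmd = ["tesseract", "--tessdata-dir", tessdata_dir, "--psm", "0",
           "-l", lang, "-c", "min_characters_to_try=10", image, "stdout"]
    return subprocess.run(cmd, capture_output=True, text=True).stdout

def detected_orientation(osd_text):
    """Return the value of 'Orientation in degrees', or None if it is absent."""
    for line in osd_text.splitlines():
        if line.startswith("Orientation in degrees:"):
            return int(line.split(":", 1)[1])
    return None

def compare_models(image, langs, tessdata_dir, expected=0, runner=run_osd):
    """Map each -l value to True/False: did OSD report the expected orientation?"""
    return {
        lang: detected_orientation(runner(image, lang, tessdata_dir)) == expected
        for lang in langs
    }
```

For the files in this comment, the call would be `compare_models("input2.png", ["eng", "eng_15040", "eng_22917"], r"C:\Program Files\Tesseract-OCR\tessdata")`.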
Here the trained data files
These are the files in tessdata. Clearly the source of the issue for me is that the original file installed with the binary distribution does not give the expected result. The file eng_22917 was downloaded separately from the traineddata repository.
I would be interested to know what size your eng.traineddata file is and where it is from.
The source for my trained data files are as follows:
https://github.com/tesseract-ocr/tessdata/blob/master/eng.traineddata 22917kb and the only file that works for orientation detection probably because it has the legacy models that OSD code needs
https://github.com/tesseract-ocr/tessdata_fast/blob/master/eng.traineddata 4017kb, also part of the binary installation, does not work with --psm 0 for orientation detection purposes for me
https://github.com/tesseract-ocr/tessdata_best/blob/master/eng.traineddata 15040kb, does not work with --psm 0 for orientation detection purposes for me
It took me a very long time to understand and figure out this issue. I hope this information helps someone else. I have closed the issue.
I suppose the question now becomes whether it makes sense for @stweil to add a note to the binary distribution, or elsewhere in the release notes, that the included default traineddata file is the fast integer model, which is totally fine for most users when all they want to do is regular OCR. For anyone who is interested in OSD only, like me, the traineddata files that I linked to must be used, as far as I can see from my tests. Thanks again for having this pinned and looked into. Much appreciated.
I would be interested to know what size your eng.traineddata file is and where it is from.
I used eng.traineddata from the tessdata repo.
https://github.com/tesseract-ocr/tessdata/blob/d87b3cbc7555/eng.traineddata
Size: 24.5 MB (24,530,234 bytes)
Environment
Binary release clean install from
https://github.com/UB-Mannheim/tesseract/wiki https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-w64-setup-v4.0.0.20181030.exe
Current Behavior:
The orientation is detected incorrectly in the supplied file with the command line shown.
WRONG Result :
Expected Behavior:
compare the same input against 4.0.0-rc1
CORRECT Result :
The orientation confidence value in the rc1 version, based on tests on thousands of files, is extremely accurate and makes sense. It is used as a threshold for whether the result can be trusted or not. The result from the 20181030 release is horribly mistaken.
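For illustration, the kind of threshold check described here might look like the sketch below. The function name and the cut-off of 1.0 are hypothetical, not values from Tesseract or from the reporter's setup. Treating a NaN confidence as untrusted is an assumption, based only on the NaN script confidences that accompany wrong orientations elsewhere in this thread.

```python
import math

def orientation_is_trusted(orientation_conf, script_conf, min_conf=1.0):
    """Return True when neither confidence is NaN and the orientation
    confidence reaches min_conf (a hypothetical cut-off)."""
    if math.isnan(orientation_conf) or math.isnan(script_conf):
        return False
    return orientation_conf >= min_conf
```

A check like this only helps when the confidence itself is meaningful: the runs in this thread with fast or best traineddata report a wrong orientation with confidence 50.00, which any simple threshold would accept.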
Input Image :
Suggested Fix:
Investigate what led to the regression in the OSD code.
thank you kindly