tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

psm 6 producing gibberish in certain cases #1603

Open Shreeshrii opened 6 years ago

Shreeshrii commented 6 years ago

With this particular washed-out image, --psm 3 (the default) and --psm 4 produce readable English output, but --psm 6 generates gibberish. This happens with all three flavors of traineddata (tessdata, tessdata_best and tessdata_fast). I am attaching a file with the console output.

Here is a sample of the output for comparison:

psm 3

“BASIC SCONE RECIPE
1 Lb. Self-raising Flour 2 Ozs. Butter
1 Level Teaspoon Sait 15 Cups Milk

Prepare a hot oven (450°F.). Sift flour and salt. Using
finger tips rub butter into flour. Make well in centre and add
milk all at once, stirring in flour, quickly and lightly, to a soft
dough. Turn on to lightly floured board and knead just

psm 6

Ek | .
Ep pe = 2 =
2 fin, rep eel B = oe
2 ee are If raisi AS Ci Sos
+ ol = a
E oh, 1 Ee = es =
= Tor i utter = ON ow
= ovel ae a e, tori @ E =
BE Ne ey ing 50° RE od
av of ay to ke a o fl F. 2 Cc Ld
CT: t 5 T oe gL 0 p) 20 ! cs
= e i she . fous. . es PE -
a RI © ee a moa : if sD f
Lo ae on so es nn ut =
= TI ze “4 to S, Ln i ak fl ace ~~
- cao oo - he i: ay |
§ 0) OnE: g el 5 FE
2 i = - i he |
2 Tex Len bo oe to a tri “bow Ii en alt.
E be we ve d S or en 11 cu
oa Ser
2 dance cu ift sin, ae bates 1 1d bios ing
: eo er, 3 gar 2 + a : nute eg = da
: ax Lai bee a tab abi ce a soft
= © Si oe 1t fablsp Er
Ee tks
= = uti ab ut ak 00; 00; ni oil o
raion i he pi a ee a Te
= — he ol i a it Su iis §

Shreeshrii commented 6 years ago

Image used (from issue 1601)

https://user-images.githubusercontent.com/52270/40514191-2557fc94-5f5d-11e8-896d-ee3c52956076.png

Console output: https://github.com/tesseract-ocr/tesseract/files/2039107/1601.log.txt
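For anyone who wants to reproduce the comparison, here is a minimal sketch of a driver that runs the same image through several traineddata directories and page segmentation modes. It is not the script that produced the attached log. The helper names `build_command` and `run_matrix` are hypothetical, and it assumes `tesseract` is on the PATH and that the traineddata directories exist locally under the names you pass in.

```python
import subprocess


def build_command(image, tessdata_dir, psm, lang="eng", oem=1):
    """Build the tesseract argv for one run; recognized text goes to stdout."""
    return [
        "tesseract", image, "stdout",
        "--tessdata-dir", tessdata_dir,
        "-l", lang,
        "--oem", str(oem),
        "--psm", str(psm),
    ]


def run_matrix(image, tessdata_dirs, psms):
    """Run every (tessdata_dir, psm) pair and collect the recognized text."""
    results = {}
    for tessdata_dir in tessdata_dirs:
        for psm in psms:
            proc = subprocess.run(
                build_command(image, tessdata_dir, psm),
                capture_output=True,
                text=True,
                check=False,  # keep going even if one run fails
            )
            results[(tessdata_dir, psm)] = proc.stdout
    return results


# Example call (assumes the image was saved locally as 1603.png):
# outputs = run_matrix("1603.png",
#                      ["tessdata", "tessdata_best", "tessdata_fast"],
#                      [3, 6])
```

Keeping the command construction separate from the execution makes it easy to print the exact command line for each run next to its output.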

Shreeshrii commented 5 years ago

In contrast, --psm 3 is producing gibberish in the example in https://github.com/tesseract-ocr/tesseract/issues/1327

Shreeshrii commented 5 years ago

Retested with current code. The problem persists.

It looks like some image pre-processing is being done for --psm 3. Can the same pre-processing (applied to the whole image/page) be used for the other modes too?

Shreeshrii commented 5 years ago

Output for reference:


tesseract 4.1.0-rc1-250-g95a1
 leptonica-1.76.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.3.0

 *****  ./1603.png LANG eng TESSDATA tessdata OEM 1 PSM 3 ****
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
Sn

NS

LLLLLLLLLLELD DDD

+ IPILLLLLBLLLLLYL

"7" BASIC SCONE RECIPE
I Lb. Self-raising Flour 2 Ozs. Butter
1 Level Teaspoon Sait 13 Cups Miik

Prepare a hot oven (450°F.). Sift flour and salt. Using
finger tips rub butter into flour. Make well in centre and add
milk all at once, stirring in flour, quickly and lightly, to a soft
dough. Turn on to lightly floured board and knead just
enough to make a smooth surface. Roll to 4 inch thickness
and cut into rounds, squares or triangles. Place on floured
oven tray; glaze top with milk or beaten egg and milk. Bake -
in hot oven (450°F.) 10 to 15 minutes. Makes about 20
average size scones.

VARIATIONS OF PLAIN SCONE

Sweet Scones. Add 2 level tablespoons sugar and 1 beaten
egg to the above, using 1 tablespoon less milk.

Date Scones. Sift 2 level teaspoons cinnamon with the flour.
Add 1 oz. sugar and 4 oz. chopped dates before the milk.
Teaparty Scones. Cut scones with round cutter, 1 inch in
diameter, baking about 7 minutes.

Cocktail Scones. Cut scones with small round cutter. When
baked cut in half and use as base for savouries. These scones
are usually cheese flavoured or well seasoned with celery salt
and cayenne.

Raisin Scones. Sift 1 level teaspoon grated nutmeg with flour
and add 4 oz. chopped seeded raisins.

Wholemeal Honey Scones. Use half white and half wholemeal
self-raising flour. Add 1 egg and 2 tablespoons honey in
mixing. The grated rind of an orange may be added.

Cheese Scones. Sift level teaspoon cayenne with the flour
and add 3 cup grated cheese before adding milk. An egg
may be added. ’
Cheese and Onion Scones. Add 1 level dessertspoon finely
chopped onion to the cheese scone mixture.

Celery Scones. Add 1 level teaspoon celery salt, omitting
plain salt, 1 small chopped onion.

127%

real    0m25.892s
user    0m59.548s
sys     0m0.436s

 *****  ./1603.png LANG eng TESSDATA tessdata OEM 1 PSM 6 ****
ig
> 5 a
1 se
PE bo oo
= 4 :
Ek | .
Ep pe = 2 =
2 fin, rep eel B = oe
2 ee are If raisi AS Ci Sos
+ ol = a
E oh, 1 Ee = es =
= Tor i utter = ON ow
= ovel ae a e, tori @ E =
BE Ne ey ing 50° RE od
av of ay to ke a o fl F. 2 Cc Ld
CT: t 5 T oe gL 0 p) 20 ! cs
= e i she . fous. . es PE -
a RI © ee a moa : if sD f
Lo ae on so es nn ut =
= TI ze “4 to S, Ln i ak fl ace ~~
- cao oo - he i: ay |
§ 0) OnE: g el 5 FE
2 i = - i he |
2 Tex Len bo oe to a tri “bow Ii en alt.
E be we ve d S or en 11 cu
oa Ser
2 dance cu ift sin, ae bates 1 1d bios ing
: eo er, 3 gar 2 + a : nute eg = da
: ax Lai bee a tab abi ce a soft
= © Si oe 1t fablsp Er
Ee tks
= = uti ab ut ak 00; 00; ni oil o
raion i he pi a ee a Te
= — he ol i a it Su iis §
> wi res © ees: and sc mi 5 Pp cin; a ut
a sr e fla ed amon ad 20 -
) selfing oz. Sift vous as or des = i
5 Ren a 1 red a . nd Bet ith eal
2 : es : Ti he op lev: on be az Og th ten
hey Go «Kora tte =
f= for savon w tho m
= Savon ® th mi
my i ne. at ot a Souris c milk.
= cl eel — : ed a d 00; SO! rie ic inc] ilk
=
E Ce ph ed. ale oe min Ths n
— lei d d Tal £ o all 5 ed Se Ww
E plai ry a Es bo o pe Gs
a ns Ser on ion = fe or: gd hit utm lis ge
i a t S Ee € e Ty es
5 es. © c = 3 ge a g sal :
1 5 the on © pO on bl d wi It
a a oy hall ith fl oR
d = hd a fw 0
ad 11 d Li ur
oe . E i adds
pe AC con 1 din 5 ed on a
d 1 e I wi e
2 0! te eve g on ey al
ni ea mi; el mi i
io Sp Xt d ilk th z
n 00! ur es! - e fl
12 0 e. a Ey :
7 ‘ cel tsp. n ur
= oon .
al :
t n
, omif Jy
thin:
2 :

real    0m38.275s
user    1m20.152s
sys     0m0.668s

 *****  ./1603.png LANG eng TESSDATA tessdata_best OEM 1 PSM 3 ****
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
gee

So

LLLLLLLLLLLLD DDD

+ IIILLLLLBLLLLL DL

~~ BASIC SCONE RECIPE
I Lb. Self-raising Flour 2 Ozs. Butter
1 Level Teaspoon Sait 1% Cups Milk

Prepare a hot oven (450°F.). Sift flour and salt. Using
finger tips rub butter into flour. Make well in centre and add
milk all at once, stirring in flour, quickly and lightly, to a soft
dough. Turn on to lightly floured board and knead just
enough to make a smooth surface. Roll to 4 inch thickness
and cut into rounds, squares or triangles. Place on floured
oven tray; glaze top with milk or beaten egg and milk. Bake -
in hot oven (450°F.) 10 to 15 minutes. Makes about 20
average size scones.

VARIATIONS OF PLAIN SCONE

Sweet Scones. Add 2 level tablespoons sugar and 1 beaten
egg to the above, using 1 tablespoon less milk.

Date Scones. Sift 2 level teaspoons cinnamon with the flour.
Add 1 oz. sugar and 4 oz. chopped dates before the milk.
Teaparty Scones. Cut scones with round cutter, 1 inch in
diameter, baking about 7 minutes.

Cocktail Scones. Cut scones with small round cutter. When
baked cut in half and use as base for savouries. These scones
are usually cheese flavoured or well seasoned with celery salt
and cayenne.

Raisin Scones. Sift 1 level teaspoon grated nutmeg with flour
and add 4 oz. chopped seeded raisins.

Wholemeal Honey Scones. Use half white and half wholemeal
self-raising flour. Add 1 egg and 2 tablespoons honey in
mixing. The grated rind of an orange may be added.

Cheese Scones. Sift level teaspoon cayenne with the flour
and add 3 cup grated cheese before adding milk. An egg
may be added. .
Cheese and Onion Scones. Add 1 level dessertspoon finely
chopped onion to the cheese scone mixture. :

Celery Scones. Add 1 level teaspoon celery salt, omitting
plain salt, 1 small chopped onion.

27

real    0m30.765s
user    1m8.656s
sys     0m0.532s

 *****  ./1603.png LANG eng TESSDATA tessdata_best OEM 1 PSM 6 ****
=
2 2 i
= sr
E 2 0 oF
= 4 :
g | 3
Eg we Le z = a
2 fin, rep - B a oe
2 are If-raisi AS IC a
+ oh a %
Ee oh, 1 or es =
= a Tot i utter 2 ON a
2 ovel on e, tori @ E en
= Tn = ing 50° RE i
av of ay to ke ne o fl F. 2 Cc Ld
CT: t 5 T Tu g 1 0 p) 22 I po”
= ee Ch A I tos. St io PE a
a RI 2 oe a moo : if ks f
ee 2 _ oa mr nl ut Te
= ee TI ze @ to S, a y ak fl wa c
2 cao i — i 2 a
i 0) one E el : E
2 i He % i FO |
2 Te oo bo eh to tri bau Ii en alt.
Ee a re ve d S or en 11 Ph
EE go
2 dance - ift sin, beaten 1 1d bios ing
: Tig er, oo gar 2 or ny : nute cg Toes $20
ax pi a Sa tab abi a ce id soft
= 2 A he 1t fabisp Sa
Ee ticks
2 2 ti ab ut _ 00; 00; ni oil o
raion fy ey tot on Sy Ee
=? i hs a oe EL 7 = Baie g
> Wi ne © ees and sc mi 5 Pp cin; ut .
> gun e fla ed amor ead 20 -
) selfing oz. Sift vous as a dae = rh
25 oo 1 red hi : nd bef ith eal
2 ” cd : ES er PD lev: oF fo en ore th i
Coy «ira tte 5
ah ah for sao w tho m
= Sevan : th mi
yb ne. at Hi — Souris, c milk.
= cl eel eg > ed ee d 00; SO! rie c inc] ilk
=
ee Cei i ed. ee min . Ths n
—- lei d d Ta £ o all 5 ed — w
Ey plai ry lll i 5 ike " oy he
a ns Se on ion fe or: d hit utm cle i
i a t S Si € e Ty es
5 es. 0 c = S ge a g sal :
1 5 the on © pO m bl d wi It
Shi ee ey hall ith fl =
d 2 A fr fw 0
aa 11 d Li ur
op, ee d ee adds on
pe AC COIN 1 din L ed on is
d 1 e I wi e
2 0! te eve g ih ey al
ni ea mi; el mi i
io Sp Xt d ilk th n
A 00! ur es! - e fl
12 0 e. on :
7 ‘ cel tsp. n ur
ory oon =
sal :
t n
, omii ely
thin:
2 ’

real    0m41.097s
user    1m26.448s
sys     0m0.724s

 *****  ./1603.png LANG eng TESSDATA tessdata_fast OEM 1 PSM 3 ****
Error in boxClipToRectangle: box outside rectangle
Error in pixScanForForeground: invalid box
c

Se

DOEEEEEEEKE SEH DDD

sIDSELEEEEEOELEOD

“BASIC SCONE RECIPE
1 Lb. Self-raising Flour 2 Ozs. Butter
1 Level Teaspoon Sait 15 Cups Milk

Prepare a hot oven (450°F.). Sift flour and salt. Using
finger tips rub butter into flour. Make well in centre and add
milk all at once, stirring in flour, quickly and lightly, to a soft
dough. Turn on to lightly floured board and knead just
enough to make a smooth surface. Roll to + inch thickness
and cut into rounds, squares or triangles. Place on floured
oven tray; glaze top with milk or beaten egg and milk. Bake -
in hot oven (450°F.) 10 to 15 minutes. Makes about 20
average size scones.

VARIATIONS OF PLAIN SCONE

Sweet Scones. Add 2 level tablespoons sugar and 1 beaten
egg to the above, using | tablespoon less milk.

Date Scones. Sift 2 level teaspoons cinnamon with the flour.
Add 1 oz. sugar and 4 oz. chopped dates before the milk.
Teaparty Scones. Cut scones with round cutter, 1 inch in
diameter, baking about 7 minutes.

Cocktail Scones. Cut scones with small round cutter. When
baked cut in half and use as base for savouries. These scones
are usually cheese flavoured or well seasoned with celery salt
and cayenne.

Raisin Scones. Sift 1 level teaspoon grated nutmeg with flour
and add 4 oz. chopped seeded raisins.

Wholemeal Honey Scones. Use half white and half wholemeal
self-raising flour. Add 1 egg and 2 tablespoons honey in
mixing. The grated rind of an orange may be added.

Cheese Scones. Sift 4 level teaspoon cayenne with the flour
and add % cup grated cheese before adding milk. An egg
may be added. r
Cheese and Onion Scones. Add 1 level dessertspoon finely
chopped onion to the cheese scone mixture.

Celery Scones. Add 1 level teaspoon celery salt, omitting
plain salt, 1 small chopped onion.

27,

real    0m16.325s
user    0m34.592s
sys     0m0.344s

 *****  ./1603.png LANG eng TESSDATA tessdata_fast OEM 1 PSM 6 ****
Sg
TA y gs
B% Ae
e Zs op oe
= | 4
- ao
. Di ts z ne ee
2 fin, rep ee B SS —
2 aie are rast AS Ce CG
a a oer “Gi
; uh, 7 oe es ie
= ee Tut i utter _ ON
2 ovel oe a e, er @ E co
5 ae ts , iting 50° RE oe
ave 01 ay to ike. a 0 fi F. 2 Cc cs
er t 3 Tr aS gl O ) Le I ge
= oe e ond i lous: oe Gis PE ug
2 R a er a snoa : if 8 we
ae doe ae fe eee Spas ue ae
= ae TI ze 4 to: 8, a Me ak fi na ae
a cg S ie ie i |
: 01 one: : et : ro
Mt os - ace on ,
2 Te wae bo’ ae to a tri ‘beard li en alt.
e ae ee ve. id S or eat tI ca
eee eee
2 same sift sin, ee besten 4 ad yto8 ing
Z es er, a gar 2 ae oe mute eg |
: an ae ae ae tab ab oe ce a soft
2 c oe Ae Lt tablesp on
as thks
2 ut i al ab ut ae 00} 001 i e:
' Rain five ‘al ae oo wen aoa
= ae ao oe Pa a Su kale :
> Wi ee le ees and sc mi s PP cini ae nut é
2 oe e fla ee amon cad 20 -
) sli oz. sift vou! as a dates = is
2 i es 1 red oe A nd en ith eat
2 s So : Ti ae Pe lev oe oe oo Ce th i
ey aot efor tte oe
== ey Tor savou them
Sa sour © the mi
ay ne. at Iie ae vou, e milk.
a d ee: a 2 ed Hee d ole) SO) rie ie incl ilk
CS
E Cel ae led. oe ae shi . hes in
= le; d id Ta! f o ali . ed te W.
= plai ry ee oe eS ee - oe He
> Wh Se ion ion ae te or: g hit utm oe ee
. me t Ss € e Ty es
> es. 2 c — S ge ene 8 sal :
] 5 the on e po. a bl d wii It
aver ese ae halt ith fi =
d oe ae me fw 0
4 1 I d ae ur
oe oe a ae ade pe
pe Ve! com 1 din . led. on am
d 1 e I wi e
. 0 te eve & oe ey al
ni ea: mi: el mi i
10 sp xt de ilk th -
ms 00. ur es: : e fl
2 n e. ek Ke 5
Tr ‘ cel tsp n ur
oe oon a
eal :
It in
, omit By
‘tin:
8 :

real    0m26.503s
user    0m51.600s
sys     0m0.432s

Shreeshrii commented 3 years ago

Bug still exists:

tesseract 5.0.0-alpha-839-gd93e
 leptonica-1.75.3
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found NEON
 Found OpenMP 201511

jbarth-ubhd commented 3 years ago

I find this gibberish somewhat more aesthetically pleasing than other obscure output I've seen, probably because it has far fewer non-letter characters.

Can we be sure that this output is unintentional, when the neural net is considered a "black box"? (joke)
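
The remark about non-letter characters can be turned into a rough number. The sketch below is only an illustration: `non_letter_ratio` is a hypothetical helper, not part of Tesseract, and the two sample strings are lines copied from the psm 3 and psm 6 outputs earlier in this thread.

```python
def non_letter_ratio(text):
    """Fraction of non-whitespace characters that are not letters."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    non_letters = sum(1 for c in chars if not c.isalpha())
    return non_letters / len(chars)


# One line from the psm 3 output and one from the psm 6 output above.
clean_line = "finger tips rub butter into flour. Make well in centre and add"
garbled_line = "CT: t 5 T oe gL 0 p) 20 ! cs"

print(non_letter_ratio(clean_line))
print(non_letter_ratio(garbled_line))
```

A ratio like this is a crude signal and says nothing about whether the letters form real words, so it would need to be combined with something like a dictionary check to flag bad output reliably.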