tesseract-ocr / tessdoc

Tesseract documentation
https://tesseract-ocr.github.io/tessdoc/

Missing example images in ImproveQuality #18

Closed: zacharysyoung closed this issue 4 years ago

zacharysyoung commented 4 years ago

The Dilation and Erosion section seems very self-explanatory; I believe I understand the principle and its practice. It also refers to a before-and-after image set that is not in the repo.

Can these images be added to the repo and properly integrated into the documentation? I think they'd make the section even more helpful.
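For readers who want to see concretely what the two operations in that section do, here is a minimal pure-Python sketch. It assumes a binary image stored as a list of rows (1 = ink, 0 = background), a 3x3 square structuring element, and that pixels outside the image count as background. The names `dilate`, `erode` and `_window` are illustrative, not from tessdoc; in practice OpenCV's `cv2.dilate` and `cv2.erode` do this job. Note that on an ordinary black-text-on-white scan the roles swap unless you invert first, because dilation then grows the white background rather than the text.

```python
# Minimal binary morphology sketch (illustrative, not taken from tessdoc).
# Assumptions: the image is a list of rows, 1 = ink and 0 = background,
# the structuring element is a 3x3 square, and pixels outside the image
# count as background.

def _window(img, r, c):
    """Yield the nine values of the 3x3 window centred on (r, c)."""
    rows, cols = len(img), len(img[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield img[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0


def dilate(img):
    """Ink wherever ANY pixel of the window is ink: strokes grow one pixel on every side."""
    return [[1 if any(_window(img, r, c)) else 0 for c in range(len(img[0]))]
            for r in range(len(img))]


def erode(img):
    """Ink only where ALL pixels of the window are ink: strokes shrink one pixel on every side."""
    return [[1 if all(_window(img, r, c)) else 0 for c in range(len(img[0]))]
            for r in range(len(img))]


if __name__ == "__main__":
    # A 3-pixel-wide vertical stroke (columns 2-4, rows 1-5) in a 7x7 image.
    stroke = [[1 if 2 <= c <= 4 and 1 <= r <= 5 else 0 for c in range(7)]
              for r in range(7)]
    for name, result in (("eroded", erode(stroke)), ("dilated", dilate(stroke))):
        print(name)
        for row in result:
            print("".join("#" if v else "." for v in row))
```

A single erosion thins the 3-pixel stroke to a 1-pixel line, and a stroke that is already 1 pixel wide disappears entirely, which is one reason OCR output can change so sharply with these settings.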

stweil commented 4 years ago

The images are missing here: https://github.com/tesseract-ocr/tessdoc/blob/master/ImproveQuality.md#dilation-and-erosion.

@thadguidry, you added that documentation to the Wiki. Can you provide those two missing images?

thadguidry commented 4 years ago

@stweil Done! (I also converted them to PNG for consistency with the other images.)

thadguidry commented 4 years ago

@zacharysyoung Glad you found the section I added a few months ago useful! I spent a whole weekend researching it :-)

zacharysyoung commented 4 years ago

@thadguidry, yeah, well, I thought I understood the effects of applying erosion, but running Tesseract over your sample images shows it's not so cut-and-dried: it's a great example of how sensitive Tesseract is.

Thank you for your efforts :)

thadguidry commented 4 years ago

@zacharysyoung Oh, there is so much "cutting" and "bleeding" when working with older historical texts :-) Btw, you can certainly dig in and retrain Tesseract for your needs, but that's where even I defer to the experts on the mailing list.
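One way to read "cutting" and "bleeding" (an interpretation, not thadguidry's own definition) is as broken strokes and ink that has spread until neighbouring glyphs touch. Two standard compound operations target these cases: closing (dilate, then erode) bridges small breaks, and opening (erode, then dilate) removes thin ink bridges. The sketch below is self-contained and assumes a binary image (1 = ink), a 3x3 square structuring element, and background outside the image; the function names are hypothetical. OpenCV exposes the same operations through `cv2.morphologyEx` with `cv2.MORPH_CLOSE` and `cv2.MORPH_OPEN`.

```python
# Sketch: morphological closing and opening on a binary image (1 = ink, 0 = background).
# Assumptions: 3x3 square structuring element; pixels outside the image count as background.
# Function names are hypothetical, chosen for illustration.

def _morph(img, keep):
    """Apply keep() over each 3x3 window: any -> dilation, all -> erosion."""
    rows, cols = len(img), len(img[0])

    def window(r, c):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                yield img[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0

    return [[1 if keep(window(r, c)) else 0 for c in range(cols)] for r in range(rows)]


def dilate(img):
    return _morph(img, any)


def erode(img):
    return _morph(img, all)


def close_gaps(img):
    """Closing (dilate, then erode): bridges small breaks in strokes ("cutting")."""
    return erode(dilate(img))


def separate_bleeds(img):
    """Opening (erode, then dilate): removes thin ink bridges between glyphs ("bleeding")."""
    return dilate(erode(img))


if __name__ == "__main__":
    # A 3-pixel-tall horizontal stroke with a 1-pixel break at column 5.
    cut = [[1 if 2 <= r <= 4 and 1 <= c <= 9 and c != 5 else 0 for c in range(11)]
           for r in range(7)]
    # Two 3x3 blobs joined by a 1-pixel-thick bridge of bled ink.
    bled = [[1 if (1 <= r <= 3 and (1 <= c <= 3 or 7 <= c <= 9))
             or (r == 2 and 4 <= c <= 6) else 0 for c in range(11)]
            for r in range(5)]
    for row in close_gaps(cut) + [[]] + separate_bleeds(bled):
        print("".join("#" if v else "." for v in row))
```

Both operations only help when the defect is smaller than the structuring element; a break or bridge wider than about two pixels survives a 3x3 closing or opening, so the element size is itself something to tune per document.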