tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

Text2Image isn't working properly #3817

Open Zacharymk1213 opened 2 years ago

Zacharymk1213 commented 2 years ago

I'm trying to retrain this Tesseract engine (https://gitlab.com/pninim.org/tessdata_heb_rashi/-/blob/main/tesseract_4.1.1/TRAINING.md) for a specific, obscure Hebrew script, targeting Tesseract 5. Using the command listed there, I'm trying to get a list of the available fonts with text2image --list_available_fonts --fonts_dir FontsRashi/Working. This initially worked but has since stopped working.

Environment

Current Behavior: It prints "(process:98484): Pango-CRITICAL **: 23:45:52.231: pango_font_description_set_size: assertion 'size >= 0' failed", followed by what seems to be a list of the fonts installed on the system.

Expected Behavior: List the fonts available in the given directory.

Suggested Fix: No idea; I need help troubleshooting this issue. The expected behavior worked until very recently, even though I seem to be using the same install (I built from source and don't remember which commit I used).

Below are some photos relevant to the error. [5 images]

zdenop commented 2 years ago

I'm afraid there is nothing we can do here: text2image uses Pango to handle fonts, so if the problem is in Pango, it must be solved there. That said, a simple test case (a font plus a short text for generating an image) and information about the library versions could help with debugging...
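A minimal sketch of gathering that information in one place, assuming a POSIX shell and, optionally, pkg-config with the library development files installed. The function name make_repro_bundle, the output directory and the sample text are hypothetical, not part of Tesseract:

```shell
# Hypothetical helper (not part of Tesseract): collect a short sample text and
# the versions of text2image and its rendering libraries into one directory.
make_repro_bundle() {
  out=${1:-text2image-repro}   # output directory (placeholder default name)
  mkdir -p "$out" || return 1
  # A short sample text (Hebrew "hello world") for text2image to render.
  printf '%s\n' 'שלום עולם' > "$out/sample.txt"
  {
    if command -v text2image >/dev/null 2>&1; then
      text2image --version 2>&1
    else
      echo "text2image: not found on PATH"
    fi
    # pkg-config only reports these if the -dev packages are installed;
    # otherwise "unknown" is written instead.
    for lib in pango cairo fontconfig freetype2; do
      printf '%s: %s\n' "$lib" "$(pkg-config --modversion "$lib" 2>/dev/null || echo unknown)"
    done
  } > "$out/versions.txt"
}
```

After running make_repro_bundle, the font file, versions.txt and the result of rendering the sample (for example text2image --text=text2image-repro/sample.txt --outputbase=text2image-repro/sample --font='Font Name' --fonts_dir=/fullpath/to/fonts, where the font name and path are placeholders) could be attached to the issue together.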

stweil commented 2 years ago

Maybe related issue: https://github.com/amitdo/text2tif-2016/issues/5.

amitdo commented 2 years ago

text2image --list_available_fonts --fonts_dir FontsRashi/Working

This is incorrect usage of the command.

It should be:

text2image --list_available_fonts --fonts_dir=/fullpath/to/FontsRashi/Working
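One way to produce that form without typing the full path by hand is to let the shell resolve it. This is a minimal sketch, assuming a POSIX shell; fonts_dir_flag is a hypothetical helper name. It only normalizes the path and joins it to the flag with "="; the later comments in this thread show that system fonts can still be listed even with an absolute path.

```shell
# Hypothetical helper: print a --fonts_dir=... argument whose value is the
# absolute, symlink-resolved path of the given directory.
fonts_dir_flag() {
  # cd runs inside the command substitution's subshell, so the caller's
  # working directory is unchanged; CDPATH is cleared so cd prints nothing.
  dir=$(CDPATH= cd "$1" 2>/dev/null && pwd -P) || {
    echo "fonts dir not found: $1" >&2
    return 1
  }
  printf '%s\n' "--fonts_dir=$dir"
}
```

Usage: text2image --list_available_fonts "$(fonts_dir_flag FontsRashi/Working)"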

amitdo commented 2 years ago

Maybe related issue: amitdo/text2tif-2016#5.

:-)

It seems that the list we produce may contain fonts that Pango can't render (fonts that are not OTF/TTF).

Maybe there is a way to filter out these fonts.
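On the user side, one rough workaround is to make sure the directory passed to --fonts_dir contains only OTF/TTF files. This is a minimal sketch, assuming a POSIX shell and a find that supports -iname (GNU and BSD both do); copy_otf_ttf_fonts is a hypothetical name. It uses the file extension as a stand-in for the real font format, which is a heuristic rather than a check of what Pango can render, and it cannot filter the system fonts that also show up in the list.

```shell
# Hypothetical helper: copy only the .otf/.ttf files from a font directory
# (searched recursively) into a separate directory to pass to --fonts_dir.
# The extension is only a rough proxy for the actual font format. dst should
# lie outside src, and files with the same name in different subdirectories
# overwrite each other.
copy_otf_ttf_fonts() {
  src=$1 dst=$2
  mkdir -p "$dst" || return 1
  find "$src" -type f \( -iname '*.otf' -o -iname '*.ttf' \) -exec cp {} "$dst"/ \;
}
```

For example, copy_otf_ttf_fonts FontsRashi/Working /tmp/rashi-otf-ttf, then pass /tmp/rashi-otf-ttf as the fonts directory (both paths are placeholders).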

yarikoptic commented 1 year ago

I think I have run into the same issue, in the sense that --fonts_dir is not used at all! I tried with a freshly built tesseract:

yoh@typhon:~/proj/repronim/tesseract-train$ text2image --version
Using CAIRO_FONT_TYPE_FT.
Pango version: 1.50.12
5.3.1-22-g24da4

To list the fonts I have in a folder:

yoh@typhon:~/proj/repronim/tesseract-train$ strace -f -o /tmp/123 text2image --list_available_fonts --fonts_dir ../siemens-fonts/selected/ >/dev/null

(process:3654728): Pango-CRITICAL **: 11:04:36.387: pango_font_description_set_size: assertion 'size >= 0' failed

and it doesn't even look into that folder:

yoh@typhon:~/proj/repronim/tesseract-train$ grep siemens /tmp/123
3654728 execve("/home/yoh/proj/misc/tesseract/build/bin/text2image", ["text2image", "--list_available_fonts", "--fonts_dir", "../siemens-fonts/selected/"], 0x7ffc067c20d0 /* 27 vars */) = 0
amitdo commented 1 year ago

Try to use the full path to the font dir instead of a relative path.

yarikoptic commented 1 year ago
yoh@typhon:~/proj/repronim/tesseract-train/tesseract_tutorial$ rm /tmp/123-2; strace -s 1024 -f -o /tmp/123-2 text2image --list_available_fonts --fonts_dir $PWD/../../siemens-fonts/fonts/ | nl | tail

(process:4152228): Pango-CRITICAL **: 14:53:09.019: pango_font_description_set_size: assertion 'size >= 0' failed
   124  123: Symbola Semi-Condensed
   125  124: URW Bookman Light
   126  125: URW Bookman Light Italic
   127  126: URW Bookman Semi-Bold
   128  127: URW Bookman Semi-Bold Italic
   129  128: URW Gothic
   130  129: URW Gothic Oblique
   131  130: URW Gothic Semi-Bold
   132  131: URW Gothic Semi-Bold Oblique
   133  132: Z003 Medium Italic
yoh@typhon:~/proj/repronim/tesseract-train/tesseract_tutorial$ grep sieme /tmp/123-2
4152228 execve("/home/yoh/proj/misc/tesseract/build/bin/text2image", ["text2image", "--list_available_fonts", "--fonts_dir", "/home/yoh/proj/repronim/tesseract-train/tesseract_tutorial/../../siemens-fonts/fonts/"], 0x7fff0768a7c0 /* 27 vars */) = 0
4152228 write(3, "<?xml version=\"1.0\"?>\n<!DOCTYPE fontconfig SYSTEM \"fonts.dtd\">\n<fontconfig>\n<dir>/home/yoh/proj/repronim/tesseract-train/tesseract_tutorial/../../siemens-fonts/fonts/</dir>\n<cachedir></cachedir>\n<config></config>\n</fontconfig>\n", 227) = 227
yoh@typhon:~/proj/repronim/tesseract-train/tesseract_tutorial$ grep 'O_WRONLY.*= 3$'  /tmp/123-2
4152228 openat(AT_FDCWD, "fonts.conf", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3

So this only shows that it creates that fonts.conf locally with the path; otherwise the folder is never accessed.
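To check the generated fonts.conf itself, a small script can pull out each <dir> entry and report whether the directory exists and holds any .ttf/.otf files. This is a minimal sketch, assuming a POSIX shell, sed and find with -iname; check_fonts_conf is a hypothetical name, and it expects one <dir> element per line, as in the file shown in the strace output above.

```shell
# Hypothetical helper: for each <dir> entry in a fonts.conf, report whether
# the directory exists and how many .ttf/.otf files it contains.
check_fonts_conf() {
  sed -n 's|.*<dir>\(.*\)</dir>.*|\1|p' "$1" | while IFS= read -r d; do
    if [ -d "$d" ]; then
      # $(( )) strips the padding some wc implementations add.
      n=$(( $(find "$d" -type f \( -iname '*.ttf' -o -iname '*.otf' \) | wc -l) ))
      printf '%s: exists, %d font file(s)\n' "$d" "$n"
    else
      printf '%s: missing\n' "$d"
    fi
  done
}
```

As a further diagnostic (not something tried in this thread), if fontconfig's command-line tools are installed, the same file can be handed to fontconfig directly with FONTCONFIG_FILE="$PWD/fonts.conf" fc-list : family, to see which families fontconfig itself finds with that configuration.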