tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0

Non-existent flag --max_pages=3 in tesstrain.sh Phase I #1148

Closed xanxus10th closed 7 years ago

xanxus10th commented 7 years ago

Environment

I am trying to run tesstrain.sh with the following command:

training/tesstrain.sh --fonts_dir /mnt/c/Windows/Fonts --lang tha --training_text /mnt/e/tesseract-ocr/langdata/tha/tha.training_text --linedata_only \
  --noextract_font_properties --langdata_dir /mnt/e/tesseract-ocr/langdata \
  --tessdata_dir /mnt/e/tesseract-ocr/tessdata \
  --fontlist "Tahoma" --output_dir /mnt/e/OCR/

But I got this error:

=== Starting training for language 'tha'
[Thu Sep 21 18:23:13 DST 2017] /usr/bin/text2image --fonts_dir=/mnt/c/Windows/Fonts --font=Tahoma --outputbase=/tmp/font_tmp.bUhGLbvCi0/sample_text.txt --text=/tmp/font_tmp.bUhGLbvCi0/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.bUhGLbvCi0
Rendered page 0 to file /tmp/font_tmp.bUhGLbvCi0/sample_text.txt.tif
Rtl = 0 ,vertical=0

=== Phase I: Generating training images ===
Rendering using Tahoma
[Thu Sep 21 18:23:16 DST 2017] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.bUhGLbvCi0 --fonts_dir=/mnt/c/Windows/Fonts --strip_unrenderable_words --leading=48 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.vdOu9qzJMf/tha/tha.Tahoma.exp0 --max_pages=3 --font=Tahoma --text=/mnt/e/tesseract-ocr/langdata/tha/tha.training_text
ERROR: Non-existent flag --max_pages=3
ERROR: /tmp/tmp.vdOu9qzJMf/tha/tha.Tahoma.exp0.box does not exist or is not readable
ERROR: /tmp/tmp.vdOu9qzJMf/tha/tha.Tahoma.exp0.box does not exist or is not readable

Please suggest how to fix this error. I have been trying to fix it for many hours.

ivanzz1001 commented 7 years ago

Do you have the font "Tahoma" installed on your Linux system? I think you have only mounted C:\Windows\Fonts under the /mnt directory.

amitdo commented 7 years ago

ERROR: Non-existent flag --max_pages=3

It seems that you are using an older commit, one that predates https://github.com/tesseract-ocr/tesseract/commit/2633fef0b6ac

ivanzz1001 commented 7 years ago

@amitdo The "--max_pages=3" flag is set in the script tesstrain_utils.sh, which tesstrain.sh calls.

xanxus10th commented 7 years ago

@ivanzz1001 @amitdo /mnt/c is used to access the Windows drive. I tried moving the "Fonts" folder to /usr/share/fonts/ and changing the option to --fonts_dir /usr/share/fonts/, but the same error appears:

=== Starting training for language 'tha'
[Fri Sep 22 03:04:10 DST 2017] /usr/bin/text2image --fonts_dir=/usr/share/fonts/ --font=Tahoma --outputbase=/tmp/font_tmp.r6wpt8kkkw/sample_text.txt --text=/tmp/font_tmp.r6wpt8kkkw/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.r6wpt8kkkw
FcInitiReinitialize failed!!
Could not find font named Tahoma. Pango suggested font
Please correct --font arg.:Error:Assert failed:in file text2image.cpp, line 437

=== Phase I: Generating training images ===
Rendering using Tahoma
[Fri Sep 22 03:04:12 DST 2017] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.r6wpt8kkkw --fonts_dir=/mnt/Fonts --strip_unrenderable_words --leading=48 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.4iPf952XEc/tha/tha.Tahoma.exp0 --max_pages=3 --font=Tahoma --text=/mnt/e/tesseract-ocr/langdata/tha/tha.training_text
ERROR: Non-existent flag --max_pages=3
ERROR: /tmp/tmp.4iPf952XEc/tha/tha.Tahoma.exp0.box does not exist or is not readable
ERROR: /tmp/tmp.4iPf952XEc/tha/tha.Tahoma.exp0.box does not exist or is not readable

I also tried changing line 215 of tesstrain_utils.sh:

- common_args+=" --outputbase=${outbase} --max_pages=3"
+ common_args+=" --outputbase=${outbase} "

But then it got stuck at Phase UP for many hours:

=== Phase I: Generating training images ===
Rendering using Tahoma
[Fri Sep 22 03:20:05 DST 2017] /usr/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.w5EOd46HIj --fonts_dir=/usr/share/fonts/ --strip_unrenderable_words --leading=48 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.QjgaXWkS0p/tha/tha.Tahoma.exp0 --font=Tahoma --text=/mnt/e/tesseract-ocr/langdata/tha/tha.training_text
Rendered page 0 to file /tmp/tmp.QjgaXWkS0p/tha/tha.Tahoma.exp0.tif
Rtl = 0 ,vertical=0

=== Phase UP: Generating unicharset and unichar properties files ===
[Fri Sep 22 03:20:06 DST 2017] /usr/bin/unicharset_extractor --output_unicharset /tmp/tmp.QjgaXWkS0p/tha/tha.unicharset --norm_mode 2 /tmp/tmp.QjgaXWkS0p/tha/tha.Tahoma.exp0.box

Shreeshrii commented 7 years ago

Please try with the eng language and a font that you know is present on your system.

Currently Tahoma font is not being found. Try 'Arial' with English.

Then test 'tha' with a font that supports the language's Unicode range.

Once the following errors are fixed, then others may also disappear.

=== Starting training for language 'tha'
[Fri Sep 22 03:04:10 DST 2017] /usr/bin/text2image --fonts_dir=/usr/share/fonts/ --font=Tahoma --outputbase=/tmp/font_tmp.r6wpt8kkkw/sample_text.txt --text=/tmp/font_tmp.r6wpt8kkkw/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.r6wpt8kkkw
FcInitiReinitialize failed!!
Could not find font named Tahoma. Pango suggested font
Please correct --font arg.:Error:Assert failed:in file text2image.cpp, line 437
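One way to check the font lookup failure directly, before running tesstrain.sh again, is to ask text2image which fonts it can see in a given directory and look for an exact name match. The following is only a sketch: the helper name font_listed is hypothetical, and it assumes the listing has the "218: Tahoma" shape shown elsewhere in this thread.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a font name appears in `text2image --list_available_fonts`
# output read from stdin. Lines are assumed to look like "218: Tahoma".
# font_listed is a hypothetical helper, not part of Tesseract.
font_listed() {
  local name="$1"
  # Strip the leading index and colon, then require an exact whole-line match
  # so that "Tahoma" does not match "Tahoma Bold".
  sed -E 's/^[[:space:]]*[0-9]+:[[:space:]]*//' | grep -Fxq -- "$name"
}

# Only query text2image if it is installed; the directory is the one used in this thread.
if command -v text2image >/dev/null 2>&1; then
  if text2image --fonts_dir /usr/share/fonts --list_available_fonts 2>&1 | font_listed "Tahoma"; then
    echo "Tahoma is visible to text2image"
  else
    echo "Tahoma is NOT visible to text2image"
  fi
fi
```

If the font is reported as not visible, the --font argument or the --fonts_dir path is the first thing to correct.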

Shreeshrii commented 7 years ago

Just tested on my PC (WSL, run with MobaXterm):

root@All-in-1-Touch:/mnt/c/Users/User/shree/tesseract-HEAD# training/tesstrain.sh \
 --fonts_dir /mnt/c/Windows/Fonts \
  --lang tha \
  --noextract_font_properties  --linedata_only \
  --exposures "0" \
  --langdata_dir ../langdata \
  --tessdata_dir ../tessdata \
  --fontlist \
   "Tahoma" \
   --output_dir ../tesstutorial/tha

=== Starting training for language 'tha'
[Fri Sep 22 09:33:08 DST 2017] /usr/local/bin/text2image --fonts_dir=/mnt/c/Windows/Fonts --font=Tahoma --outputbase=/tmp/font_tmp.Vy1LLR1cHi/sample_text.txt --text=/tmp/font_tmp.Vy1LLR1cHi/sample_text.txt --fontconfig_tmpdir=/tmp/font_tmp.Vy1LLR1cHi
Rendered page 0 to file /tmp/font_tmp.Vy1LLR1cHi/sample_text.txt.tif

=== Phase I: Generating training images ===
Rendering using Tahoma
[Fri Sep 22 09:34:11 DST 2017] /usr/local/bin/text2image --fontconfig_tmpdir=/tmp/font_tmp.Vy1LLR1cHi --fonts_dir=/mnt/c/Windows/Fonts --strip_unrenderable_words --leading=48 --char_spacing=0.0 --exposure=0 --outputbase=/tmp/tmp.9wT9v8Aqai/tha/tha.Tahoma.exp0 --max_pages=3 --font=Tahoma --text=../langdata/tha/tha.training_text
Rendered page 0 to file /tmp/tmp.9wT9v8Aqai/tha/tha.Tahoma.exp0.tif
Rendered page 1 to file /tmp/tmp.9wT9v8Aqai/tha/tha.Tahoma.exp0.tif
Rendered page 2 to file /tmp/tmp.9wT9v8Aqai/tha/tha.Tahoma.exp0.tif

ivanzz1001 commented 7 years ago

Please run the following commands to check which fonts you have installed (CentOS):

# yum install fontconfig mkfontscale
# fc-list

# text2image --fonts_dir /usr/share/fonts --list_available_fonts

xanxus10th commented 7 years ago

@Shreeshrii I just tested with eng and the "Arial" font using:

training/tesstrain.sh  \
  --fonts_dir /usr/share/fonts/ \
  --lang eng  \
  --linedata_only \
  --noextract_font_properties \
  --exposures "0"    \
  --langdata_dir /mnt/e/tesseract-ocr/langdata \
  --tessdata_dir /mnt/e/tesseract-ocr/tessdata   \
  --output_dir /usr/share/ \
  --fontlist "Arial"

I also tried tha with "Arial", but I got the same error:

ERROR: Non-existent flag --max_pages=3
ERROR: /tmp/tmp.A41qaylCwa/tha/tha.Arial.exp0.box does not exist or is not readable
ERROR: /tmp/tmp.A41qaylCwa/tha/tha.Arial.exp0.box does not exist or is not readable

@ivanzz1001 yum install fontconfig mkfontscale produced no output. The terminal got stuck and I had to reopen it.

fc-list

/usr/share/fonts/truetype/Fonts/constan.ttf: Constantia:style=Regular
/usr/share/fonts/truetype/Fonts/browalia.ttc: BrowalliaUPC:style=Bold Italic,Negreta cursiva,tučné kurzíva,fed kursiv,Fett Kursiv,Έντονα Πλάγια,Negrita Cursiva,Lihavoitu Kursivoi,Gras Italique,Félkövér dőlt,Grassetto Corsivo,Vet Cursief,Halvfet Kursiv,Pogrubiona kursywa,Negrito Itálico,Полужирный Курсив,Tučná kurzíva,Fet Kursiv,Kalın İtalik,Krepko poševno,Lodi etzana
/usr/share/fonts/truetype/Fonts/trebuc.ttf: Trebuchet MS:style=Regular,Normal,obyčejné,Standard,Κανονικά,Normaali,Normál,Normale,Standaard,Normalny,Обычный,Normálne,Navadno,Arrunta
/usr/share/fonts/truetype/dejavu/DejaVuSerif.ttf: DejaVu Serif:style=Book
/usr/share/fonts/truetype/Fonts/calibril.ttf: Calibri,Calibri Light:style=Light,Regular
/usr/share/fonts/truetype/Fonts/phagspa.ttf: Microsoft PhagsPa:style=Regular,Normal,obyčejné,Standard,Κανονικά,Normaali,Normál,Normale,Standaard,Normalny,Обычный,Normálne,Navadno,Arrunta
/usr/share/fonts/truetype/Fonts/segmdl2.ttf: Segoe MDL2 Assets:style=Regular,Normal,obyčejné,Standard,Κανονικά,Normaali,Normál,Normale,Standaard,Normalny,Обычный,Normálne,Navadno,thường,Arrunta
/usr/share/fonts/truetype/Fonts/upcii.ttf: IrisUPC:style=Italic,Cursiva,kurzíva,kursiv,Πλάγια,Kursivoitu,Italique,Dőlt,Corsivo,Cursief,Kursywa,Itálico,Курсив,İtalik,Poševno,Etzana
/usr/share/fonts/truetype/Fonts/msyhbd.ttc: Microsoft YaHei UI:style=Bold,Negreta,tučné,fed,Fett,Έντονα,Negrita,Lihavoitu,Gras,Félkövér,Grassetto,Vet,Halvfet,Pogrubiony,Negrito,Полужирный,Fet,Kalın,Krepko,Lodia
/usr/share/fonts/truetype/Fonts/angsana.ttc: AngsanaUPC:style=Bold,Negreta,tučné,fed,Fett,Έντονα,Negrita,Lihavoitu,Gras,Félkövér,Grassetto,Vet,Halvfet,Pogrubiony,Negrito,Полужирный,Fet,Kalın,Krepko,Lodia
/usr/share/fonts/truetype/Fonts/lucon.ttf: Lucida Console:style=Regular,Normal,obyčejné,Standard,Κανονικά,Normaali,Normál,Normale,Standaard,Normalny,Обычный,Navadno,Arrunta
/usr/share/fonts/truetype/Fonts/msjh.ttc: Microsoft JhengHei UI:style=Normal,Regular,obyčejné,Standard,Κανονικά,Normaali,Normál,Normale,Standaard,Normalny,Обычный,Normálne,Navadno,Arrunta
/usr/share/fonts/truetype/Fonts/cour.ttf: Courier New:style=Regular,Normal,obyčejné,Standard,Κανονικά,Normaali,Normál,Normale,Standaard,Normalny,Обычный,Normálne,Navadno,thường,Arrunta
/usr/share/fonts/truetype/Fonts/SitkaZ.ttc: Sitka Display,Sitka:style=Bold Italic,Display Bold Italic
/usr/share/fonts/truetype/Fonts/calibriz.ttf: Calibri:style=Bold Italic
/usr/share/fonts/truetype/Fonts/consolaz.ttf: Consolas:style=Bold Italic
/usr/share/fonts/truetype/Fonts/seguisb.ttf: Segoe UI,Segoe UI Semibold:style=Semibold,Regular

text2image --fonts_dir /usr/share/fonts --list_available_fonts

 0: 8514fix
  1: 8514oem
  2: Angsana New
  3: Angsana New Bold
  4: Angsana New Bold Italic
  5: Angsana New Italic
  6: AngsanaUPC
  7: AngsanaUPC Bold
  8: AngsanaUPC Bold Italic
  9: AngsanaUPC Italic
 10: Arial
 11: Arial Bold
 12: Arial Bold Italic
 13: Arial Heavy
 14: Arial Italic
 15: Browallia New
 16: Browallia New Bold
 17: Browallia New Bold Italic
 18: Browallia New Italic
 19: BrowalliaUPC
 20: BrowalliaUPC Bold
 21: BrowalliaUPC Bold Italic
 22: BrowalliaUPC Italic
 23: Calibri
 24: Calibri Bold
 25: Calibri Bold Italic
 26: Calibri Italic
 27: Calibri Light
 28: Calibri Light Italic
 29: Cambria
 30: Cambria Bold
 31: Cambria Bold Italic
 32: Cambria Italic
 33: Cambria Math
 34: Candara
 35: Candara Bold
 36: Candara Bold Italic
 37: Candara Italic
 38: Comic Sans MS
 39: Comic Sans MS Bold
 40: Comic Sans MS Bold Italic
 41: Comic Sans MS Italic
 42: Consolas
 43: Consolas Bold
 44: Consolas Bold Italic
 45: Consolas Italic
 46: Constantia
 47: Constantia Bold
 48: Constantia Bold Italic
 49: Constantia Italic
 50: Corbel
 51: Corbel Bold
 52: Corbel Bold Italic
 53: Corbel Italic
 54: Cordia New
 55: Cordia New Bold
 56: Cordia New Bold Italic
 57: Cordia New Italic
 58: CordiaUPC
 59: CordiaUPC Bold
 60: CordiaUPC Bold Italic
 61: CordiaUPC Italic
 62: Courier
 63: Courier New
 64: Courier New Bold
 65: Courier New Bold Italic
 66: Courier New Italic
 67: DejaVu Sans
 68: DejaVu Sans Bold
 69: DejaVu Sans Mono
 70: DejaVu Sans Mono Bold
 71: DejaVu Serif
 72: DejaVu Serif Bold
 73: DilleniaUPC
 74: DilleniaUPC Bold
 75: DilleniaUPC Bold Italic
 76: DilleniaUPC Italic
 77: Ebrima
 78: Ebrima Bold
 79: EucrosiaUPC
 80: EucrosiaUPC Bold
 81: EucrosiaUPC Bold Italic
 82: EucrosiaUPC Italic
 83: Fixedsys
 84: Franklin Gothic Medium,
 85: Franklin Gothic Medium, Italic
 86: FreesiaUPC
 87: FreesiaUPC Bold
 88: FreesiaUPC Bold Italic
 89: FreesiaUPC Italic
 90: Gabriola
 91: Gadugi
 92: Gadugi Bold
 93: Georgia
 94: Georgia Bold
 95: Georgia Bold Italic
 96: Georgia Italic
 97: HoloLens MDL2 Assets
 98: Impact Condensed
 99: IrisUPC
100: IrisUPC Bold
101: IrisUPC Bold Italic
102: IrisUPC Italic
103: JasmineUPC
104: JasmineUPC Bold
105: JasmineUPC Bold Italic
106: JasmineUPC Italic
107: Javanese Text
108: KodchiangUPC
109: KodchiangUPC Bold
110: KodchiangUPC Bold Italic
111: KodchiangUPC Italic
112: Leelawadee
113: Leelawadee Bold
114: Leelawadee UI
115: Leelawadee UI Bold
116: Leelawadee UI Semi-Light
117: LilyUPC
118: LilyUPC Bold
119: LilyUPC Bold Italic
120: LilyUPC Italic
121: Lucida Console Semi-Condensed
122: Lucida Sans Unicode
123: MS Gothic
124: MS PGothic
125: MS Sans Serif
126: MS Serif
127: MS UI Gothic
128: MV Boli
129: Malgun Gothic
130: Malgun Gothic Bold
131: Malgun Gothic Light
132: Marlett Medium
133: Microsoft Himalaya
134: Microsoft JhengHei
135: Microsoft JhengHei Bold
136: Microsoft JhengHei Light
137: Microsoft JhengHei UI
138: Microsoft JhengHei UI Bold
139: Microsoft JhengHei UI Light
140: Microsoft New Tai Lue
141: Microsoft New Tai Lue Bold
142: Microsoft PhagsPa
143: Microsoft PhagsPa Bold
144: Microsoft Sans Serif
145: Microsoft Tai Le
146: Microsoft Tai Le Bold
147: Microsoft YaHei
148: Microsoft YaHei Bold
149: Microsoft YaHei Light
150: Microsoft YaHei UI
151: Microsoft YaHei UI Bold
152: Microsoft YaHei UI Light
153: Microsoft Yi Baiti
154: MingLiU-ExtB
155: MingLiU_HKSCS-ExtB
156: Mongolian Baiti
157: Myanmar Text
158: Myanmar Text Bold
159: NSimSun
160: Nirmala UI
161: Nirmala UI Bold
162: Nirmala UI Semi-Light
163: PMingLiU-ExtB
164: Palatino Linotype
165: Palatino Linotype Bold
166: Palatino Linotype Bold Italic
167: Palatino Linotype Italic
168: Segoe MDL2 Assets
169: Segoe Print
170: Segoe Print Bold
171: Segoe Script
172: Segoe Script Bold
173: Segoe UI
174: Segoe UI Bold
175: Segoe UI Bold Italic
176: Segoe UI Emoji
177: Segoe UI Heavy
178: Segoe UI Heavy Italic
179: Segoe UI Historic
180: Segoe UI Italic
181: Segoe UI Light
182: Segoe UI Light Italic
183: Segoe UI Semi-Bold
184: Segoe UI Semi-Bold Italic
185: Segoe UI Semi-Light
186: Segoe UI Semi-Light Italic
187: Segoe UI Symbol
188: SimSun
189: SimSun-ExtB
190: Sitka Banner
191: Sitka Banner Bold
192: Sitka Banner Bold Italic
193: Sitka Banner Italic
194: Sitka Display
195: Sitka Display Bold
196: Sitka Display Bold Italic
197: Sitka Display Italic
198: Sitka Heading
199: Sitka Heading Bold
200: Sitka Heading Bold Italic
201: Sitka Heading Italic
202: Sitka Small
203: Sitka Small Bold
204: Sitka Small Bold Italic
205: Sitka Small Italic
206: Sitka Subheading
207: Sitka Subheading Bold
208: Sitka Subheading Bold Italic
209: Sitka Subheading Italic
210: Sitka Text
211: Sitka Text Bold
212: Sitka Text Bold Italic
213: Sitka Text Italic
214: Small Fonts
215: Sylfaen
216: Symbol
217: System
218: Tahoma
219: Tahoma Bold
220: Terminal
221: Terminal Greek 737 (437G)
222: Terminal Greek 869,
223: Times New Roman,
224: Times New Roman, Bold
225: Times New Roman, Bold Italic
226: Times New Roman, Italic
227: Trebuchet MS
228: Trebuchet MS Bold
229: Trebuchet MS Bold Italic
230: Trebuchet MS Italic
231: Verdana
232: Verdana Bold
233: Verdana Bold Italic
234: Verdana Italic
235: Webdings
236: Wingdings
237: Yu Gothic
238: Yu Gothic Bold
239: Yu Gothic Light
240: Yu Gothic Medium
241: Yu Gothic UI
242: Yu Gothic UI Bold
243: Yu Gothic UI Light
244: Yu Gothic UI Semi-Bold
245: Yu Gothic UI Semi-Light

Shreeshrii commented 7 years ago

Are you sure you don't have an older version of the program installed somewhere?

Did make training and make training-install complete without errors?
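A quick way to look for a stale copy is to list every text2image on the PATH and see which ones advertise the flag that tesstrain_utils.sh passes. This is a sketch under assumptions: the helper names list_on_path and supports_flag are hypothetical, and it assumes the tool prints its flags as "--name" when run with --help, which should be verified for the build in question.

```shell
#!/usr/bin/env bash
# Sketch: find every copy of a program on PATH and check each for a flag.
# list_on_path and supports_flag are hypothetical helper names.

# Print every executable called $1 found on PATH, in lookup order.
list_on_path() {
  local name="$1" dir
  local IFS=:
  for dir in $PATH; do
    if [ -x "${dir:-.}/$name" ]; then
      printf '%s\n' "${dir:-.}/$name"
    fi
  done
  return 0
}

# Succeed if the binary's --help text mentions --<flag>.
# Assumption: the tool lists its flags in that form.
supports_flag() {
  local bin="$1" flag="$2"
  "$bin" --help </dev/null 2>&1 | grep -q -e "--${flag}"
}

# Report on each text2image; prints nothing if none is installed.
list_on_path text2image | while IFS= read -r bin; do
  if supports_flag "$bin" max_pages; then
    echo "$bin: supports --max_pages"
  else
    echo "$bin: does NOT support --max_pages (possibly an older build)"
  fi
done
```

In this thread the failing runs call /usr/bin/text2image while the working run calls /usr/local/bin/text2image, so two entries in this listing would be worth investigating.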

xanxus10th commented 7 years ago

@Shreeshrii make training gave this error:

make: Warning: File 'Makefile.in' has modification time 21613 s in the future
/bin/bash ./config.status --recheck
running CONFIG_SHELL=/bin/bash /bin/bash ./configure --no-create --no-recursion
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
Using git revision: 4.00.00dev-687-g2cc531e
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... configure: error: newly created file is older than distributed files!
Check your system clock
Makefile:430: recipe for target 'config.status' failed
make: *** [config.status] Error 1

and make training-install gave this:

make: Warning: File 'Makefile.in' has modification time 21585 s in the future
/bin/bash ./config.status --recheck
running CONFIG_SHELL=/bin/bash /bin/bash ./configure --no-create --no-recursion
checking for g++... g++
checking whether the C++ compiler works... yes
checking for C++ compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
Using git revision: 4.00.00dev-687-g2cc531e
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... configure: error: newly created file is older than distributed files!
Check your system clock
Makefile:430: recipe for target 'config.status' failed
make: *** [config.status] Error 1

xanxus10th commented 7 years ago

I reinstalled Tesseract, but now I get the same error as in issue #1114. I tried changing line 220 in normstrngs.cpp, but that did not fix it.

Invalid Unicode codepoint: 0xffffffc2
IsValidCodepoint(ch):Error:Assert failed:in file normstrngs.cpp, line 225
ERROR: /tmp/tmp.9TCN0u9M2a/eng/eng.unicharset does not exist or is not readable

amitdo commented 7 years ago

So your original issue was solved.

Don't mix two different issues in one report. Please close this issue.

xanxus10th commented 7 years ago

Okay, let me summarize this issue for anyone who hits the same errors. If you get the Non-existent flag error, reinstall Tesseract and don't forget to run:

sudo ldconfig
make training
sudo make training-install

Then, if you get the Invalid Unicode codepoint error, try changing the code in normstrngs.cpp as described in issue #1114, and run this again:

make run
make training
sudo make training-install

Finally, I got an error about TESSDATA_PREFIX, which I fixed by reading issue #221.
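For the TESSDATA_PREFIX error, a small check like the one below can show whether the variable points somewhere useful. It is a sketch, not the fix from issue #221: the helper name find_traineddata is hypothetical, and it assumes that, depending on the Tesseract version, the prefix is either the tessdata directory itself or its parent, so it looks in both places.

```shell
#!/usr/bin/env bash
# Sketch: print where <lang>.traineddata is found relative to a prefix, or fail.
# find_traineddata is a hypothetical helper name.
# Assumption: TESSDATA_PREFIX is either the tessdata directory or its parent,
# depending on the Tesseract version, so both layouts are tried.
find_traineddata() {
  local prefix="${1%/}" lang="$2" candidate
  for candidate in "$prefix/$lang.traineddata" "$prefix/tessdata/$lang.traineddata"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Check the current environment; messages go to stderr.
if [ -n "${TESSDATA_PREFIX:-}" ]; then
  find_traineddata "$TESSDATA_PREFIX" eng \
    || echo "no eng.traineddata found under $TESSDATA_PREFIX" >&2
else
  echo "TESSDATA_PREFIX is not set" >&2
fi
```

With the layout used earlier in this thread, one would try something like export TESSDATA_PREFIX=/mnt/e/tesseract-ocr/tessdata and rerun the check.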

Thank you, everyone, for helping me fix this. I really appreciate it.