tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

Question: unicharset_extractor error. How can I solve it? #306

Closed Iaurkano64 closed 2 years ago

Iaurkano64 commented 2 years ago

Hello everybody,

I'm trying to train Tesseract for the first time, and unicharset_extractor does not seem to be happy with my command line.

I'm trying the command on the OCR-D test set (ocrd-testset) before using my own data.

I launched this command:
.\unicharset_extractor.exe --output_unicharset \tesstrain-windows-gui-main\data\eng_she_ot\my.unicharset --norm_mode 2 \tesstrain-windows-gui-main\data\deu_she_ot\all-gt

and nothing happens. Using tesstrain-windows-gui, I received this error message: -1073741515 exit code for command .... Enclosed is the "all-gt" file I used (from ocrd-testset).

Thanks for your help.

all-gt.txt

buliasz commented 2 years ago

The file you've attached is named all-gt.txt, so in the command you'd need to add the .txt extension at the end of the file name. I've executed it with the file you provided and it works on my side:

D:\Tesseract>unicharset_extractor.exe --output_unicharset .\my.unicharset --norm_mode 2 .\all-gt.txt
Bad box coordinates in boxfile string! ich denke. Aber was die ſelige Frau Geheimräthin
Extracting unicharset from plain text file .\all-gt.txt
Other case I of i is not in unicharset
Other case Ä of ä is not in unicharset
Other case Ö of ö is not in unicharset
Other case Y of y is not in unicharset
Wrote unicharset file .\my.unicharset

D:\Tesseract>echo Exit Code is %errorlevel%
Exit Code is 0
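
The exit-code check shown in this session can also be scripted. Below is a minimal sketch in Python, assuming Python is installed; `run_and_report` is a hypothetical helper name, not part of Tesseract or tesstrain. It runs a command, captures its output as UTF-8 (so non-ASCII characters are not mangled by the console code page), and returns the exit code.

```python
import subprocess

def run_and_report(cmd):
    """Run cmd (a list of arguments) and return (exit_code, output_text).

    stdout and stderr are both captured and decoded as UTF-8;
    undecodable bytes are replaced rather than raising an error.
    """
    proc = subprocess.run(
        cmd,
        capture_output=True,
        encoding="utf-8",
        errors="replace",
    )
    return proc.returncode, proc.stdout + proc.stderr

# Example call, using the file names from the session above
# (uncomment on a machine where unicharset_extractor.exe is present):
# code, text = run_and_report([
#     "unicharset_extractor.exe",
#     "--output_unicharset", r".\my.unicharset",
#     "--norm_mode", "2",
#     r".\all-gt.txt",
# ])
# print(text)
# print("Exit Code is", code)
```

An exit code of 0 means the tool finished normally; any other value should be investigated before continuing with training.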

Iaurkano64 commented 2 years ago

Hi,

OK, I found out why I didn't see any error message.

I was using PowerShell to execute the command, and PowerShell hides the error messages.

Using the DOS command prompt (cmd.exe), I see that two DLLs are needed, icuin63.dll and icuuc63.dll. That's why unicharset_extractor.exe does not work. Do you know about this issue?
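
The missing DLLs are consistent with the exit code -1073741515 reported at the start of this thread. Read as an unsigned 32-bit value, it is the Windows status code 0xC0000135 (STATUS_DLL_NOT_FOUND). A minimal Python sketch of the conversion follows; `to_ntstatus` is a hypothetical helper name used only for illustration.

```python
def to_ntstatus(exit_code):
    """Reinterpret a 32-bit exit code (signed or unsigned) as a hex NTSTATUS string."""
    # Masking with 0xFFFFFFFF maps negative values to their unsigned 32-bit form.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"

print(to_ntstatus(-1073741515))  # prints 0xC0000135
```

Converting a large negative exit code to hexadecimal this way makes it searchable in the Windows NTSTATUS documentation.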

Regards.

Iaurkano64 commented 2 years ago

OK, after installing a new Tesseract version, 5.0.1 (before I was on 4.10), the problem disappeared.