Closed FernandoGOT closed 5 years ago
@khalajink Yes, see my answer in that SO thread https://stackoverflow.com/a/57968945/1021819
@jtlz2 Yes, I followed your answer and got the pango issue fixed, but the text2image issue still exists. Any idea about it?
When I try to run 'text2image --list_available_fonts --fonts_dir=/Library/Fonts', the error is '-bash: /usr/local/bin/text2image: No such file or directory'.
Thanks for the answer. The commands you shared didn't work for me, but the instructions on how to diagnose the issue helped a lot. It turns out that I did not have zlib installed, so I installed it and now I can finally build the training tools.
I still have a different but somewhat similar problem in 2020.
I've successfully installed the latest Tesseract (master branch) on the latest macOS (11.1 Big Sur).
tesseract 5.0.0-alpha-855-g6d86
leptonica-1.80.0
libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.1.0 : libopenjp2 2.3.1
Found AVX2
Found AVX
Found FMA
Found SSE
Found libarchive 3.4.3 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.6 liblz4/1.9.2 libzstd/1.4.5
Found libcurl/7.64.1 SecureTransport (LibreSSL/2.8.3) zlib/1.2.11 nghttp2/1.41.0
However, my training tools, even though they have been installed, cannot find their actual executables.
For example, if I call text2image I see the following error message:
This script is just a wrapper for text2image.
See the libtool documentation for more information.
ERROR: Program text2image failed. Abort.
If I enable debug output for the bash wrapper script, I see the following problem:
❯ text2image --list_available_fonts --fonts_dir=~/Library/Fonts
+ sed_quote_subst='s|\([`"$\\]\)|\\\1|g'
+ test -n ''
+ case `(set -o) 2>/dev/null` in
+ set -o posix
+ BIN_SH=xpg4
+ export BIN_SH
+ DUALCASE=1
+ export DUALCASE
+ unset CDPATH
+ relink_command=
+ test '' = '%%%MAGIC variable%%%'
+ test '' '!=' '%%%MAGIC variable%%%'
+ file=/usr/local/bin/text2image
+ ECHO='printf %s\n'
+ lt_option_debug=
+ func_parse_lt_options /usr/local/bin/text2image --list_available_fonts '--fonts_dir=~/Library/Fonts'
+ lt_script_arg0=/usr/local/bin/text2image
+ shift
+ for lt_opt in '"$@"'
+ case "$lt_opt" in
+ for lt_opt in '"$@"'
+ case "$lt_opt" in
+ test -n ''
++ printf '%s\n' /usr/local/bin/text2image
++ /usr/bin/sed 's%/[^/]*$%%'
+ thisdir=/usr/local/bin
+ test x/usr/local/bin = x/usr/local/bin/text2image
++ ls -ld /usr/local/bin/text2image
++ /usr/bin/sed -n 's/.*-> //p'
+ file=
+ test -n ''
+ WRAPPER_SCRIPT_BELONGS_IN_OBJDIR=no
+ test no = yes
++ cd /usr/local/bin
++ pwd
+ absdir=/usr/local/bin
+ test -n /usr/local/bin
+ thisdir=/usr/local/bin
+ program=text2image
+ progdir=/usr/local/bin/.libs
+ test -f /usr/local/bin/.libs/text2image
+ printf '%s\n' '/usr/local/bin/text2image: error: '\''/usr/local/bin/.libs/text2image'\'' does not exist'
/usr/local/bin/text2image: error: '/usr/local/bin/.libs/text2image' does not exist
+ printf '%s\n' 'This script is just a wrapper for text2image.'
This script is just a wrapper for text2image.
+ printf '%s\n' 'See the libtool documentation for more information.'
See the libtool documentation for more information.
+ exit 1
Basically, none of the training tools can find their actual executable files, which are located under `tesseract/.libs/`.
Did I miss something during the configuration?
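The trace shows the wrapper looking for `.libs/text2image` next to itself and giving up when it is missing. A minimal sketch of the same check for any installed tool is below. The function name `check_tool` and the list of tools are illustrative, not part of Tesseract or libtool; the marker string is the message the wrapper prints in the output above.

```shell
# Sketch: tell a libtool wrapper script apart from a real binary and
# check whether the program it wraps is present.
# check_tool is a hypothetical helper name used only for illustration.
check_tool() {
  tool_path="$1"
  if [ ! -e "$tool_path" ]; then
    echo "missing: $tool_path"
    return 1
  fi
  # The marker text is the message printed by the wrapper in the trace.
  if grep -q 'This script is just a wrapper for' "$tool_path" 2>/dev/null; then
    real="$(dirname "$tool_path")/.libs/$(basename "$tool_path")"
    if [ -f "$real" ]; then
      echo "wrapper: $tool_path -> $real"
    else
      echo "broken wrapper: $tool_path ($real does not exist)"
      return 2
    fi
  else
    echo "regular file: $tool_path"
  fi
}

# Example: inspect a few training tools under an assumed install prefix.
for tool in text2image lstmtraining combine_tessdata; do
  check_tool "/usr/local/bin/$tool" || true
done
```

A "broken wrapper" result means the wrapper script was installed instead of the real binary, which matches the symptom described here.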
@nnnikolay, I am sorry, that was my fault. It is now fixed with commit 421ebf0418f415c2ca270521243d4edc36dd44bf.
wow, @stweil thank you for your swift reaction. it seems that this step works now!
You can find the details of the error "pango 1.22.0 or higher is required, but was not found" in tesseract/build/config.log.
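One way to pull the relevant lines out of that log is a case-insensitive search. This is only a sketch: `config_log_hits` is a made-up helper name, and the default path simply repeats the one mentioned above.

```shell
# Sketch: list the lines of a configure log that mention a dependency.
# config_log_hits is a hypothetical helper; the default log path is the
# one named in the comment above and may differ on your machine.
config_log_hits() {
  log="${1:-tesseract/build/config.log}"
  pattern="${2:-pango}"
  # -n prints line numbers, -i ignores case
  grep -n -i -- "$pattern" "$log"
}

# Usage (not executed here): config_log_hits tesseract/build/config.log pango
```

If the log confirms that pango was not found, `pkg-config --modversion pango` shows which version, if any, pkg-config can see.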
This is the step-by-step procedure I used to install tesseract 4.0 on my MAC OS X, including the fixes/workarounds I needed to make it work. I'm sharing this "guide" with the intention of helping other people who may have the same problems I had.
Special thanks to Shree, who helped me on the Google group.
Project and more details: https://github.com/tesseract-ocr/tesseract
Where to get help?
Google group: https://groups.google.com/forum/#!forum/tesseract-ocr
Git: https://github.com/tesseract-ocr/tesseract/issues
Platform: MAC OS X 10.13.3
Tesseract: 4.0.0-beta.1-69-g10f4
leptonica-1.75.3
libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
Found AVX2
Found AVX
Found SSE
Compiling Tesseract - tesseract 4.0
Reference: https://github.com/tesseract-ocr/tesseract/wiki/Compiling#macos
Warning: Don't install tesseract using brew, since you can't generate the `ScrollView.jar` from it! (At least I wasn't able to generate it)
Steps
1 - Install these libs
2 - Run the code
Obs.: `text2image` is set to use icu4c/60.2 but the actual version is icu4c/61.1
3 - Clone tesseract repo
4 - Enter in the folder
5 - Run the script
6 - Run the code, and copy the `CPPFLAGS` and `LDFLAGS`
7 - Update the `CPPFLAGS` and `LDFLAGS` and execute the code
8 - Run the code
9 - Run the code
10 - Run the code
Obs.: this is the `sudo ldconfig` version for MAC OS X
11 - Run the code
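The commands for these steps are not shown above, so the following is a hedged reconstruction rather than the original code. It assumes Homebrew with icu4c under `/usr/local/opt/icu4c`, the usual autotools targets of the Tesseract repository (`./autogen.sh`, `./configure`, `make`, `make training`), and `update_dyld_shared_cache` as the macOS counterpart of `sudo ldconfig`. The package list, `ICU_PREFIX`, `DRY_RUN`, `run`, and `build_tesseract` are illustrative names. Step 2 (the icu4c version fix) is left out because its command is unknown. By default the script only prints the commands.

```shell
# Dry-run sketch of the build steps; set DRY_RUN=0 to actually execute.
# All package names and paths below are assumptions, not the original commands.

run() {
  # Print the command in dry-run mode (default); execute it otherwise.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

build_tesseract() {
  icu_prefix="${ICU_PREFIX:-/usr/local/opt/icu4c}"  # assumed Homebrew location

  # Step 1: build dependencies (assumed list)
  run brew install automake autoconf libtool pkg-config icu4c leptonica pango cairo
  # Steps 3-5: clone, enter the folder, generate the configure script
  run git clone https://github.com/tesseract-ocr/tesseract.git
  run cd tesseract
  run ./autogen.sh
  # Step 6: brew prints the CPPFLAGS/LDFLAGS hints for icu4c
  run brew info icu4c
  # Step 7: configure with those flags
  run ./configure "CPPFLAGS=-I${icu_prefix}/include" "LDFLAGS=-L${icu_prefix}/lib"
  # Steps 8-9: build and install
  run make
  run sudo make install
  # Step 10: assumed macOS counterpart of "sudo ldconfig"
  run sudo update_dyld_shared_cache
  # Step 11: training tools
  run make training
  run sudo make training-install
}

build_tesseract
```

Printing first makes it easy to compare the sequence against the current Compiling wiki page before running anything with `sudo`.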
Creating ScrollView.jar - tesseract 4.0
Reference: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#lstmtraining-command-line https://github.com/tesseract-ocr/tesseract/wiki/ViewerDebugging
Important: Use the JDK 8 to build, or else it is going to return an error
Steps
1 - Download the files `piccolo2d-core-3.0.jar` and `piccolo2d-extras-3.0.jar`
http://search.maven.org/remotecontent?filepath=org/piccolo2d/piccolo2d-core/3.0/piccolo2d-core-3.0.jar
http://search.maven.org/remotecontent?filepath=org/piccolo2d/piccolo2d-extras/3.0/piccolo2d-extras-3.0.jar
2 - Move the files `piccolo2d-core-3.0.jar` and `piccolo2d-extras-3.0.jar` to `tesseract/java`
3 - Enter the `tesseract/java` folder
4 - Set the var `SCROLLVIEW_PATH` to your `tesseract/java` folder and run the code
Training Font - tesseract 4.0
Reference: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#user-content-using-tesstrain
Steps
1 - Clone the langdata dir from git
2 - Enter the tesseract folder
3 - Execute this code and select one font from the list (I recommend "Verdana")
Font dir for MAC can be:
~/Library/Fonts
/Library/Fonts/
/Network/Library/Fonts/
/System/Library/Fonts/
/System Folder/Fonts/
More details here: https://support.apple.com/en-us/HT201722
4 - Replace line 195 in the file `tesseract/training/tesstrain_utils.sh`
Obs.: this is a fix for the error:
5 - Clone the tessdata repo from git (I recommend "tessdata_best" since it is the most accurate; "tessdata_fast" is just faster)
or
6 - Copy `tessdata_best/eng.traineddata` (for English training) from the tessdata you just cloned and paste it into `tesseract/tessdata/`
7 - Create the training data
Add the prefix `PANGOCAIRO_BACKEND=fc` if using MAC OSX
8 - Create other training data using another font to compare
Add the prefix `PANGOCAIRO_BACKEND=fc` if using MAC OSX
9 - Create the needed folder
10 - Start the training
In case you failed to build ScrollView.jar, set debug_interval to -1:
`--debug_interval -1`
11 - Monitor the log on another console
12 - Test accuracy with the other font
13 - Test accuracy with the best traineddata
14 - Test accuracy with the actual traineddata (in this case the same as step 13)
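The commands behind steps 3 and 7 to 12 are not shown above. The sketch below is modeled on the tutorial in the linked TrainingTesseract-4.00 wiki page and is not the original code. Assumptions: the `~/tesstutorial` layout, the `engtrain`/`engeval`/`engoutput` directory names, the network spec, the iteration count, and "Arial" as the second font are all placeholders; `TARGET_OS`, `TUTORIAL_DIR`, `NET_SPEC`, and the function names are invented for illustration. It is meant to be run from inside the tesseract folder and prints the commands unless `DRY_RUN=0`.

```shell
# Dry-run sketch of the training steps; set DRY_RUN=0 to execute.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Steps 7/8: the PANGOCAIRO_BACKEND=fc prefix is only needed on macOS.
# TARGET_OS is a hypothetical override so the choice can be tested anywhere.
pango_prefix() {
  case "${TARGET_OS:-$(uname -s)}" in
    Darwin) echo "PANGOCAIRO_BACKEND=fc" ;;
  esac
}

TUTORIAL_DIR="${TUTORIAL_DIR:-$HOME/tesstutorial}"  # assumed output location
# Example network spec; take the current one from the wiki before real use.
NET_SPEC="${NET_SPEC:-[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]}"

make_training_data() {
  font="$1"
  out_dir="$2"
  # The prefix is left unquoted on purpose so that it vanishes when empty.
  run env $(pango_prefix) training/tesstrain.sh \
    --fonts_dir /Library/Fonts --fontlist "$font" --lang eng \
    --linedata_only --noextract_font_properties \
    --langdata_dir ../langdata --tessdata_dir ./tessdata \
    --output_dir "$out_dir"
}

train_from_scratch() {
  run lstmtraining --debug_interval -1 \
    --traineddata "$TUTORIAL_DIR/engtrain/eng/eng.traineddata" \
    --net_spec "$NET_SPEC" \
    --model_output "$TUTORIAL_DIR/engoutput/base" \
    --train_listfile "$TUTORIAL_DIR/engtrain/eng.training_files.txt" \
    --eval_listfile "$TUTORIAL_DIR/engeval/eng.training_files.txt" \
    --max_iterations 5000
}

evaluate_checkpoint() {
  run lstmeval --model "$TUTORIAL_DIR/engoutput/base_checkpoint" \
    --traineddata "$TUTORIAL_DIR/engtrain/eng/eng.traineddata" \
    --eval_listfile "$1"
}

run text2image --list_available_fonts --fonts_dir=/Library/Fonts   # step 3
make_training_data "Verdana" "$TUTORIAL_DIR/engtrain"              # step 7
make_training_data "Arial" "$TUTORIAL_DIR/engeval"                 # step 8
run mkdir -p "$TUTORIAL_DIR/engoutput"                             # step 9
train_from_scratch                                                 # step 10
evaluate_checkpoint "$TUTORIAL_DIR/engeval/eng.training_files.txt" # step 12
```

Passing `--debug_interval -1` here follows the note in step 10 for setups without a working ScrollView.jar.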
Fine tuning - tesseract 4.0
Reference: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact
Steps
1 - Create the necessary folder
2 - Start fine tuning
3 - Validate the progress
4 - Create the necessary folder
5 - Combine the trained data
6 - Train merged data
7 - Validate the results on the main training file
8 - Validate the results on our training file
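As with the previous sections, the commands themselves are missing, so this is a sketch modeled on the "Fine Tuning for Impact" section of the linked wiki, not the original code. It reads step 5 as extracting the LSTM model from the best traineddata with `combine_tessdata -e`, which is an interpretation. The directory names, iteration counts, `BEST_TRAINEDDATA`, and the helper names are placeholders. Commands are printed unless `DRY_RUN=0`.

```shell
# Dry-run sketch of the fine tuning steps; set DRY_RUN=0 to execute.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi
}

TUTORIAL_DIR="${TUTORIAL_DIR:-$HOME/tesstutorial}"        # assumed layout
BEST="${BEST_TRAINEDDATA:-tessdata/eng.traineddata}"      # copied from tessdata_best
EVAL_LIST="$TUTORIAL_DIR/engeval/eng.training_files.txt"  # data made with the other font

# $1 = output prefix, $2 = model to continue from, $3 = traineddata, $4 = iterations
fine_tune() {
  run lstmtraining --model_output "$1" --continue_from "$2" \
    --traineddata "$3" --train_listfile "$EVAL_LIST" \
    --max_iterations "$4"
}

# $1 = checkpoint, $2 = traineddata, $3 = list file to evaluate against
validate() {
  run lstmeval --model "$1" --traineddata "$2" --eval_listfile "$3"
}

# Steps 1-3: fine tune the model trained from scratch and check progress.
small="$TUTORIAL_DIR/impact_from_small"
run mkdir -p "$small"
fine_tune "$small/impact" "$TUTORIAL_DIR/engoutput/base_checkpoint" \
  "$TUTORIAL_DIR/engtrain/eng/eng.traineddata" 1200
validate "$small/impact_checkpoint" "$TUTORIAL_DIR/engtrain/eng/eng.traineddata" "$EVAL_LIST"

# Steps 4-6: extract the LSTM model from the best traineddata and fine tune it.
full="$TUTORIAL_DIR/impact_from_full"
run mkdir -p "$full"
run combine_tessdata -e "$BEST" "$full/eng.lstm"
fine_tune "$full/impact" "$full/eng.lstm" "$BEST" 400

# Steps 7-8: validate on the main training files and on our own.
validate "$full/impact_checkpoint" "$BEST" "$TUTORIAL_DIR/engtrain/eng.training_files.txt"
validate "$full/impact_checkpoint" "$BEST" "$EVAL_LIST"
```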
Fine tuning add ± character - tesseract 4.0
Reference: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters
Steps
1 - Modify `langdata/eng/eng.training_text` and include these lines:
2 - Generate the training file
3 - Generate the eval data
4 - Combine trained data files
5 - Fine tuning
6 - Test the result on other fonts
7 - Test the result on the main font
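Step 1 amounts to appending lines that contain the new character to the training text. The sketch below shows one way to do that without adding duplicates and then count how often the character occurs. The helper names and the sample lines are made up for illustration; they are not the lines from the original guide. It assumes the text file already ends with a newline.

```shell
# Sketch: append lines to a training text once, then count a character.
# add_training_lines and count_char are hypothetical helper names.

add_training_lines() {
  text_file="$1"
  shift
  for line in "$@"; do
    # Append only if the exact line is not already in the file.
    grep -q -x -F -- "$line" "$text_file" 2>/dev/null ||
      printf '%s\n' "$line" >> "$text_file"
  done
}

count_char() {
  # Number of occurrences of the string $2 in file $1.
  grep -o -F -- "$2" "$1" | wc -l | tr -d ' '
}

# Demo on a scratch file; for real use pass langdata/eng/eng.training_text.
demo="$(mktemp)"
add_training_lines "$demo" 'Tolerance ±5 mm' 'Range ±0.5 to ±2'
count_char "$demo" '±'
rm -f "$demo"
```

Checking the count before step 2 is a cheap way to confirm the character is actually present in the text that the training file will be generated from.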