Closed FernandoGOT closed 5 years ago
Thank you for step by step info. This should probably be added to wiki.
One correction:
When fine-tuning, ONLY traineddata files from tessdata_best can be used as the base traineddata to continue from.
Models from tessdata_fast and from tessdata will NOT work.
On Sun 8 Apr, 2018, 3:16 PM FernandoGOT, notifications@github.com wrote:
This is the step-by-step process I used to install tesseract 4.0 on Mac OS X, together with the fixes and workarounds I needed to make it work. I'm sharing this "guide" to help other people who may run into the same problems I had.
Special thanks to Shree, who helped me on the Google group.
Project and more details: https://github.com/tesseract-ocr/tesseract
Where to get help?
Google group: https://groups.google.com/forum/#!forum/tesseract-ocr
GitHub issues: https://github.com/tesseract-ocr/tesseract/issues
Platform: MAC OS X 10.13.3
Tesseract: 4.0.0-beta.1-69-g10f4
leptonica-1.75.3
libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
Found AVX2
Found AVX
Found SSE
Compiling Tesseract - tesseract 4.0
Reference: https://github.com/tesseract-ocr/tesseract/wiki/Compiling#macos
Warning: Don't install tesseract using brew, since you can't generate the ScrollView.jar from it! (At least I wasn't able to generate it.)
Steps
1 - Install these libs
brew install automake autoconf autoconf-archive libtool
brew install pkgconfig
brew install icu4c
brew install leptonica
brew install gcc
2 - Run the code
ln -hfs /usr/local/Cellar/icu4c/60.2 /usr/local/opt/icu4c
Obs.: text2image is set to use icu4c/60.2 but the actual version is icu4c/61.1
3 - Clone tesseract repo
git clone https://github.com/tesseract-ocr/tesseract/
4 - Enter in the folder
cd tesseract
5 - Run the script
./autogen.sh
6 - Run the code, and copy the CPPFLAGS and LDFLAGS
brew info icu4c
7 - Update the CPPFLAGS and LDFLAGS and execute the code
./configure \
  CPPFLAGS=-I/usr/local/opt/icu4c/include \
  LDFLAGS=-L/usr/local/opt/icu4c/lib
8 - Run the code
make -j
9 - Run the code
sudo make install
10 - Run the code
sudo update_dyld_shared_cache
Obs.: this is the Mac OS X equivalent of sudo ldconfig
11 - Run the code
make training
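Before running ./autogen.sh and ./configure in the steps above, it can help to confirm that the build tools are actually on the PATH. A minimal sketch, assuming a POSIX shell; the helper name missing_tools and the tool list in the example call are illustrative and not part of the original guide:

```sh
# Print the name of every command in the argument list that is not on PATH.
# `missing_tools` is a hypothetical helper, not part of tesseract or brew.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
  return 0
}

# Example call; adjust the list to the tools your build needs (assumed names).
missing_tools git make autoconf automake pkg-config
```

An empty result means every listed command was found. This only checks for executables, so libraries such as leptonica or icu4c still have to be verified through the configure output.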
Creating ScrollView.jar - tesseract 4.0
Reference:
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#lstmtraining-command-line
https://github.com/tesseract-ocr/tesseract/wiki/ViewerDebugging
Important: Use JDK 8 to build; otherwise the build returns an error.
Steps
1 - Download the files piccolo2d-core-3.0.jar and piccolo2d-extras-3.0.jar
2 - Move the files piccolo2d-core-3.0.jar and piccolo2d-extras-3.0.jar to tesseract/java
3 - Enter the tesseract/java folder
cd java
4 - Set the var SCROLLVIEW_PATH to your tesseract/java folder and run the code
SCROLLVIEW_PATH=~/projects/tesseract/java make ScrollView.jar
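Since the ScrollView.jar build above depends on JDK 8, a quick check of the active Java version can save a failed build. This is only a sketch: the helper names are made up for illustration, and it relies on JDK 8 reporting its version as "1.8.x" in the java -version output.

```sh
# Succeeds only when the given version string looks like a JDK 8 version.
# JDK 8 reports itself as "1.8.x" (for example 1.8.0_171).
is_jdk8() {
  case "$1" in
    1.8|1.8.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Read the quoted version string from `java -version`, which prints to stderr.
# Prints nothing when no version string is found (e.g. java is missing).
java_version_string() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

if is_jdk8 "$(java_version_string)"; then
  echo "JDK 8 detected"
else
  echo "JDK 8 not detected; the ScrollView.jar build may fail"
fi
```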
Training Font - tesseract 4.0
Reference: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#user-content-using-tesstrain
Steps
1 - Clone the langdata dir from git
git clone https://github.com/tesseract-ocr/langdata
2 - Enter the tesseract folder
cd ..
3 - Execute this code and select one font from the list (I recommend "Verdana")
text2image --list_available_fonts --fonts_dir=/Library/Fonts
Font dir for MAC can be:
~/Library/Fonts
/Library/Fonts/
/Network/Library/Fonts/
/System/Library/Fonts/
/System Folder/Fonts/
More details here: https://support.apple.com/en-us/HT201722
4 - Replace line 195 of the file tesseract/training/tesstrain_utils.sh as follows:
- export FONT_CONFIG_CACHE=$(mktemp -d --tmpdir font_tmp.XXXXXXXXXX)
+ export FONT_CONFIG_CACHE=$(mktemp -d -t font_tmp.XXXXXXXXXX)
Obs.: this is a fix for the error:
mktemp: illegal option -- -
usage: mktemp [-d] [-q] [-t prefix] [-u] template ...
       mktemp [-d] [-q] [-u] -t prefix
/Users/username/projects/tesseract/training/tesstrain_utils.sh: line 197: /sample_text.txt: Permission denied
5 - Clone a tessdata repo from git (I recommend "tessdata_best" since it is more accurate; "tessdata_fast" is just faster)
git clone https://github.com/tesseract-ocr/tessdata_best
or
git clone https://github.com/tesseract-ocr/tessdata_fast
6 - Copy tessdata_best/eng.traineddata (for English training) from the tessdata repo you just cloned and paste it into tesseract/tessdata/
7 - Create the training data
PANGOCAIRO_BACKEND=fc \
~/projects/tesseract/training/tesstrain.sh \
  --fonts_dir /Library/Fonts \
  --lang eng \
  --linedata_only \
  --noextract_font_properties \
  --exposures "0" \
  --langdata_dir ~/projects/langdata \
  --tessdata_dir ~/projects/tesseract/tessdata \
  --fontlist "Verdana" \
  --output_dir ~/tesstutorial/engtrain
Add the prefix PANGOCAIRO_BACKEND=fc if using MAC OSX
8 - Create other training data using other font to compare
PANGOCAIRO_BACKEND=fc \
~/projects/tesseract/training/tesstrain.sh \
  --fonts_dir /Library/Fonts \
  --lang eng \
  --linedata_only \
  --noextract_font_properties \
  --exposures "0" \
  --langdata_dir ~/projects/langdata \
  --tessdata_dir ~/projects/tesseract/tessdata \
  --fontlist "Times New Roman," \
  --output_dir ~/tesstutorial/engeval
Add the prefix PANGOCAIRO_BACKEND=fc if using MAC OSX
9 - Create the needed folder
mkdir -p ~/tesstutorial/engoutput
10 - Start the training
SCROLLVIEW_PATH=~/projects/tesseract/java \
~/projects/tesseract/training/lstmtraining \
  --debug_interval 100 \
  --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
  --net_spec '[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c111]' \
  --model_output ~/tesstutorial/engoutput/base \
  --learning_rate 20e-4 \
  --train_listfile ~/tesstutorial/engtrain/eng.training_files.txt \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt \
  --max_iterations 5000 &>~/tesstutorial/engoutput/basetrain.log
In case you failed to build ScrollView.jar, set debug_interval to -1:
--debug_interval -1
11 - Monitor the log on another console
tail -f ~/tesstutorial/engoutput/basetrain.log
12 - Test Accuracy with other font
~/projects/tesseract/training/lstmeval \
  --model ~/tesstutorial/engoutput/base_checkpoint \
  --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt
13 - Test accuracy with the best traineddata
~/projects/tesseract/training/lstmeval \
  --model ~/projects/tessdata_best/eng.traineddata \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt
14 - Test accuracy with the actual traineddata (in this case the same as step 13)
~/projects/tesseract/training/lstmeval \
  --model ~/projects/tesseract/tessdata/eng.traineddata \
  --eval_listfile ~/tesstutorial/engtrain/eng.training_files.txt
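Regarding the mktemp fix in step 4 above: the --tmpdir option exists only in GNU mktemp, and -t means different things in GNU and BSD mktemp. Passing a full-path template is accepted by both. The following is a sketch of that alternative, not the change made in tesstrain_utils.sh; the function name make_font_cache_dir is hypothetical.

```sh
# Create a temporary directory using a full-path template instead of
# --tmpdir (GNU only) or -t (whose meaning differs between GNU and BSD).
# `make_font_cache_dir` is a hypothetical name used for illustration.
make_font_cache_dir() {
  mktemp -d "${TMPDIR:-/tmp}/font_tmp.XXXXXXXXXX"
}

FONT_CONFIG_CACHE=$(make_font_cache_dir)
export FONT_CONFIG_CACHE
echo "$FONT_CONFIG_CACHE"
```

The directory is created under $TMPDIR when that variable is set and under /tmp otherwise, so the sample_text.txt path built from it no longer starts at the filesystem root.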
Fine tuning - tesseract 4.0
Reference: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for-impact
Steps
1 - Create the necessary folder
mkdir -p ~/tesstutorial/verdana_from_small
2 - Start fine tuning
~/projects/tesseract/training/lstmtraining \
  --model_output ~/tesstutorial/verdana_from_small/verdana \
  --continue_from ~/tesstutorial/engoutput/base_checkpoint \
  --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
  --train_listfile ~/tesstutorial/engeval/eng.training_files.txt \
  --max_iterations 1200
3 - Validate the progress
~/projects/tesseract/training/lstmeval \
  --model ~/tesstutorial/verdana_from_small/verdana_checkpoint \
  --traineddata ~/tesstutorial/engtrain/eng/eng.traineddata \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt
4 - Create the necessary folder
mkdir -p ~/tesstutorial/verdana_from_full
5 - Combine the trained data
~/projects/tesseract/training/combine_tessdata \
  -e ~/projects/tesseract/tessdata/eng.traineddata \
  ~/tesstutorial/verdana_from_full/eng.lstm
6 - Train merged data
~/projects/tesseract/training/lstmtraining \
  --model_output ~/tesstutorial/verdana_from_full/verdana \
  --continue_from ~/tesstutorial/verdana_from_full/eng.lstm \
  --traineddata ~/projects/tesseract/tessdata/eng.traineddata \
  --train_listfile ~/tesstutorial/engeval/eng.training_files.txt \
  --max_iterations 400
7 - Validate the results on the main training file
~/projects/tesseract/training/lstmeval \
  --model ~/tesstutorial/verdana_from_full/verdana_checkpoint \
  --traineddata ~/projects/tesseract/tessdata/eng.traineddata \
  --eval_listfile ~/tesstutorial/engeval/eng.training_files.txt
8 - Validate the results on our training file
~/projects/tesseract/training/lstmeval \
  --model ~/tesstutorial/verdana_from_full/verdana_checkpoint \
  --traineddata ~/projects/tesseract/tessdata/eng.traineddata \
  --eval_listfile ~/tesstutorial/engtrain/eng.training_files.txt
Fine tuning add ± character - tesseract 4.0
Reference: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#fine-tuning-for--a-few-characters
Steps
1 - Modify langdata/eng/eng.training_text and include these lines:
alkoxy of LEAVES ±1.84% by Buying curved RESISTANCE MARKED Your (Vol. SPANIEL TRAVELED ±85¢ , reliable Events THOUSANDS TRADITIONS. ANTI-US Bedroom Leadership Inc. with DESIGNS self; ball changed. MANHATTAN Harvey's ±1.31 POPSET Os—C(11) VOLVO abdomen, ±65°C, AEROMEXICO SUMMONER = (1961) About WASHING Missouri PATENTSCOPE® # © HOME SECOND HAI Business most COLETTI, ±14¢ Flujo Gilbert Dresdner Yesterday's Dilated SYSTEMS Your FOUR ±90° Gogol PARTIALLY BOARDS firm Email ACTUAL QUEENSLAND Carl's Unruly ±8.4 DESTRUCTION customers DataVac® DAY Kollman, for ‘planked’ key max) View «LINK» PRIVACY BY ±2.96% Ask! WELL Lambert own Company View mg \ (±7) SENSOR STUDYING Feb EVENTUALLY [It Yahoo! Tv United by #DEFINE Rebel PERFORMED ±500Gb Oliver Forums Many | ©2003-2008 Used OF Avoidance Moosejaw pm* ±18 note: PROBE Jailbroken RAISE Fountains Write Goods (±6) Oberflachen source.” CULTURED CUTTING Home 06-13-2008, § ±44.01189673355 € netting Bookmark of WE MORE) STRENGTH IDENTICAL ±2? activity PROPERTY MAINTAINED
2 - Generate the training file
PANGOCAIRO_BACKEND=fc \
~/projects/tesseract/training/tesstrain.sh \
  --fonts_dir /Library/Fonts \
  --lang eng \
  --linedata_only \
  --noextract_font_properties \
  --langdata_dir ~/projects/langdata \
  --tessdata_dir ~/projects/tesseract/tessdata \
  --fontlist "Times New Roman," \
    "Times New Roman, Bold" \
    "Times New Roman, Bold Italic" \
    "Times New Roman, Italic" \
    "Courier New" \
    "Courier New Bold" \
    "Courier New Bold Italic" \
    "Courier New Italic" \
  --output_dir ~/tesstutorial/trainplusminus
3 - Generate the eval data
PANGOCAIRO_BACKEND=fc \
~/projects/tesseract/training/tesstrain.sh \
  --fonts_dir /Library/Fonts \
  --lang eng \
  --linedata_only \
  --noextract_font_properties \
  --langdata_dir ~/projects/langdata \
  --tessdata_dir ~/projects/tesseract/tessdata \
  --fontlist "Verdana" \
  --output_dir ~/tesstutorial/evalplusminus
4 - Combine trained data files
~/projects/tesseract/training/combine_tessdata \
  -e ~/projects/tesseract/tessdata/eng.traineddata \
  ~/tesstutorial/trainplusminus/eng.lstm
5 - Fine tuning
~/projects/tesseract/training/lstmtraining \
  --model_output ~/tesstutorial/trainplusminus/plusminus \
  --continue_from ~/tesstutorial/trainplusminus/eng.lstm \
  --traineddata ~/tesstutorial/trainplusminus/eng/eng.traineddata \
  --old_traineddata ~/projects/tesseract/tessdata/eng.traineddata \
  --train_listfile ~/tesstutorial/trainplusminus/eng.training_files.txt \
  --max_iterations 3600
6 - Test the result on other fonts
~/projects/tesseract/training/lstmeval \
  --model ~/tesstutorial/trainplusminus/plusminus_checkpoint \
  --traineddata ~/tesstutorial/trainplusminus/eng/eng.traineddata \
  --eval_listfile ~/tesstutorial/trainplusminus/eng.training_files.txt
7 - Test the result on the main font
~/projects/tesseract/training/lstmeval \
  --model ~/tesstutorial/trainplusminus/plusminus_checkpoint \
  --traineddata ~/tesstutorial/trainplusminus/eng/eng.traineddata \
  --eval_listfile ~/tesstutorial/evalplusminus/eng.training_files.txt
@FernandoGOT Thank you. As you know, @Shreeshrii mentioned a problem with fine-tune training, so I hope this page will be updated to reflect that soon. Thank you.
This is a great resource! It would be even more amazing if it were in the form of a pull request of changes to the existing documentation so that it could be improved to avoid these problems for other OS X users.
I followed @FernandoGOT's steps, but I am getting this error when running tesseract --list-langs:
read_params_file: parameter not found: enable_new_segsearch
It's the first time I have tried to build tesseract, so I have no idea what is going on. Any ideas on where to look?
@kas84 Please post the version info reported by tesseract -v.
Are you using the latest source from GitHub?
@Shreeshrii I cloned the repo with git clone https://github.com/tesseract-ocr/tesseract/, so if the latest version is in master, yes I am.
tesseract -v
Yeah, I forgot, sorry!
leptonica-1.76.0
libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
Found AVX2
Found AVX
Found SSE
Usually tesseract -v should also show the tesseract version.
Is the error only with --list-langs?
Are you able to recognize any test images?
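When comparing builds as in the exchange above, it can be handy to pull just the version out of the tesseract -v output. A small sketch follows; the function name tess_version is made up, and it assumes the first line has the form "tesseract <version>" as in the outputs posted in this thread.

```sh
# Read `tesseract -v` style output on stdin and print the version from the
# first line that starts with "tesseract ". Prints nothing if no such line.
tess_version() {
  sed -n 's/^tesseract \(.*\)$/\1/p' | head -n 1
}

# Sample input copied from the outputs posted in this thread.
printf 'tesseract 4.0.0-beta.1-232-g45a6\nleptonica-1.76.0\n' | tess_version
```

With a real install this would be used as tesseract -v | tess_version. If nothing is captured, the output may be going to stderr, in which case tesseract -v 2>&1 | tess_version covers both streams.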
My bad:
tesseract 4.0.0-beta.1-232-g45a6
leptonica-1.76.0
libjpeg 9c : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11
Found AVX2
Found AVX
Found SSE
It also happens when trying to recognize an image, yes.
What commands are you using?
What tessdata-dir are you using? Eg. Where is eng.traineddata installed?
What output do you get with the following? Use ./tessdata if you have copied eng.traineddata there.
cd tesseract
tesseract ./testing/phototest.tif - --tessdata-dir ../tessdata -c page_separator=''
Page 1 This is a lot of 12 point text to test the ocr code and see if it works on all types of file format.
The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox. The quick brown dog jumped over the lazy fox.
page _seperator
The space here confuses the command line options parser.
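The effect of a stray space can be seen by counting the arguments the shell hands to a command. This is a plain shell illustration and does not involve tesseract itself; count_args is a throwaway helper.

```sh
# Print how many arguments the shell passes to a command.
count_args() {
  echo "$#"
}

count_args -c page_separator=''    # prints 2: -c and page_separator=
count_args -c page _separator=''   # prints 3: -c, page and _separator=
```

In the second call the value that follows -c is just "page", and "_separator=" arrives as a separate argument, which is why the option parser gets confused.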
Has any one built a dockerfile out of this ?
It works now! I am guessing it had something to do with my TESSDATA env
I am guessing it had something to do with my TESSDATA env
No.
It was due to wrong command line usage.
I am a newbie with tesseract and this has nothing to do with my bug, but... is it supposed to recognize images like this? Or do I need to treat the image first to remove everything but white so that tesseract can handle it?
Please use the forum for asking questions.
Okay, sorry!
@FernandoGOT Thank you very much for such a detailed explanation but I can't make it work. When I say "make training" it gives me "Need to reconfigure project, so there are no errors" error. Also, I couldn't create ScrollView.jar. Is it possible to update this post? Thank you.
@ysnnzlcn I'm short on time these days (working too much), but when I get some free time I'm going to write a better step-by-step guide on how to use tesseract and submit it to the docs.
@FernandoGOT That would be great, looking forward to it. Thanks
Under Training Font -- Tesseract 4.0, Step 7, I get a failure:
=== Starting training for language 'eng'
[Sat Sep 22 16:56:06 MST 2018] /usr/local/bin/text2image --fonts_dir=/Library/Fonts --font=Verdana --outputbase=/var/folders/8x/69qlvhl16n56q28vy__yp10r0000gn/T/font_tmp.XXXXXXXXXX.I4GMoIqG/sample_text.txt --text=/var/folders/8x/69qlvhl16n56q28vy__yp10r0000gn/T/font_tmp.XXXXXXXXXX.I4GMoIqG/sample_text.txt --fontconfig_tmpdir=/var/folders/8x/69qlvhl16n56q28vy__yp10r0000gn/T/font_tmp.XXXXXXXXXX.I4GMoIqG
=== Phase I: Generating training images ===
Rendering using Verdana
[Sat Sep 22 16:56:09 MST 2018] /usr/local/bin/text2image --fontconfig_tmpdir=/var/folders/8x/69qlvhl16n56q28vy__yp10r0000gn/T/font_tmp.XXXXXXXXXX.I4GMoIqG --fonts_dir=/Library/Fonts --strip_unrenderable_words --leading=32 --char_spacing=0.0 --exposure=0 --outputbase=/var/folders/8x/69qlvhl16n56q28vy__yp10r0000gn/T/eng-2018-09-22.XXX.rxeEXrp0/eng.Verdana.exp0 --max_pages=0 --font=Verdana --text=/Users/hadilsabbagh/tesseract/java/langdata/eng/eng.training_text
ERROR: /var/folders/8x/69qlvhl16n56q28vy__yp10r0000gn/T/eng-2018-09-22.XXX.rxeEXrp0/eng.Verdana.exp0.box does not exist or is not readable
ERROR: /var/folders/8x/69qlvhl16n56q28vy__yp10r0000gn/T/eng-2018-09-22.XXX.rxeEXrp0/eng.Verdana.exp0.box does not exist or is not readable
I have:
Hadil-Sabbaghs-MacBook-Pro:tesseract hadilsabbagh$ tesseract -v
tesseract 4.0.0-beta.4-158-g02f9d
leptonica-1.76.0
libjpeg 9c : libpng 1.6.35 : libtiff 4.0.9 : zlib 1.2.11
Found AVX2
Found AVX
Found SSE
My user is allowed to create files in that directory, and the directory itself is present.
Please advise. Hadil G. Sabbagh, Ph. D.
Hi, when I try installing this it breaks here:
[Wed Sep 26-19:00:26][MEPMBP2017][(👨💻)markphillips](~/Documents/Development/Tesseract/tesseract) =>>sudo update_dyld_shared_cache
Password:
update_dyld_shared_cache: warning: x86_64h skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-1.dat
update_dyld_shared_cache: warning: x86_64h skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-2.dat
update_dyld_shared_cache: warning: x86_64h skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-3.dat
update_dyld_shared_cache: warning: i386 skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-1.dat
update_dyld_shared_cache: warning: i386 skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-2.dat
update_dyld_shared_cache: warning: i386 skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-3.dat
update_dyld_shared_cache: warning: x86_64h rejected from cached dylibs: /System/Library/PrivateFrameworks/CreateML.framework/Versions/A/CreateML (("Could not find dependency '/System/Library/PrivateFrameworks/TuriCore.framework/Versions/A/TuriCore'"))
[Wed Sep 26-19:00:48][MEPMBP2017][(👨💻)markphillips](~/Documents/Development/Tesseract/tesseract) =>>
I really would like to get this working - I've spent a lot of time getting something running...any help or pointers to instructions would be greatly appreciated..
@FernandoGOT @Shreeshrii: can you put the instructions on the wiki? I would like to close this issue (it is related to the build process). It is too long, and other people have mixed other topics (training) into it. @FernandoGOT: can you test the recent code?
I do not have a Mac. Would prefer if someone can test with current code and then post required instructions to wiki.
'make training' returns the following error:
combine_tessdata.cpp:100:9: error: use of undeclared identifier 'errno'
        errno = 0;
        ^
combine_tessdata.cpp:103:20: error: use of undeclared identifier 'errno'
        } else if (errno == 0) {
                   ^
combine_tessdata.cpp:109:36: error: use of undeclared identifier 'errno'
        argv[i], strerror(errno));
                          ^
combine_tessdata.cpp:120:9: error: use of undeclared identifier 'errno'
        errno = 0;
        ^
combine_tessdata.cpp:123:20: error: use of undeclared identifier 'errno'
        } else if (errno != 0) {
                   ^
combine_tessdata.cpp:125:46: error: use of undeclared identifier 'errno'
        filename.string(), strerror(errno));
                                    ^
6 errors generated.
make[1]: [combine_tessdata.o] Error 1
make: [training] Error 2
Any fix to this issue??
Thanks
@FernandoGOT Thank you very much for such a detailed explanation but I can't make it work. When I say "make training" it gives me "Need to reconfigure project, so there are no errors" error. Also, I couldn't create ScrollView.jar. Is it possible to update this post? Thank you.
Please check your output after running this code:
./configure \
CPPFLAGS=-I/usr/local/opt/icu4c/include \
LDFLAGS=-L/usr/local/opt/icu4c/lib
I came across the same error and the log showed me an issue with icu4c and also asked to install pango.
Once done, run the above code again and hopefully your error will be solved.
@escapist21 : is your compile problem with combine_tessdata still valid?
@zdenop The errno problem exists in the current version. I'll have a look at it.
I created a bug report (#1986) and patch (#1987) for the problem reported by @escapist21.
With that bug fix and following the instructions on the wiki for MacPorts (https://github.com/tesseract-ocr/tesseract/wiki/Compiling#macos-with-macports), I was able to build both Tess and the training tools. This was not a clean install from scratch, so it's possible that I had a necessary dependency already installed, but I think this issue can be closed and folks can open new issues if they find additional problems.
One thing I noticed is that there's a small issue with linking the OpenMP version that I haven't looked into, but the standard non-OpenMP build works fine.
@tfmorris: Can you please check a clean install from scratch, so we can be sure 4.0.0 is ready for Mac?
I don't usually have completely unused machines with none of the dependencies installed, but I've got a new work computer that I was able to use.
I made a minor edit to the homebrew instructions on the wiki page, but with that I was able to successfully build both the main program and the training tools using both MacPorts and Homebrew using current head of master.
@tfmorris,
Please share your minor edits.
With OpenMP you can get a major speedup, so I suggest to investigate how to make it work on macOS with Clang + LLVM's OpenMP runtime.
Hi, when I try installing this it breaks here:
[Wed Sep 26-19:00:26][MEPMBP2017][(👨💻)markphillips](~/Documents/Development/Tesseract/tesseract) =>>sudo update_dyld_shared_cache
Password:
update_dyld_shared_cache: warning: x86_64h skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-1.dat
update_dyld_shared_cache: warning: x86_64h skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-2.dat
update_dyld_shared_cache: warning: x86_64h skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-3.dat
update_dyld_shared_cache: warning: i386 skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-1.dat
update_dyld_shared_cache: warning: i386 skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-2.dat
update_dyld_shared_cache: warning: i386 skipping because of bad install name /System/Library/PrivateFrameworks/FaceCore.framework/Versions/A/Resources/fcl-fc-3.dat
update_dyld_shared_cache: warning: x86_64h rejected from cached dylibs: /System/Library/PrivateFrameworks/CreateML.framework/Versions/A/CreateML (("Could not find dependency '/System/Library/PrivateFrameworks/TuriCore.framework/Versions/A/TuriCore'"))
[Wed Sep 26-19:00:48][MEPMBP2017][(👨💻)markphillips](~/Documents/Development/Tesseract/tesseract) =>>
I really would like to get this working - I've spent a lot of time getting something running...any help or pointers to instructions would be greatly appreciated..
I am having this issue too, has this been resolved here or somewhere else??
@FernandoGOT Thank you very much for such a detailed explanation but I can't make it work. When I say "make training" it gives me "Need to reconfigure project, so there are no errors" error. Also, I couldn't create ScrollView.jar. Is it possible to update this post? Thank you.
Please check your output after running this code:
./configure \
CPPFLAGS=-I/usr/local/opt/icu4c/include \
LDFLAGS=-L/usr/local/opt/icu4c/lib
I came across the same error and the log showed me an issue with icu4c and also asked to install pango.
Once done, run the above code again and hopefully your error will be solved.
@jamesoneill54 https://stackoverflow.com/questions/33259191/installing-libicu-dev-on-mac/33352241 this worked for me
I suggest to close this issue. Part of the information given here is no longer up to date.
I made a minor edit to the homebrew instructions on the wiki page,
Please share your minor edits.
@amitdo You can find my edits in the history for the wiki page.
With OpenMP you can get a major speedup, so I suggest to investigate how to make it work on macOS with Clang + LLVM's OpenMP runtime.
That's not something I have time to tackle.
I suggest to close this issue. Part of the information given here is no longer up to date.
@stweil I suggested exactly that back in Oct 2018, so obviously agree. :) If people run into new problems, they can open new issues (or just update the wiki with the necessary corrections).
Did anyone manage to overcome the following error:
make training
Need to reconfigure project, so there are no errors
And if so how?
make training is disabled because some requirements are missing.
@stweil How do I diagnose which requirements are missing and why make training is disabled?
nvm,
configure: WARNING: pango 1.22.0 or higher is required, but was not found.
configure: WARNING: Training tools WILL NOT be built.
configure: WARNING: Try to install libpango1.0-dev package.
checking for cairo... no
configure: WARNING: Training tools WILL NOT be built because of missing cairo library.
configure: WARNING: Try to install libcairo-dev?? package.
checking that generated files are newer than configure... done
@stweil How do I diagnose which requirements are missing and why make training is disabled?
Obviously you found the answer yourself: configure says that pango 1.22.0 or higher is required, but was not found.
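To find such messages without scrolling through the whole configure run, the output can be saved to a file and filtered for warnings. This is a sketch only; configure_warnings and the log file name are hypothetical, and it assumes the messages contain the word WARNING as in the lines quoted above.

```sh
# Print every WARNING line from a saved configure log.
# `configure_warnings` is a hypothetical helper; the log path is up to you.
configure_warnings() {
  grep 'WARNING' "$1"
}

# Typical use (assumed file name):
#   ./configure 2>&1 | tee configure.log
#   configure_warnings configure.log
```

Any line saying that the training tools will not be built explains why make training is disabled.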
I am getting an error when running 'text2image --list_available_fonts --fonts_dir=/Library/Fonts'.
Error: 'text2image: not found'.
Can you please point me in a direction for tackling this issue?
MacOS : 10.14.6
@khalajink, I suggest to ask for help at the user forum.
@khalajink Did you install the training tools (including text2image)?
If so, where are they? Make sure you've included them on your $PATH.
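A quick way to check both points is sketched below. The helper path_contains is hypothetical, and /usr/local/bin is assumed as the install location only because that is where the error messages in this thread look for text2image.

```sh
# Succeed when directory $2 is one of the colon-separated entries of $1.
path_contains() {
  case ":$1:" in
    *":$2:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Where the shell finds text2image, if anywhere.
command -v text2image || echo "text2image is not on PATH"

# /usr/local/bin is an assumed install location; adjust if you used --prefix.
if path_contains "$PATH" /usr/local/bin; then
  echo "/usr/local/bin is on PATH"
else
  echo "consider: export PATH=\"/usr/local/bin:\$PATH\""
fi
```

If the directory is on PATH but the binary is still missing, the training tools were probably built without being installed; the tesseract build documentation describes a separate install step for them, which is worth checking.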
@jtlz2 I followed @FernandoGOT's comment and do not see an installation step for text2image there; I suppose it comes along with icu4c. How do I include it in $PATH?
When I try to run 'text2image --list_available_fonts --fonts_dir=/Library/Fonts', the error is '-bash: /usr/local/bin/text2image: No such file or directory'.
Also, I see that you had an issue related to the pango version 3 days ago. I am facing this too, although I already have pango 1.44.6 installed. How did you solve it?
Solved the pango issue by following https://stackoverflow.com/questions/55361379/osx-compiling-training-tools-for-tesseract-4-0-pango-libraries-not-found
Download links for piccolo2d-core-3.0.jar and piccolo2d-extras-3.0.jar (step 1 of "Creating ScrollView.jar"):
http://search.maven.org/remotecontent?filepath=org/piccolo2d/piccolo2d-core/3.0/piccolo2d-core-3.0.jar
http://search.maven.org/remotecontent?filepath=org/piccolo2d/piccolo2d-extras/3.0/piccolo2d-extras-3.0.jar