tesseract-ocr / tessdata

Trained models with fast variant of the "best" LSTM models + legacy models
Apache License 2.0

Cannot build Tesseract's training tool from source code on M1 MacBook #182

Closed by yaofuzhou 6 months ago

yaofuzhou commented 6 months ago

I am running macOS 14.5 on an Apple M1 Max.

I want to build Tesseract's training tool with either gcc/g++ or the system Xcode C/C++ compiler. Every attempt has ended in errors during compilation. Here I will describe the problem I hit with gcc/g++ from Homebrew.

I have made the following edits to the beginning of Makefile.am:

## run autogen.sh to create Makefile.in from this file
ACLOCAL_AMFLAGS = -I m4

.PHONY: doc html install-langs ScrollView.jar install-jars pdf training

CLEANFILES =

SUBDIRS = . tessdata

# Set the compilers
CC = /opt/homebrew/bin/gcc-14
CXX = /opt/homebrew/bin/g++-14
AR = /opt/homebrew/bin/gcc-ar-14
RANLIB = /opt/homebrew/bin/gcc-ranlib-14

# Set the environment variables for the build process
AM_LDFLAGS = -L/opt/homebrew/opt/icu4c/lib -L/opt/homebrew/opt/libarchive/lib -L/opt/homebrew/opt/libffi/lib -L/opt/homebrew/Cellar/leptonica/1.84.1/lib
AM_CPPFLAGS = -I/opt/homebrew/opt/icu4c/include -I/opt/homebrew/opt/libarchive/include -I/opt/homebrew/opt/libffi/include -I/opt/homebrew/Cellar/leptonica/1.84.1/include/leptonica
# Set PKG_CONFIG_PATH for the required packages
PKG_CONFIG_PATH = /opt/homebrew/opt/icu4c/lib/pkgconfig:/opt/homebrew/opt/libarchive/lib/pkgconfig:/opt/homebrew/opt/libffi/lib/pkgconfig:/opt/homebrew/Cellar/leptonica/1.84.1/lib/pkgconfig

# Export the environment variables
export LDFLAGS = $(AM_LDFLAGS)
export CPPFLAGS = $(AM_CPPFLAGS)
# Export the PKG_CONFIG_PATH so it's available in the environment
export PKG_CONFIG_PATH

Then I ran

./autogen.sh
./configure PKG_CONFIG_PATH=/opt/homebrew/opt/icu4c/lib/pkgconfig:/opt/homebrew/opt/libarchive/lib/pkgconfig:/opt/homebrew/opt/libffi/lib/pkgconfig
make -j
sudo make install

All of the above commands ran successfully. However, when I ran make training, it ended with the following error:

... ... ...
CXXLD libtesseract_neon.la
CXXLD libtesseract.la
warning: no debug symbols in executable (-arch arm64)
CXXLD combine_lang_model
ld: warning: -bind_at_load is deprecated on macOS
ld: archive member '/' not a mach-o file in '/Users/yaofuzhou/Documents/MORE_Health/OCR/tesseract/.libs/libtesseract_training.a'
collect2: error: ld returned 1 exit status
make: *** [combine_lang_model] Error 1

I did some googling, and this issue may not be specific to Tesseract. Still, I would appreciate hearing from anyone who has successfully built Tesseract's training tool on an M1 MacBook.

stweil commented 6 months ago

It works out of the box, so don't set special environment variables or change the code. This sequence works for me:

./autogen.sh
mkdir build
cd build
../configure
make -j10 training

It takes about two minutes.

stweil commented 6 months ago

Typically Apple's clang compiler gives the best (= fastest) result. If you want to try the latest gcc-14, that works, too. I used these configure options in my test:

../configure --disable-shared CXX=g++-14 'CXXFLAGS=-g -O2 -Wall -flax-vector-conversions' --prefix=$HOME

yaofuzhou commented 6 months ago

Thanks @stweil. Your reply was helpful because it confirmed that it can be done. The particular solution to my situation was found at https://github.com/pyenv/pyenv/issues/2862. In short, I had to remove /opt/homebrew/opt/binutils/bin from my PATH so that the system version of the utility could take over and avoid a GNU-specific issue.
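
For anyone hitting the same linker error, the PATH change described above can be sketched as follows. This is only an illustration under assumptions: strip_path_entry is a hypothetical helper name, not something shipped with Tesseract or Homebrew, and the binutils directory is the one quoted in this thread. The likely mechanism is that GNU ar stores its symbol table in an archive member named '/', which matches what Apple's ld rejects in the log above, but that reading is an inference from the error text.

```shell
# Hypothetical helper: print the PATH-style string $1 with every
# exact occurrence of directory $2 removed.
strip_path_entry() {
    printf '%s' "$1" | tr ':' '\n' | grep -Fxv -- "$2" | paste -sd ':' -
}

# Assumption: Homebrew's binutils lives at the location quoted in this thread.
BINUTILS_BIN=/opt/homebrew/opt/binutils/bin

# Drop the GNU binutils directory for the current shell session only.
PATH=$(strip_path_entry "$PATH" "$BINUTILS_BIN")
export PATH

# Show which archive tools now come first on PATH.
command -v ar || echo "ar not found on PATH"
command -v ranlib || echo "ranlib not found on PATH"
```

The change lasts only for the current shell. If the binutils directory is added in a shell startup file, it has to be removed there as well, and it is probably safest to run make clean (or start from a fresh build directory) before retrying make training, so that archives created by the GNU tools are rebuilt.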