tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

lstmtraining: command not found #295

Closed TheFattestTony closed 2 years ago

TheFattestTony commented 2 years ago

Hi, can someone please explain why, after running make training MODEL_NAME=eng_hor on WSL, it returns lstmtraining: command not found?

lstmtraining \
  --debug_interval 0 \
  --traineddata data/eng_hor/eng_hor.traineddata \
  --learning_rate 0.002 \
  --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx192 O1c`head -n1 data/eng_hor/unicharset`]" \
  --model_output data/eng_hor/checkpoints/eng_hor \
  --train_listfile data/eng_hor/list.train \
  --eval_listfile data/eng_hor/list.eval \
  --max_iterations 10000 \
  --target_error_rate 0.01
/bin/bash: line 1: lstmtraining: command not found
make: *** [Makefile:299: data/eng_hor/checkpoints/eng_hor_checkpoint] Error 127

TheFattestTony commented 2 years ago

I've tried running the test script. After a few minutes it generates the box files and other files, but it raises an error:
/usr/bin/sh: line 1: bc: command not found
/usr/bin/sh: line 4: bc: command not found

TheFattestTony commented 2 years ago

Are these errors related?

Shreeshrii commented 2 years ago

bc: command not found

Please install the package. It is used for math calculations.

Shreeshrii commented 2 years ago

/bin/bash: line 1: lstmtraining: command not found

Have you built/installed the training tools along with tesseract?

What's the output of tesseract -v and lstmtraining -v?
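A minimal sketch of how the same check could be scripted, assuming a POSIX-style shell. The helper name missing_tools is hypothetical and not part of tesstrain; it only reports which of the given command names cannot be resolved on PATH. Running it in the same shell that invokes make matters, because WSL and a native Windows shell can see different PATHs.

```shell
# Print the name of every tool (passed as arguments) that is not on PATH.
# missing_tools is a hypothetical helper for illustration only.
missing_tools() {
  for tool in "$@"; do
    # command -v succeeds only if the shell can resolve the name
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# The three commands mentioned in this thread
missing=$(missing_tools tesseract lstmtraining bc)
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined on one line
  echo "Not found on PATH:" $missing
else
  echo "All required tools found"
fi
```

If a name is reported here but the tool is installed, the install location is probably not on the PATH of the shell that make uses.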

TheFattestTony commented 2 years ago

bc: command not found

Please install the package. It is used for math calculations.

Done.

TheFattestTony commented 2 years ago

tesseract -v
tesseract v5.0.0.20211201
 leptonica-1.78.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found libarchive 3.5.0 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5 libzstd/1.4.5
 Found libcurl/7.77.0-DEV Schannel zlib/1.2.11 zstd/1.4.5 libidn2/2.0.4 nghttp2/1.31.0

TheFattestTony commented 2 years ago

lstmtraining -v
v5.0.0.20211201

TheFattestTony commented 2 years ago

make leptonica tesseract
cd leptonica-1.80.0 ; \
./configure --prefix=C:/Teste/tesstrain-main/usr && \
make -j4 install SUBDIRS=src && \
date > "leptonica.built"
checking build system type... x86_64-pc-mingw64
checking host system type... x86_64-pc-mingw64
checking how to print strings... printf
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.exe
checking for suffix of executables... .exe
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking for a sed that does not truncate output... /usr/bin/sed
checking for grep that handles long lines and -e... /usr/bin/grep
checking for egrep... /usr/bin/grep -E
checking for fgrep... /usr/bin/grep -F
checking for ld used by gcc... c:/mingw/mingw32/bin/ld.exe
checking if the linker (c:/mingw/mingw32/bin/ld.exe) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /c/MinGW/bin/nm -B
checking the name lister (/c/MinGW/bin/nm -B) interface... BSD nm
checking whether ln -s works... no, using cp -pR
checking the maximum length of command line arguments... 8192
checking how to convert x86_64-pc-mingw64 file names to x86_64-pc-mingw64 format... func_convert_file_msys_to_w32
checking how to convert x86_64-pc-mingw64 file names to toolchain format... func_convert_file_msys_to_w32
checking for c:/mingw/mingw32/bin/ld.exe option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... file_magic ^x86 archive import|^x86 DLL
checking for dlltool... dlltool
checking how to associate runtime and link libraries... func_cygming_dll_for_implib
checking for ar... ar
checking for archiver @FILE support... @
checking for strip... strip
checking for ranlib... ranlib
checking for gawk... gawk
checking command to parse /c/MinGW/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for a working dd... /usr/bin/dd
checking how to truncate binary pipes... /usr/bin/dd bs=4096 count=1
checking for mt... no
checking if : is a manifest tool... no
checking how to run the C preprocessor... gcc -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -DDLL_EXPORT -DPIC
checking if gcc PIC flag -DDLL_EXPORT -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (c:/mingw/mingw32/bin/ld.exe) supports shared libraries... yes
checking whether -lc should be explicitly linked in... yes
checking dynamic linker characteristics... Win32 ld.exe
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /usr/bin/mkdir -p
checking whether make sets $(MAKE)... yes
checking whether make supports the include directive... yes (GNU style)
checking whether make supports nested variables... yes
checking dependency style of gcc... gcc3
checking for gawk... (cached) gawk
checking for gcc... (cached) gcc
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc accepts -g... (cached) yes
checking for gcc option to accept ISO C89... (cached) none needed
checking whether gcc understands -c and -o together... (cached) yes
checking how to run the C preprocessor... gcc -E
checking whether ln -s works... no, using cp -pR
checking whether make sets $(MAKE)... (cached) yes
checking for cos in -lm... yes
checking for pkg-config... no
checking for ZLIB... no
checking for deflate in -lz... no
checking zlib.h usability... no
checking zlib.h presence... no
checking for zlib.h... no
checking for LIBPNG... no
checking for png_read_png in -lpng... no
checking png.h usability... no
checking png.h presence... no
checking for png.h... no
checking for JPEG... no
checking for jpeg_read_scanlines in -ljpeg... no
checking jpeglib.h usability... no
checking jpeglib.h presence... no
checking for jpeglib.h... no
checking for DGifOpenFileHandle in -lgif... no
checking gif_lib.h usability... no
checking gif_lib.h presence... no
checking for gif_lib.h... no
checking for LIBTIFF... no
checking for TIFFOpen in -ltiff... no
checking tiff.h usability... no
checking tiff.h presence... no
checking for tiff.h... no
checking for LIBWEBP... no
checking for WebPGetInfo in -lwebp... no
checking webp/encode.h usability... no
checking webp/encode.h presence... no
checking for webp/encode.h... no
checking for LIBWEBPMUX... no
checking for WebPAnimEncoderOptionsInit in -lwebpmux... no
checking for LIBJP2K... no
checking for opj_create_decompress in -lopenjp2... no
checking openjpeg-2.3/openjpeg.h usability... no
checking openjpeg-2.3/openjpeg.h presence... no
checking for openjpeg-2.3/openjpeg.h... no
checking openjpeg-2.2/openjpeg.h usability... no
checking openjpeg-2.2/openjpeg.h presence... no
checking for openjpeg-2.2/openjpeg.h... no
checking openjpeg-2.1/openjpeg.h usability... no
checking openjpeg-2.1/openjpeg.h presence... no
checking for openjpeg-2.1/openjpeg.h... no
checking openjpeg-2.0/openjpeg.h usability... no
checking openjpeg-2.0/openjpeg.h presence... no
checking for openjpeg-2.0/openjpeg.h... no
checking whether to enable debugging...
checking whether make supports nested variables... (cached) yes
checking for size_t... yes
checking whether byte ordering is bigendian... no
checking whether compiler supports -Wl,--as-needed... yes
checking for fmemopen... no
checking for fstatat... no
checking Major version... 1
checking Minor version... 80
checking Point version... 0
checking whether ln -s works... no, using cp -pR
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/endianness.h
config.status: creating src/Makefile
config.status: creating prog/Makefile
config.status: creating lept.pc
config.status: creating cmake/templates/LeptonicaConfig.cmake
config.status: creating cmake/templates/LeptonicaConfig-version.cmake
config.status: creating config_auto.h
config.status: config_auto.h is unchanged
config.status: executing libtool commands
config.status: executing depfiles commands
make[1]: Entering directory 'C:/Teste/tesstrain-main/leptonica-1.80.0'
CDPATH="${ZSH_VERSION+.}:" && cd . && C:/Program Files/Git/usr/bin/sh.exe /c/Teste/tesstrain-main/leptonica-1.80.0/config/missing aclocal-1.16 -I m4
/usr/bin/sh: C:/Program: Is a directory
make[1]: [Makefile:441: aclocal.m4] Error 126
make[1]: Leaving directory 'C:/Teste/tesstrain-main/leptonica-1.80.0'
make: [Makefile:325: leptonica.built] Error 2

It raises this error after running make leptonica tesseract

Shreeshrii commented 2 years ago

lstmtraining -v v5.0.0.20211201

So lstmtraining is installed.

make training MODEL_NAME=eng_hor on WSL it returns lstmtraining: command not found?

Seems to be related to WSL.

TheFattestTony commented 2 years ago

I've tried running make training again and it raises an error: Can't encode transcription: '1 2 3 4 5 6 7 8 9 $' in language ''. I've searched the issues here for similar problems, and there are several possible causes. I believe that my unicharset file was not generated properly.

I'm using 4 simple images, black on white. Each image is a print of a numerical sequence: '1 2 3 4 5 6 7 8 9 $'. (My objective is to achieve better precision, because all the other alternatives failed to distinguish characters like 3 and 8, or 5 and $, with good performance when tested with the font that is causing me trouble.)

Each image is properly named and has its own text file with the .gt.txt extension. All the box files were generated properly, and the boxes were verified and their values corrected where necessary. All this data is in a folder named after the model with a -ground-truth suffix. In the same folder, one file with the .lstmf extension was created for each image during the make training process.

But the unicharset file diverges from the unicharset of the available traineddata. The sequence is: '1 2 3 4 5 6 7 8 9 $' and the unicharset file is:

4
NULL 0 Common 0
Joined 7 0,255,0,255,0,0,0,0,0,0 Latin 1 0 1 Joined # Joined [4a 6f 69 6e 65 64 ]a
|Broken|0|1 15 0,255,0,255,0,0,0,0,0,0 Common 2 10 2 |Broken|0|1 # Broken
1 8 0,255,0,255,0,0,0,0,0,0 Common 3 2 3 1 # 1 [31 ]0

It seems that some characters are missing (2 3 4 5 6 7 8 9 $)
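A rough way to confirm which characters are missing is to compare the characters used in the .gt.txt files with the first field of each unicharset entry. The sketch below assumes a POSIX-style shell with grep, sort, cut, tail and mktemp available; gt_chars and missing_from_unicharset are hypothetical helper names, not part of tesstrain, and the approach assumes the unicharset layout shown above (a count on the first line, then one entry per line starting with the character).

```shell
# Print each distinct non-whitespace character used in the given
# ground-truth text files, one per line.
gt_chars() {
  cat "$@" | grep -o '[^[:space:]]' | LC_ALL=C sort -u
}

# Print the ground-truth characters that have no entry in the unicharset.
# $1 = unicharset file, remaining arguments = .gt.txt files
missing_from_unicharset() {
  unicharset=$1
  shift
  entries=$(mktemp)
  # Skip the count on line 1; the first field of every later line is the entry
  tail -n +2 "$unicharset" | cut -d' ' -f1 > "$entries"
  # -F fixed strings, -x whole-line match, -v keep only characters with no entry
  gt_chars "$@" | grep -Fxv -f "$entries"
  rm -f "$entries"
}

# Example call (paths taken from this thread, adjust as needed):
#   missing_from_unicharset data/eng_hor/unicharset data/eng_hor-ground-truth/*.gt.txt
```

Any character printed by missing_from_unicharset appears in the transcriptions but has no unicharset entry, which would be consistent with the Can't encode transcription error.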

The all-gt and all-lstmf files were properly created too.

My list.eval and list.train were created manually because the script didn't work well (maybe because the bc command was missing). At the moment, eval consists of the last two lines of the all-lstmf file and train consists of the first two lines (4 lines in total).
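The manual split described here (first lines for training, remaining lines for evaluation) could be scripted roughly as follows. This is a sketch under assumptions, not the logic of the tesstrain Makefile: split_list is a hypothetical helper, and it uses awk for the ratio arithmetic so that it does not depend on bc.

```shell
# Split a list of .lstmf paths into list.train and list.eval.
# $1 = file listing one .lstmf path per line
# $2 = training ratio, e.g. 0.50
# $3 = output directory (must already exist)
split_list() {
  all=$1
  ratio=$2
  out=$3
  total=$(wc -l < "$all")
  # awk handles the floating-point multiplication; int() truncates
  train=$(awk -v t="$total" -v r="$ratio" 'BEGIN { print int(t * r) }')
  eval_count=$((total - train))
  head -n "$train" "$all" > "$out/list.train"
  tail -n "$eval_count" "$all" > "$out/list.eval"
}

# Example call (paths taken from this thread, adjust as needed):
#   split_list data/eng_hor/all-lstmf 0.50 data/eng_hor
```

With four lines and a ratio of 0.50 this yields two training lines and two evaluation lines, matching the manual split above.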

The full make statement I am using is:
make training TESSDATA=data/eng_hor MODEL_NAME=eng_hor OUTPUT_DIR=data/eng_hor GROUND_TRUTH_DIR=data/eng_hor-ground-truth PROTO_MODEL=data/eng_hor/eng_hor.traineddata RATIO_TRAIN=0.50

(eng_hor -> English horizontal) (RATIO_TRAIN = 0.50 because there are only four lines and I'm trying to run a pilot test before I start producing more data)

Tesstrain issue - files.zip

I've tried all of the above in two ways: converting the files using dos2unix and not converting them. Both failed.
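To see whether a file still has Windows line endings before or after conversion, a check along these lines may help. It is a minimal sketch assuming a POSIX-style shell and grep; has_crlf is a hypothetical helper name, and whether stray carriage returns are actually behind the error in this thread is not established.

```shell
# Succeed (exit status 0) if the file contains at least one carriage return.
has_crlf() {
  # Command substitution strips trailing newlines but keeps the \r byte
  cr=$(printf '\r')
  grep -q "$cr" "$1"
}

# Example call (path pattern taken from this thread, adjust as needed):
#   for f in data/eng_hor-ground-truth/*.gt.txt; do
#     has_crlf "$f" && echo "CRLF line endings: $f"
#   done
```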

I'm uploading a zip file with the files used to perform the training, just in case.

TheFattestTony commented 2 years ago

make training MODEL_NAME=eng_hor on WSL it returns lstmtraining: command not found?

Seems to be related to WSL.

After your help, the command not found error is gone. Thank you. My problem now is Can't encode transcription.

TheFattestTony commented 2 years ago

I think I'm going to record a video tutorial after solving this problem and finishing the training, showing all the files needed to perform the training, how to download them, etc. There is so little content about it on the internet, and I think the documentation needs to be clearer about the instructions. I have already spent more than 35 hours trying to perform a successful training.

TheFattestTony commented 2 years ago

I will close this issue. The way I found to overcome most of these issues is to run everything in WSL.

kba commented 2 years ago

run everything in WSL.

That is probably a wise choice. Some of the problems seem to stem from trying to parse filenames with spaces in them and I think most developers don't even try to run on Windows natively AFAIK.

Sorry to hear you had such trouble with getting it to work. If you want to contribute to the documentation or indeed do a video tutorial, contributions are welcome!
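The filenames-with-spaces problem is visible in the leptonica build log earlier in this thread, where /usr/bin/sh reports C:/Program: Is a directory for the path C:/Program Files/Git/usr/bin/sh.exe. A minimal shell sketch of the word splitting behind that kind of failure (count_args is a hypothetical helper used only for illustration):

```shell
# Print how many arguments the function received.
count_args() {
  echo "$#"
}

path="C:/Program Files/Git/usr/bin/sh.exe"

# Unquoted: the shell splits the value at the space into two words,
# so the first word alone ("C:/Program") is taken as the command or path.
count_args $path

# Quoted: the value is passed through as a single argument.
count_args "$path"
```

Under a POSIX shell with the default IFS, the first call receives two arguments and the second receives one. Build scripts that expand such a path without quotes hit the first case, which is one reason installing tools under a path without spaces, or running everything in WSL, tends to avoid the problem.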