tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

Invalid File Operation #348

Closed Vitiated-dev closed 10 months ago

Vitiated-dev commented 11 months ago

Hello,

I've recently built Tesseract from source for training, and I've run into an issue I have yet to be able to solve.

Any help is greatly appreciated!

The error below is the entire contents of the nohup.out file produced by running nohup make training.

Error

Makefile:231: *** Invalid file operation: <data/foo-ground-truth/candidus_christus_1854_0030_017.gt.txt.  Stop.

I get this error with both the provided sample ground-truth files and my own generated ones. It occurs for every file in the ground-truth directory.

System Information

5.4.0-84-generic #94~18.04.1-Ubuntu SMP Thu Aug 26 23:17:46 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Python Information

Python 3.6.9
 Pillow (8.4.0)

Tesseract Information

tesseract 5.3.2
 leptonica-1.75.3
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0

 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.2.2 zlib/1.2.11 liblzma/5.2.2 bz2lib/1.0.6 liblz4/1.7.1
 Found libcurl/7.58.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3

liketheflower commented 11 months ago

Had a similar issue. The error message I got is:

Makefile:231: *** Invalid file operation: <data/foo-ground-truth/menzel_literatur01_1828_0238_006.gt.txt.  Stop.

I can find that file from the path:

ls data/foo-ground-truth/menzel_literatur01_1828_0238_006.gt.txt
data/foo-ground-truth/menzel_literatur01_1828_0238_006.gt.txt

The file content looks good.

liketheflower commented 11 months ago

My issue is fixed. The error was caused by the make version: at least make 4.2 is required, and I had 4.1. After upgrading make to 4.2, the problem was solved. Here is the command I used to upgrade the make version.

Regarding the make version requirement, see here.
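
For anyone else hitting this: as far as I know, the `$(file <...)` read form that the error message points at was only added in GNU Make 4.2, which would explain why older versions reject it. A quick way to check the installed version before training, as a minimal sketch (the `version_ge` helper name is just for illustration, and it assumes a `sort` that supports `-V`):

```shell
#!/bin/sh
# version_ge A B: succeed if version A >= version B.
# Relies on `sort -V` (version-aware sort), which is an assumption about the local sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

required=4.2
# The first line of `make --version` looks like "GNU Make 4.1"; take the last field.
current=$(make --version 2>/dev/null | head -n1 | awk '{print $NF}')

if version_ge "${current:-0}" "$required"; then
    echo "make $current is new enough"
else
    echo "make ${current:-not found} is too old; need $required or newer"
fi
```

If this reports a version below 4.2, upgrading make (or putting a newer make first in `PATH`) should be the first thing to try.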

stweil commented 11 months ago

So Debian releases from Buster onward are not affected because they provide a sufficiently new version of make. Version 4.2 was released in 2016. But macOS comes with GNU Make 3.81 and won't work.

The Makefile should check MAKE_VERSION and show a hint if the version is too old. Ideally it should also work on macOS with the old version of make.

liketheflower commented 11 months ago

> So Debian releases from Buster onward are not affected because they provide a sufficiently new version of make. Version 4.2 was released in 2016. But macOS comes with GNU Make 3.81 and won't work.
>
> The Makefile should check MAKE_VERSION and show a hint if the version is too old. Ideally it should also work on macOS with the old version of make.

Very true.

zdenop commented 11 months ago

Something like this?

diff --git a/Makefile b/Makefile
index cff6d2b..5fa7f88 100644
--- a/Makefile
+++ b/Makefile
@@ -121,7 +121,7 @@ endif

 # BEGIN-EVAL makefile-parser --make-help Makefile

-help:
+help: default
        @echo ""
        @echo "  Targets"
        @echo ""
@@ -171,9 +171,17 @@ help:

 # END-EVAL

+default:
+ifeq (4.2, $(firstword $(sort $(MAKE_VERSION) 4.2)))
+       @echo "    You are using make version: $(MAKE_VERSION)"
+else
+       $(error This version of GNU Make is too low ($(MAKE_VERSION)). Check your path, or upgrade to 4.2 or newer.)
+endif
+
 .PRECIOUS: $(LAST_CHECKPOINT)

-.PHONY: clean help leptonica lists proto-model tesseract tesseract-langs tesseract-langdata training unicharset charfreq
+.PHONY: default clean help leptonica lists proto-model tesseract tesseract-langs tesseract-langdata training unicharset charfreq

 ALL_FILES = $(and $(wildcard $(GROUND_TRUTH_DIR)),$(shell find -L $(GROUND_TRUTH_DIR) -name '*.gt.txt'))
 unexport ALL_FILES # prevent adding this to envp in recipes (which can cause E2BIG if too long; cf. make #44853)
@@ -225,7 +233,7 @@ $(OUTPUT_DIR)/unicharset: $(ALL_GT) | $(OUTPUT_DIR)
 endif

 # Start training
-training: $(OUTPUT_DIR).traineddata
+training: default $(OUTPUT_DIR).traineddata

 $(ALL_GT): $(ALL_FILES) | $(OUTPUT_DIR)
        $(if $^,,$(error found no $(GROUND_TRUTH_DIR)/*.gt.txt for $@))
@@ -422,4 +430,4 @@ clean-output:
        rm -rf $(OUTPUT_DIR)

 # Clean all generated files
-clean: clean-box clean-lstmf clean-output
+clean: default clean-box clean-lstmf clean-output
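
One caveat with this kind of check, as far as I understand it: `$(sort ...)` in make orders words lexically, so a hypothetical future version such as 4.10 would sort before 4.2 and be rejected as too old. The difference can be seen outside make with a small shell sketch (the helper names and the 4.10 version number are made up for illustration, and `sort -V` is assumed to be available):

```shell
#!/bin/sh
# Smallest of two version strings under plain lexical (byte-wise) ordering,
# which mirrors how make's $(sort ...) orders words.
lexical_min() {
    printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n1
}

# Smallest of two version strings under version-aware ordering.
version_min() {
    printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1
}

echo "lexical: $(lexical_min 4.10 4.2)"
echo "version: $(version_min 4.10 4.2)"
```

With 4.10 and 4.2, the lexical minimum is 4.10 while the version-aware minimum is 4.2, so a lexical check would only misfire once a two-digit minor version exists.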

Vitiated-dev commented 11 months ago

Looks like it was the make version - upgrading to 4.2 solved the issue and training completes as expected. Thanks @liketheflower!

zdenop commented 10 months ago

The make version check is implemented in 25b8508f06365de0fe004fa09871e9bda1b56694.