tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

FileNotFoundError: [Errno 2] No such file or directory: 'data/namaq/all-lstmf' #328

Closed ShahadAlkhalifa closed 1 year ago

ShahadAlkhalifa commented 1 year ago

I'm training Tesseract on my own data, but I get this error after running make training MODEL_NAME=name-of-the-resulting-model. It seems to fail to create the all-lstmf and all-gt files, even though it successfully created the .box and .lstmf files for my data. How can I resolve this issue? Thanks.

zdenop commented 1 year ago

Post the whole log and all the steps needed to reproduce the problem.

ShahadAlkhalifa commented 1 year ago

This is the whole log:

unicharset_extractor --output_unicharset "data/namaq/unicharset" --norm_mode 2 "data/namaq/all-gt"
Failed to read data from: data/namaq/all-gt
Wrote unicharset file data/namaq/unicharset
python3 shuffle.py 0 "data/namaq/all-lstmf"
Traceback (most recent call last):
  File "/Users/shahadalkhalifa/tesstrain/shuffle.py", line 24, in <module>
    fd0 = open(sys.argv[2], 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'data/namaq/all-lstmf'
make: *** [data/namaq/all-lstmf] Error 1

Note: I named my model "namaq", and neither the all-gt nor the all-lstmf folder was created in my model's directory.

zdenop commented 1 year ago

Sorry but that is not the whole log: all-lstmf does not exist because of an error from previous commands.

ShahadAlkhalifa commented 1 year ago

After running make tesseract-langdata, I ran make training MODEL_NAME=namaq, and the following is the output in the terminal:

tesseract "data/namaq-ground-truth/img84.png" data/namaq-ground-truth/img84 --psm 13 lstm.train

Please note that this is only part of the output. I'm training on a large number of files, so many more .box and .lstmf files were created.

z160896 commented 1 year ago

I am having the same problem.

z160896 commented 1 year ago

I put all the .gt.txt and .png files in the hwmodel-ground-truth dir, then ran make training MODEL_NAME=hwmodel. It generated .box files in the hwmodel-ground-truth dir and then failed with this error:

python3 shuffle.py 0 "data/hwmodel/all-lstmf"
Traceback (most recent call last):
  File "shuffle.py", line 24, in <module>
    fd0 = open(sys.argv[2], 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'data/hwmodel/all-lstmf'

ShahadAlkhalifa commented 1 year ago

@zdenop It seems like a bug because the folders were not created. I tried creating them manually, ran make training MODEL_NAME=namaq again, and got the following output:

unicharset_extractor --output_unicharset "data/namaq/unicharset" --norm_mode 2 "data/namaq/all-gt"
Failed to read data from: data/namaq/all-gt
Wrote unicharset file data/namaq/unicharset
wc: stdin: read: Is a directory

Parse error: bad token
:1
Parse error: bad expression
:1
+ head -n '' data/namaq/all-lstmf
head: illegal line count --
+ tail -n '' data/namaq/all-lstmf
combine_lang_model \
  --input_unicharset data/namaq/unicharset \
  --script_dir data/langdata \
  --numbers data/namaq/namaq.numbers \
  --puncs data/namaq/namaq.punc \
  --words data/namaq/namaq.wordlist \
  --output_dir data \
  \
  --lang namaq
Failed to read data from: data/namaq/namaq.wordlist
Failed to read data from: data/namaq/namaq.punc
Failed to read data from: data/namaq/namaq.numbers
Loaded unicharset of size 3 from file data/namaq/unicharset
Setting unichar properties
Setting script properties
Failed to load script unicharset from:data/langdata/Latin.unicharset
Config file is optional, continuing...
Failed to read data from: data/langdata/namaq/namaq.config
Null char=2
lstmtraining \
  --debug_interval 0 \
  --traineddata data/namaq/namaq.traineddata \
  --learning_rate 0.002 \
  --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx192 O1c`head -n1 data/namaq/unicharset`]" \
  --model_output data/namaq/checkpoints/namaq \
  --train_listfile data/namaq/list.train \
  --eval_listfile data/namaq/list.eval \
  --max_iterations 10000 \
  --target_error_rate 0.01
Failed to load list of training filenames from data/namaq/list.train
make: *** [data/namaq/checkpoints/namaq_checkpoint] Error 1

@z160896 If you found a way to solve the issue please let me know. Thanks

zdenop commented 1 year ago

all-lstmf is not a folder but a file. Do not try to fix something you do not understand. I asked you to provide all the steps to reproduce the problem; you are just picking some commands that fail (of course) because you ignored previous errors. If you are really interested in support, follow these instructions:

  1. Read the instructions carefully.
  2. Start from scratch (remove all folders and files that do not originate from this repository).
  3. Log the whole training process and read all the messages you see.
  4. Fix each error message, and pay attention to warnings (there are warnings for optional files you probably do not need at this stage).
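
To illustrate why all-lstmf has to be a file rather than a folder: the log above runs shuffle.py, wc, head and tail on it, which suggests it is a plain-text list of .lstmf paths that gets split into list.train and list.eval. The following is a minimal sketch of that idea only, not the actual Makefile recipe. The function name build_lists and the default ratio of 0.9 are assumptions made for illustration, and the shuffling step is left out.

```shell
# Hypothetical helper (not part of tesstrain): collect .lstmf paths into a
# plain-text list file and split that list into training and evaluation lists.
build_lists() {
  gt_dir=$1        # directory holding the generated .lstmf files
  out_dir=$2       # model directory, e.g. data/namaq
  ratio=${3:-0.9}  # assumed share of lines that go to list.train

  mkdir -p "$out_dir"

  # all-lstmf is a regular file with one .lstmf path per line.
  find "$gt_dir" -name '*.lstmf' | sort > "$out_dir/all-lstmf"

  # Count the lines and work out how many belong to the training list.
  total=$(wc -l < "$out_dir/all-lstmf" | tr -d ' ')
  train=$(awk -v n="$total" -v r="$ratio" 'BEGIN { printf "%d", n * r }')

  # The first $train lines are for training, the rest for evaluation.
  awk -v t="$train" 'NR <= t' "$out_dir/all-lstmf" > "$out_dir/list.train"
  awk -v t="$train" 'NR > t'  "$out_dir/all-lstmf" > "$out_dir/list.eval"
}
```

If all-lstmf is created by hand as a directory, every step that reads it as a list fails, which matches the "wc: stdin: read: Is a directory" and empty line-count messages in the log.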

zipizigi commented 1 year ago

I had been struggling with the same issue for a long time. The cause turned out to be simple: my version of make was too old.

$ make --version
GNU Make 3.81

The all-lstmf and all-gt files cannot be generated with this version.

Upgrade make. On a Mac, run brew install make, which installs the newer version as gmake.

$ make --version
GNU Make 4.4
$ make training

### for mac
$ gmake --version
GNU Make 4.4

$ gmake training

Now there is no error.
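
The check above can be scripted. This is a small sketch that picks whichever of make or gmake reports GNU Make 4 or newer. The function names are hypothetical, and the threshold of 4 is an assumption based only on the two versions reported in this thread (3.81 failing, 4.4 working), not a documented minimum.

```shell
# Print the major version from `make --version` output,
# e.g. "GNU Make 3.81" -> 3. Prints nothing for non-GNU make.
make_major() {
  printf '%s\n' "$1" | sed -n '1s/^GNU Make \([0-9][0-9]*\)\..*/\1/p'
}

# Print the first of make/gmake that is GNU Make 4 or newer.
# The threshold of 4 is an assumption taken from this thread.
pick_make() {
  for cmd in make gmake; do
    command -v "$cmd" >/dev/null 2>&1 || continue
    major=$(make_major "$("$cmd" --version 2>/dev/null)")
    if [ -n "$major" ] && [ "$major" -ge 4 ]; then
      echo "$cmd"
      return 0
    fi
  done
  return 1
}
```

A possible use would be MAKE_CMD=$(pick_make) && "$MAKE_CMD" training MODEL_NAME=namaq, which runs training only if a new enough make was found.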

ShahadAlkhalifa commented 1 year ago

@zipizigi Yes, this is exactly what solved the error! Thank you very much :)

z160896 commented 1 year ago

Now I get this error when I run gmake leptonica tesseract. Any suggestions? Thanks.

inflating: tesseract-5.3.0/unittest/validator_test.cc
cd tesseract-5.3.0 && \
sh autogen.sh && \
PKG_CONFIG_PATH="/Volumes/fast/git/tesstrain/usr/lib/pkgconfig" \
./configure --prefix=/Volumes/fast/git/tesstrain/usr && \
LDFLAGS="-L/Volumes/fast/git/tesstrain/usr/lib"\
make -j4 install && \
LDFLAGS="-L/Volumes/fast/git/tesstrain/usr/lib"\
make -j4 training-install && \
date > "tesseract.built"
Running aclocal
Running /opt/local/bin/glibtoolize
glibtoolize: putting auxiliary files in AC_CONFIG_AUX_DIR, 'config'.
glibtoolize: copying file 'config/ltmain.sh'
glibtoolize: putting macros in AC_CONFIG_MACRO_DIRS, 'm4'.
glibtoolize: copying file 'm4/libtool.m4'
glibtoolize: copying file 'm4/ltoptions.m4'
glibtoolize: copying file 'm4/ltsugar.m4'
glibtoolize: copying file 'm4/ltversion.m4'
glibtoolize: copying file 'm4/lt~obsolete.m4'
Running aclocal
Running autoconf
Missing pkg-config. Check the build requirements.

Something went wrong, bailing out!

gmake: *** [Makefile:383: tesseract.built] Error 1

zdenop commented 1 year ago

@zipizigi: Can you please provide full logs for training with make 3.81 and 4.4? For example:

unzip -qq -d data/foo-ground-truth ocrd-testset.zip
make training 2>&1 | tee training.log
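
Once training.log exists, the earliest failure is usually the one worth fixing first, since later errors often follow from it. This is a rough sketch of a helper that prints the first suspicious line in a log. The function name and the keyword pattern are assumptions chosen to match the messages seen in this thread, not an exhaustive list.

```shell
# Print the first line of a log (prefixed with its line number) that
# matches a small set of failure keywords. The keyword list is a heuristic.
first_error() {
  grep -n -i -E 'error|failed|no such file' "$1" | head -n 1
}
```

For example, first_error training.log would point at "Failed to read data from: data/namaq/all-gt" in the first log of this thread, which appears before the FileNotFoundError that was originally reported.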

z160896 commented 1 year ago

@zdenop: in the autogen.sh script:

if grep -q PKG_CHECK_MODULES configure; then
  # The generated configure is invalid because pkg-config is unavailable.
  rm configure
  echo "Missing pkg-config. Check the build requirements."
  bail_out
fi

Do you know where PKG_CHECK_MODULES is defined? I installed pkg-config:

tesseract % pkg-config --version
0.29.2

How can I fix this problem?

Missing pkg-config. Check the build requirements.

Something went wrong, bailing out!

gmake: *** [Makefile:383: tesseract.built] Error 1

thanks
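
On the question of where PKG_CHECK_MODULES is defined: it is an m4 macro that pkg-config ships in a file named pkg.m4, and aclocal has to find that file for the macro to be expanded in configure. One possible cause of the message, which may or may not apply here, is that pkg-config is installed but its pkg.m4 sits in a directory aclocal does not search. The sketch below only looks for pkg.m4 in a list of directories. The function name is hypothetical, and the directories in the usage comment are guesses for common macOS setups.

```shell
# Print the path of the first pkg.m4 found in the given directories.
# Returns non-zero if none of them contains the file.
find_pkg_m4() {
  for dir in "$@"; do
    if [ -f "$dir/pkg.m4" ]; then
      echo "$dir/pkg.m4"
      return 0
    fi
  done
  return 1
}

# Usage sketch (not run here; the extra directories are assumptions):
#   find_pkg_m4 "$(aclocal --print-ac-dir)" \
#     /opt/local/share/aclocal /usr/local/share/aclocal /opt/homebrew/share/aclocal
```

If pkg.m4 turns up only outside the directory printed by aclocal --print-ac-dir, adding its directory to the ACLOCAL_PATH environment variable before rerunning autogen.sh is one thing to try.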

zdenop commented 1 year ago

@z160896: Please stop with this! First of all, stick to the original topic: one problem = one issue. Next, the issue tracker is for solving bugs/errors in code. Use the tesseract user forum to ask for support.