tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

File not found - *.gt.txt #369

jimlaloi closed 4 months ago

jimlaloi commented 4 months ago

I'm new to tesstrain, following the readme to get started. I'm getting an error that doesn't make any sense to me. I'm on Windows 10. Here's what I've done:

Now for the error. When I try to run make training, I get the following error:

File not found - *.gt.txt
File not found - *.gt.txt
    You are using make version: 4.4.1
Makefile:224: *** found no data/foo-ground-truth/*.gt.txt for data/foo/all-gt.  Stop.

I have confirmed that I do indeed have a whole bunch of .gt.txt files in data/foo-ground-truth, from the unzipped ocrd-testset.zip. I've hit a wall on my troubleshooting. What might be causing it to not find these files?

stweil commented 4 months ago

find in a Windows command line is not the same as the POSIX find that you get, for example, in a Linux shell. Training won't work in a Windows command line. I suggest using WSL on Windows.

zdenop commented 4 months ago

WSL is a good suggestion, but POSIX find is also part of the git installation. Make sure that the git path(s) are at the beginning of the PATH environment variable.
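
One way to check which find a shell will actually pick up is to walk the PATH entries in order and list every directory that contains a find executable. The following is a minimal sketch, not part of tesstrain; the function name executables_on_path and the extension list are assumptions made for illustration. The first match in the list is the one a command line would run.

```python
import os


def executables_on_path(name, path_value=None, extensions=("", ".exe")):
    """Return every file matching `name` on PATH, in PATH order.

    The first entry is the one a shell would normally run.
    `extensions` is a simplified stand-in for Windows' PATHEXT handling.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    matches = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                matches.append(candidate)
                break  # one match per directory is enough
    return matches


if __name__ == "__main__":
    # Print each find executable with its position in the search order.
    for position, path in enumerate(executables_on_path("find"), start=1):
        print(position, path)
```

If the first line printed points into the Windows system directory rather than the git installation, make is most likely calling the Windows find.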

jimlaloi commented 4 months ago

@zdenop Thanks, you pointed me in the right direction. I double-checked my PATH env variable order from the command line (path), and the git path was halfway down. I wasn't sure why, because I had put it at the top of my PATH env variable in the Advanced System Settings dialog. But then I realized I had put it in my User PATH, which is listed after all the paths in the System PATH env variable. Once I added my git path to the System PATH (at the top), everything started working.
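
The ordering effect described above can be modelled in a few lines. This is a simplified sketch: the helper names are hypothetical, and the directory values are illustrative assumptions (in particular, C:\Program Files\Git\usr\bin is assumed to be where a default Git for Windows install keeps its POSIX find). It only shows that System PATH entries are searched before User PATH entries, so a git path placed in the User PATH cannot win over the Windows system directory.

```python
def effective_path(system_entries, user_entries):
    """Combine PATH entries the way Windows does: System first, then User."""
    return list(system_entries) + list(user_entries)


def first_provider(entries, dirs_with_command):
    """Return the first PATH entry that contains the command, or None."""
    # Windows paths are case-insensitive, so compare in lower case.
    known = {d.lower() for d in dirs_with_command}
    for entry in entries:
        if entry.lower() in known:
            return entry
    return None


# Illustrative values only; real PATH contents differ per machine.
SYSTEM32 = r"C:\Windows\System32"  # holds the Windows find.exe
GIT_USR_BIN = r"C:\Program Files\Git\usr\bin"  # assumed location of POSIX find
HAS_FIND = [SYSTEM32, GIT_USR_BIN]

# Git added to the User PATH: System32 is still searched first.
before = effective_path([SYSTEM32, r"C:\Windows"], [GIT_USR_BIN])
# Git added to the top of the System PATH: POSIX find is now found first.
after = effective_path([GIT_USR_BIN, SYSTEM32, r"C:\Windows"], [])

print(first_provider(before, HAS_FIND))
print(first_provider(after, HAS_FIND))
```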