Closed nagadomi closed 3 years ago
It would be interesting to know whether Tesseract training is possible on Windows (without WSL, which is known to work). There might be more problems to solve.
It seems to work with Anaconda PowerShell (a Python environment on Windows) and Tesseract 5 on Windows, although I have only tried a workflow as simple as the one below.
tesstrain.py
combine_tessdata
lstmtraining
One thing to note:
tesserocr's conda package contains text2image.exe, unicharset_extractor.exe, dawg2wordlist.exe, lstmtraining.exe and other executable files, which are installed in ${HOME}\Anaconda3\Library\bin. In the Anaconda console, that directory comes earlier on PATH than the system Tesseract (C:\Program Files\Tesseract-OCR).
So when I run the python src/training/tesstrain.py ... command, it runs different versions of the executables than I intended, and some of the commands crash.
The solution is to modify the PATH environment variable before running tesstrain.py, so that the system Tesseract directory comes first.
$env:path="C:\Program Files\Tesseract-OCR;$env:path"
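The same idea can be sketched in Python, for anyone who wants to check which executable will actually be picked up before launching training. This is only an illustration: the helper names prepend_to_path and resolve_tool are hypothetical and not part of tesstrain.py, and the install directory is the one mentioned above, so adjust it to your setup.

```python
import os
import shutil


def prepend_to_path(directory, env=None):
    """Return a copy of env with `directory` placed first on PATH.

    Hypothetical helper for illustration; it does not modify os.environ.
    """
    env = dict(os.environ if env is None else env)
    current = env.get("PATH", "")
    env["PATH"] = directory + os.pathsep + current if current else directory
    return env


def resolve_tool(name, env):
    """Return the full path that `name` resolves to under env's PATH, or None."""
    return shutil.which(name, path=env["PATH"])


if __name__ == "__main__":
    # Assumed install location, taken from the comment above.
    tesseract_dir = r"C:\Program Files\Tesseract-OCR"
    env = prepend_to_path(tesseract_dir)
    # Print which binary each training tool would resolve to.
    for tool in ("lstmtraining", "combine_tessdata", "text2image"):
        print(tool, "->", resolve_tool(tool, env))
```

If a tool still resolves to a path under Anaconda3\Library\bin, the conda copy is shadowing the system Tesseract. The returned env can be passed to subprocess.run(..., env=env) to launch tesstrain.py with the adjusted PATH.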
On Windows, Path.write_text converts the newline code (\n) to CRLF, and lstmtraining cannot handle CRLF properly, so the Deserialize header failed error occurs. https://github.com/tesseract-ocr/tesseract/issues/2456

This pull request changes tesstrain.py to use the LF newline code when creating {lang_code}.training_files.txt (list of lstmf files) even on Windows.
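A minimal sketch of the technique, not the actual diff of this pull request: opening the file in text mode with newline="\n" disables newline translation, so each "\n" is written as-is on every platform. The function name write_lines_lf and the file names in the usage example are hypothetical.

```python
from pathlib import Path


def write_lines_lf(path, lines):
    """Write one entry per line, always terminated with LF.

    newline="\n" turns off the platform newline translation that would
    otherwise produce CRLF on Windows.
    """
    with Path(path).open("w", encoding="utf-8", newline="\n") as f:
        for line in lines:
            f.write(f"{line}\n")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    write_lines_lf("eng.training_files.txt", ["page1.lstmf", "page2.lstmf"])
    # Reading the raw bytes shows whether any CR characters were written.
    print(Path("eng.training_files.txt").read_bytes())
```

Reading the file back in binary mode is a quick way to confirm that no \r characters ended up in the list of lstmf files.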