tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0
62.68k stars 9.54k forks

src/training/tesstrain_utils.sh ERROR: Program tesseract failed. Abort. #2631

Open nathan-guo opened 5 years ago

nathan-guo commented 5 years ago

Ubuntu 18.04

tesseract 4.1.0
 leptonica-1.75.3
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX512BW
 Found AVX512F
 Found AVX2
 Found AVX
 Found SSE

==================================================

nohup \
src/training/tesstrain.sh \
--fonts_dir /usr/share/fonts \
--lang chi_tra --linedata_only \
--fontlist "Microsoft JhengHei" \
--save_box_tiff --noextract_font_properties \
--langdata_dir ~/langdata_lstm --tessdata_dir ~/tesseract/tessdata \
--wordlist ~/langdata_lstm/chi_tra/chi_tra.wordlist \
--output_dir ~/tesstutorial/chitrain \
>~/tesstrain.sh.out 2>&1 &

=================================================================

Loaded 93944/93944 lines (1-93944) of document /tmp/chi_tra-2019-08-28.CSP/chi_tra.Microsoft_JhengHei.exp0.lstmf
src/training/tesstrain_utils.sh: line 72: 23573 Segmentation fault "${cmd}" "$@" 2>&1
 23574 Done | tee -a ${LOG_FILE}
ERROR: Program tesseract failed. Abort.

Thank you so much.

nathan-guo commented 5 years ago

This operation is also very slow. How can I enable multiprocessing?

Thanks

stweil commented 5 years ago

It is slow, and not much can currently be done to make it faster. To be really fast, we would need a Tesseract with GPU support. The current CPU-based training could be made faster by using float instead of double, but I am afraid that @noahmetzger does not have the time to implement that.

stweil commented 5 years ago

The crash is most probably a known problem. Run ulimit -c unlimited before running the training. Then Linux will create a core dump for the segmentation fault, and you can examine that to get more information.
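A minimal sketch of that preparation step, assuming a bash shell on Linux. The function name enable_core_dumps is made up for illustration and is not part of the Tesseract training scripts. The limit is inherited by child processes, so paste or source this into the same shell that will launch the training rather than running it as a separate script.

```shell
#!/usr/bin/env bash
# Sketch only: prepare the current shell so a crashing child process can leave a core dump.
# enable_core_dumps is a hypothetical helper name, not part of tesstrain.sh.

enable_core_dumps() {
  # Lift the soft limit on core file size; this fails if the hard limit is lower.
  if ! ulimit -c unlimited 2>/dev/null; then
    echo "warning: could not raise the core file size limit" >&2
  fi
  echo "core file size limit: $(ulimit -c)"

  # On Linux this file decides where the dump goes. A leading '|' means the
  # dump is piped to a helper program instead of being written as a plain file.
  if [ -r /proc/sys/kernel/core_pattern ]; then
    echo "core pattern: $(cat /proc/sys/kernel/core_pattern)"
  fi
}

enable_core_dumps

# Next steps, run by hand from this same shell (the core path is a placeholder):
#   rerun the failing tesstrain.sh command with the same arguments as before
#   gdb -batch -ex bt "$(command -v tesseract)" /path/to/core
```

If the core pattern starts with a pipe character, as is common on Ubuntu where a crash reporter is installed, the dump may not appear in the working directory. The backtrace printed by gdb is most useful when tesseract was built with debug symbols.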

nathan-guo commented 5 years ago

Thank you so much.

xuehuiareafred commented 5 years ago

@nathan-guo Is the problem solved?

stweil commented 5 years ago

The crash has still not been fixed, so it can still occur.

nathan-guo commented 4 years ago

@nathan-guo Is the problem solved?

The crash problem is still not solved

amitdo commented 3 years ago

We removed the training scripts from this repo.

What was the source of the crash? Was it a bug in the bash script itself or in the Tesseract C++ code? If it was the latter, do we have another open issue for the same bug?