Segfault in lstmtraining when training the demo data

inductiveload commented 3 years ago

Arch Linux, with the following tesseract build:

tesseract 5.0.0-alpha-20210401-158-ge1761
 leptonica-1.81.0
  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.1.0) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.5.1 zlib/1.2.11 liblzma/5.2.5 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.4.5
 Found libcurl/7.77.0 OpenSSL/1.1.1k zlib/1.2.11 zstd/1.5.0 libidn2/2.3.1 libpsl/0.21.1 (+libidn2/2.3.0) libssh2/1.9.0 nghttp2/1.43.0

tesstrain 0e8151472ca034ee3366682d6829802ee1d9455e

What I did:

Cloned tessdata_best to ~/src
unzip ocrd-testset.zip -d data/ocrd-ground-truth
make training MODEL_NAME=ocrd START_MODEL=frk TESSDATA=~/src/tessdata_best MAX_ITERATIONS=10000

Output:

lstmtraining \
  --debug_interval 0 \
  --traineddata data/ocrd/ocrd.traineddata \
  --old_traineddata /home/john/src/tessdata_best/frk.traineddata \
  --continue_from data/frk/ocrd.lstm \
  --learning_rate 0.0001 \
  --model_output data/ocrd/checkpoints/ocrd \
  --train_listfile data/ocrd/list.train \
  --eval_listfile data/ocrd/list.eval \
  --max_iterations 10000 \
  --target_error_rate 0.01
Loaded file data/frk/ocrd.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 99 to 101!
Num (Extended) outputs,weights in Series:
  1,48,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  TxyLfys64:64, 20736
  Lfx96:96, 61824
  RxLrx96:96, 74112
  Lfx384:384, 738816
  Fc101:101, 0
Total weights = 895648
Previous null char=98 mapped to 100
Continuing from data/frk/ocrd.lstm
make: *** [Makefile:278: data/ocrd/checkpoints/ocrd_checkpoint] Segmentation fault (core dumped)

GDB backtrace of the crashed lstmtraining:

0x00007ffff7eaa8c9 in tesseract::NetworkIO::Transpose(tesseract::TransposedArray*) const () from /usr/lib/libtesseract.so.5
(gdb) bt
#0  0x00007ffff7eaa8c9 in tesseract::NetworkIO::Transpose(tesseract::TransposedArray*) const () from /usr/lib/libtesseract.so.5
#1  0x00007ffff7ea0a36 in tesseract::LSTM::Backward(bool, tesseract::NetworkIO const&, tesseract::NetworkScratch*, tesseract::NetworkIO*) () from /usr/lib/libtesseract.so.5
#2  0x00007ffff7ebcb8f in tesseract::Series::Backward(bool, tesseract::NetworkIO const&, tesseract::NetworkScratch*, tesseract::NetworkIO*) () from /usr/lib/libtesseract.so.5
#3  0x000055555556f388 in ?? ()
#4  0x0000555555560f87 in ?? ()
#5  0x00007ffff7429b25 in __libc_start_main () from /usr/lib/libc.so.6
#6  0x00005555555619fe in ?? ()
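For readers following along: judging only from the command echoed in the output above, the make variables (MODEL_NAME, START_MODEL, TESSDATA, MAX_ITERATIONS) appear to expand into the lstmtraining flags roughly as sketched below. This is a hypothetical Python reconstruction inferred from the log, not tesstrain's Makefile. The function name lstmtraining_args is made up, and the learning rate and target error rate are simply the values printed in this log.

```python
import shlex


def lstmtraining_args(model_name: str, start_model: str, tessdata: str,
                      max_iterations: int = 10000) -> list[str]:
    """Rebuild the lstmtraining argument list seen in this issue's logs.

    Hypothetical helper: the path layout is inferred from the echoed
    commands, not copied from tesstrain's Makefile.
    """
    data = f"data/{model_name}"
    return [
        "lstmtraining",
        "--debug_interval", "0",
        "--traineddata", f"{data}/{model_name}.traineddata",
        # The start model is read from TESSDATA; plain string joining also
        # reproduces the "//" that appears when TESSDATA ends with a slash.
        "--old_traineddata", f"{tessdata}/{start_model}.traineddata",
        # LSTM presumably extracted from the start model into data/<START_MODEL>/.
        "--continue_from", f"data/{start_model}/{model_name}.lstm",
        "--learning_rate", "0.0001",
        "--model_output", f"{data}/checkpoints/{model_name}",
        "--train_listfile", f"{data}/list.train",
        "--eval_listfile", f"{data}/list.eval",
        "--max_iterations", str(max_iterations),
        "--target_error_rate", "0.01",
    ]


if __name__ == "__main__":
    # Values from the first report; "~" was expanded by the shell.
    print(shlex.join(lstmtraining_args("ocrd", "frk", "/home/john/src/tessdata_best")))
```

The point of laying it out this way is that --old_traineddata and --continue_from are the only two flags that depend on START_MODEL and TESSDATA, so those are the inputs to vary when narrowing down the crash.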

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

wrznr commented 3 years ago

Cannot reproduce the problem. Could you please try without a start model, i.e. train from scratch?

Codethrill-20 commented 3 years ago

Hi there, I have a very similar problem. Training without START_MODEL works, but as soon as I add a start model I get a segmentation fault:

lstmtraining \
  --debug_interval 0 \
  --traineddata data/pdf/pdf.traineddata \
  --old_traineddata /usr/share/tesseract-ocr/4.00/tessdata//eng.traineddata \
  --continue_from data/eng/pdf.lstm \
  --learning_rate 0.0001 \
  --model_output data/pdf/checkpoints/pdf \
  --train_listfile data/pdf/list.train \
  --eval_listfile data/pdf/list.eval \
  --max_iterations 10000 \
  --target_error_rate 0.01
Loaded file data/eng/pdf.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 111 to 111!
Num (Extended) outputs,weights in Series:
  1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lfys48:48, 12480
  Lfx96:96, 55680
  Lrx96:96, 74112
  Lfx192:192, 221952
  Fc111:111, 0
Total weights = 364384
Previous null char=110 mapped to 110
Continuing from data/eng/pdf.lstm
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00efa1bb61fb5e2acbac526cae15db47_22.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00ed3d1c5efa45cb1f159b2aea364c06_13.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00e9abbf6ae0316b26564489043309e7_28.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00d93774feb260161c699826659335eb_26.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00d93774feb260161c699826659335eb_31.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00db3bce204043f8ae6093acb10f3421_15.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00d93774feb260161c699826659335eb_24.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00cf516d14934c8cc4aced3892e8023d_9.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/00d9bc8920fad718d800d8e03e5db4a1_26.lstmf
Loaded 1/1 lines (1-1) of document data/pdf-ground-truth/0a0a3b164fb469e52d9532de17a0ca6d_15.lstmf
make: *** [Makefile:278: data/pdf/checkpoints/pdf_checkpoint] Segmentation fault

I use tesseract 4.1.1:

tesseract 4.1.1
 leptonica-1.79.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.3) : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.1
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.4.0 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.8 liblz4/1.9.2 libzstd/1.4.4

Installed on Ubuntu 20.04.3 with apt install tesseract-ocr tesseract-ocr-eng.

Command: make training MODEL_NAME='pdf' START_MODEL='eng' CORES=8 PSM=6 TESSDATA='/usr/share/tesseract-ocr/4.00/tessdata/'

Same problem with the 'foo' test set:

lstmtraining \
  --debug_interval 0 \
  --traineddata data/foo/foo.traineddata \
  --old_traineddata /usr/share/tesseract-ocr/4.00/tessdata//eng.traineddata \
  --continue_from data/eng/foo.lstm \
  --learning_rate 0.0001 \
  --model_output data/foo/checkpoints/foo \
  --train_listfile data/foo/list.train \
  --eval_listfile data/foo/list.eval \
  --max_iterations 10000 \
  --target_error_rate 0.01
Loaded file data/eng/foo.lstm, unpacking...
Warning: LSTMTrainer deserialized an LSTMRecognizer!
Code range changed from 111 to 119!
Num (Extended) outputs,weights in Series:
  1,36,0,1:1, 0
Num (Extended) outputs,weights in Series:
  C3,3:9, 0
  Ft16:16, 160
Total weights = 160
  [C3,3Ft16]:16, 160
  Mp3,3:16, 0
  Lfys48:48, 12480
  Lfx96:96, 55680
  Lrx96:96, 74112
  Lfx192:192, 221952
  Fc119:119, 0
Total weights = 364384
Previous null char=110 mapped to 118
Continuing from data/eng/foo.lstm
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/frapan_bittersuess_1891_0103_007.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/clauren_liebe_1827_0105_016.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/lenau_gedichte_1832_0225_006.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/hoffmann_elixiere01_1815_0173_012.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/andreas_fenitschka_1898_0066_007.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/poersch_gewerkschaftsbewegung_1897_0032_045.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/saar_novellen_1877_0283_020.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/raschdorff_hochbau_1880_0025_016.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/gutzkow_wally_1835_0154_008.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/fiedler_kuenstlerische_1887_0135_015.lstmf
Loaded 1/1 lines (1-1) of document data/foo-ground-truth/poersch_gewerkschaftsbewegung_1897_0020_021.lstmf
make: *** [Makefile:278: data/foo/checkpoints/foo_checkpoint] Segmentation fault

Command: make training START_MODEL='eng' CORE=8 TESSDATA='/usr/share/tesseract-ocr/4.00/tessdata/'
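All three logs print the same related numbers just before crashing: the code range, the remapped null character, and the width of the final Fc entry in the network summary. In every run here the new null char is the new code range minus one and the Fc width equals the new code range (99 to 101 / 100 / Fc101, 111 to 111 / 110 / Fc111, 111 to 119 / 118 / Fc119), and the pdf run crashed even though its code range did not change. The Python sketch below pulls those values out of a saved log so runs can be compared side by side. The regular expressions only match the message formats shown in this thread, and the "null char = code range minus one" check is an observation from these logs, not a documented guarantee.

```python
import re
from typing import NamedTuple


class OutputRemap(NamedTuple):
    old_range: int
    new_range: int
    old_null: int
    new_null: int
    fc_width: int


def parse_remap(log: str) -> OutputRemap:
    """Extract the output remapping numbers from an lstmtraining log."""
    code = re.search(r"Code range changed from (\d+) to (\d+)!", log)
    null = re.search(r"Previous null char=(\d+) mapped to (\d+)", log)
    # Lines such as "  Fc101:101, 0"; take the last one in the summary.
    fc = re.findall(r"^\s*Fc(\d+):\1,", log, re.MULTILINE)
    if not (code and null and fc):
        raise ValueError("log does not contain the expected messages")
    return OutputRemap(int(code[1]), int(code[2]),
                       int(null[1]), int(null[2]), int(fc[-1]))


def looks_consistent(r: OutputRemap) -> bool:
    # Pattern observed in this thread's logs, not a documented invariant.
    return r.new_null == r.new_range - 1 and r.fc_width == r.new_range
```

For example, redirect make's output to a file and call parse_remap on its contents; all three crashing runs above pass looks_consistent, which suggests the output remapping alone does not explain the segfault.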

stefan6419846 commented 2 years ago

I had the same problem when trying to train with the system-provided start model. After reading https://github.com/tesseract-ocr/tesseract/issues/1573, I downloaded the corresponding tessdata_best model and everything worked fine.
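The workaround above can be scripted. The Python sketch below downloads a start model from the tessdata_best repository into its own directory, which is then passed to make as TESSDATA instead of the system path (/usr/share/tesseract-ocr/4.00/tessdata/). Assumptions: the raw-file URL pattern and the "main" branch name of tessdata_best; the helper names best_model_url and fetch_best_model are hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: tessdata_best serves raw files from its "main" branch.
BEST_URL = "https://github.com/tesseract-ocr/tessdata_best/raw/main/{lang}.traineddata"


def best_model_url(lang: str) -> str:
    """URL of the tessdata_best model for a language code such as 'eng'."""
    return BEST_URL.format(lang=lang)


def fetch_best_model(lang: str, tessdata_dir: Path) -> Path:
    """Download <lang>.traineddata into tessdata_dir unless it is already there."""
    tessdata_dir.mkdir(parents=True, exist_ok=True)
    target = tessdata_dir / f"{lang}.traineddata"
    if not target.exists():
        urlretrieve(best_model_url(lang), target)
    return target
```

For example, call fetch_best_model("eng", Path.home() / "src" / "tessdata_best") and then run make training MODEL_NAME=pdf START_MODEL=eng TESSDATA=~/src/tessdata_best. Keeping the downloaded models in a separate directory avoids overwriting the files installed by the tesseract-ocr-eng package.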

tesseract-ocr/tesstrain issue #269