tesseract-ocr / tesseract

Tesseract Open Source OCR Engine (main repository)
https://tesseract-ocr.github.io/
Apache License 2.0
59.53k stars 9.23k forks

Floating-point exception (SIGFPE) due to out-of-range input to asinf in Wordrec::angle_change #4242

Closed ChristianOsta closed 1 month ago

ChristianOsta commented 1 month ago

Current Behavior

The image below causes a floating-point exception (SIGFPE) on Ubuntu (WSL) when using the legacy model with --psm 7. The exception is raised because asinf receives an input slightly outside its valid range [-1, 1], specifically -1.00000012, and the program terminates with SIGFPE. Notably, this issue does not occur under Windows.

Backtrace: the error originates in the tesseract::Wordrec::angle_change function (see "Other Information").

tesseract command: tesseract.exe -l eng+deu "tesseract_fail.png" stdout --tessdata-dir "" --oem 0 --psm 7

I used the legacy models for English and German from tesseract-ocr/tessdata.

Interestingly, when the single "d" in the bottom part of the image is moved one pixel up or to the right, the exception is no longer raised.

I will gladly provide additional information if needed.

Image to reproduce the behavior: tesseract_crash

Expected Behavior

Tesseract should handle the input gracefully without causing a floating-point exception.
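One way to get the graceful behavior described above is to clamp the argument into [-1, 1] before calling asin. The following is only a minimal sketch of that idea. `safe_asinf` is a hypothetical helper name, not a Tesseract function, and this is not necessarily how the issue was actually fixed.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper (not part of Tesseract): clamp the argument to the
// valid domain of asin, [-1, 1], before calling it. A ratio such as
// cross / length can land one float step outside the domain because of
// rounding, e.g. -1.00000012f, which is the value reported in this issue.
inline float safe_asinf(float x) {
  const float clamped = std::max(-1.0f, std::min(1.0f, x));
  return std::asin(clamped);
}
```

With this helper, `safe_asinf(-1.00000012f)` evaluates `asin(-1.0f)` instead of raising FE_INVALID, while in-range inputs pass through unchanged.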

Suggested Fix

No response

tesseract -v

tesseract 5.3.4
 leptonica-1.83.1
  libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 3.0.0) : libpng 1.6.43 : libtiff 4.6.0 : zlib 1.2.13 : libwebp 1.4.0 : libopenjp2 2.5.2
 Found AVX512BW
 Found AVX512F
 Found AVX512VNNI
 Found AVX2
 Found AVX
 Found FMA
 Found SSE4.1
 Found OpenMP 201511
 Found libarchive 3.7.2 zlib/1.2.13 liblzma/5.2.6 bz2lib/1.0.8 liblz4/1.9.3 libzstd/1.5.5

Operating System

No response

Other Operating System

Ubuntu inside Windows Subsystem for Linux (WSL)

Distributor ID: Ubuntu
Description: Ubuntu 22.04.4 LTS
Release: 22.04
Codename: jammy

uname -a

Linux 5.15.146.1-microsoft-standard-WSL2 #1 SMP Thu Jan 11 04:09:03 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Compiler

No response

CPU

No response

Virtualization / Containers

No response

Other Information

This is the output of bt:

(gdb) bt
#0  0x00007f66916bc552 in __GI___feraiseexcept (excepts=excepts@entry=1) at ../sysdeps/x86_64/fpu/fraiseexcpt.c:36
#1  0x00007f66916c2590 in __asinf (x=-1.00000012) at ./math/w_asinf_compat.c:34
#2  __asinf (x=-1.00000012) at ./math/w_asinf_compat.c:28
#3  0x00007f6691f8dd63 in tesseract::Wordrec::angle_change(tesseract::EDGEPT, tesseract::EDGEPT, tesseract::EDGEPT*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#4  0x00007f6691f8e243 in tesseract::Wordrec::pick_close_point(tesseract::EDGEPT, tesseract::EDGEPT, int*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#5  0x00007f6691f8e66b in tesseract::Wordrec::vertical_projection_point(tesseract::EDGEPT, tesseract::EDGEPT, tesseract::EDGEPT*, tesseract::EDGEPT_CLIST) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#6  0x00007f6691f935c6 in tesseract::Wordrec::try_vertical_splits(tesseract::EDGEPT*, short, tesseract::EDGEPT_CLIST, tesseract::GenericHeap<tesseract::KDPtrPairInc<float, tesseract::SEAM> >, tesseract::GenericHeap<tesseract::KDPtrPairDec<float, tesseract::SEAM> >, tesseract::SEAM*, tesseract::TBLOB) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#7  0x00007f6691f93c56 in tesseract::Wordrec::pick_good_seam(tesseract::TBLOB*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#8  0x00007f6691f8fa43 in tesseract::Wordrec::attempt_blob_chop(tesseract::TWERD, tesseract::TBLOB, int, bool, std::vector<tesseract::SEAM, std::allocator<tesseract::SEAM> > const&) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#9  0x00007f6691f909b2 in tesseract::Wordrec::improve_one_blob(std::vector<tesseract::BLOB_CHOICE, std::allocator<tesseract::BLOB_CHOICE> > const&, std::vector<tesseract::DANGERR_INFO, std::allocator >, bool, bool, tesseract::WERD_RES, unsigned int*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#10 0x00007f6691f90bd0 in tesseract::Wordrec::improve_by_chopping(float, tesseract::WERD_RES, tesseract::BestChoiceBundle, tesseract::BlamerBundle, tesseract::LMPainPoints, std::vector<tesseract::SegSearchPending, std::allocator >*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#11 0x00007f6691fa0a78 in tesseract::Wordrec::SegSearch(tesseract::WERD_RES, tesseract::BestChoiceBundle, tesseract::BlamerBundle*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#12 0x00007f6691f8f0c8 in tesseract::Wordrec::chop_word_main(tesseract::WERD_RES*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#13 0x00007f6691f8cc6d in tesseract::Wordrec::cc_recog(tesseract::WERD_RES*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#14 0x00007f6691e5f71c in tesseract::Tesseract::recog_word_recursive(tesseract::WERD_RES*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#15 0x00007f6691e5f8c4 in tesseract::Tesseract::recog_word(tesseract::WERD_RES*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#16 0x00007f6691e5cb62 in tesseract::Tesseract::tess_segment_pass_n(int, tesseract::WERD_RES*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#17 0x00007f6691e04b52 in tesseract::Tesseract::match_word_pass_n(int, tesseract::WERD_RES, tesseract::ROW, tesseract::BLOCK*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#18 0x00007f6691e04d0b in tesseract::Tesseract::classify_word_pass1(tesseract::WordData const&, tesseract::WERD_RES*, tesseract::PointerVector) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#19 0x00007f6691e0810a in tesseract::Tesseract::RetryWithLanguage(tesseract::WordData const&, void (tesseract::Tesseract::*)(tesseract::WordData const&, tesseract::WERD_RES*, tesseract::PointerVector), bool, tesseract::WERD_RES*, tesseract::PointerVector) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#20 0x00007f6691e08b22 in tesseract::Tesseract::classify_word_and_language(int, tesseract::PAGE_RES_IT, tesseract::WordData) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#21 0x00007f6691e0d41d in tesseract::Tesseract::RecogAllWordsPassN(int, tesseract::ETEXT_DESC, tesseract::PAGE_RES_IT, std::vector<tesseract::WordData, std::allocator >*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#22 0x00007f6691e0e464 in tesseract::Tesseract::recog_all_words(tesseract::PAGE_RES, tesseract::ETEXT_DESC, tesseract::TBOX const, char const, int) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#23 0x00007f6691ddff64 in tesseract::TessBaseAPI::Recognize(tesseract::ETEXT_DESC*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#24 0x00007f6691de056b in tesseract::TessBaseAPI::ProcessPage(Pix, int, char const, char const, int, tesseract::TessResultRenderer) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#25 0x00007f6691de18e1 in tesseract::TessBaseAPI::ProcessPagesInternal(char const, char const, int, tesseract::TessResultRenderer*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#26 0x00007f6691de1adf in tesseract::TessBaseAPI::ProcessPages(char const, char const, int, tesseract::TessResultRenderer*) () from /home/chris/mambaforge/envs/tess_bug/bin/../lib/libtesseract.so.5
#27 0x0000556ecc08455b in main ()

stweil commented 1 month ago

Unrelated:

--psm 7 won't work for a rotated line image. That requires --psm 1 (or no argument for page segmentation mode).

stweil commented 1 month ago

@ChristianOsta, if you want, you can try and review pull request #4243, which fixes the issue.

stweil commented 1 month ago

> Notably, this issue does not occur under Windows.

FP exceptions are enabled conditionally in main(). Therefore this exception is not raised on macOS (with the clang compiler) or on Windows (compiler without HAVE_FEENABLEEXCEPT).
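A minimal sketch of what such conditional enabling can look like is shown here. `enable_fp_traps` is a hypothetical name, and the exact set of exceptions enabled in Tesseract's main() is an assumption. Only the HAVE_FEENABLEEXCEPT macro comes from the comment above. `feenableexcept` is a glibc extension, which is why a guard like this is needed. The `excepts=1` in frame #0 of the backtrace matches FE_INVALID on x86-64.

```cpp
#include <fenv.h>

// Hypothetical helper: turn selected floating-point exceptions into traps
// (SIGFPE) where the platform supports it. Returns true if trapping was
// enabled and false if this build has no feenableexcept.
inline bool enable_fp_traps() {
#if defined(HAVE_FEENABLEEXCEPT)
  // Assumed exception set; feenableexcept returns -1 on failure.
  return feenableexcept(FE_DIVBYZERO | FE_OVERFLOW | FE_INVALID) != -1;
#else
  // No feenableexcept available: FP exceptions only set status flags,
  // so an out-of-range asinf argument yields NaN instead of a signal.
  return false;
#endif
}
```

On builds where the macro is not defined, the same out-of-range input goes unnoticed rather than terminating the program, which is consistent with the platform difference reported in this issue.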

amitdo commented 1 month ago

The fix was pushed to the main branch.