Open marcreichman-pfi opened 1 week ago
Here is the image for this one, sorry.
There is a heap-use-after-free before the assertion:
Estimating resolution as 261
Detected 12 diacritics
=================================================================
==31201==ERROR: AddressSanitizer: heap-use-after-free on address 0x6080000034b8 at pc 0x55a73474bd12 bp 0x7fffbe0cdab0 sp 0x7fffbe0cdaa8
READ of size 8 at 0x6080000034b8 thread T0
#0 0x55a73474bd11 in std::__cxx1998::_Base_bitset<1ul>::_M_getword(unsigned long) const /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bitset:415:16
#1 0x55a73474bc82 in std::__cxx1998::bitset<16ul>::_Unchecked_test(unsigned long) const /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bitset:1066:24
#2 0x55a73474bc00 in std::__cxx1998::bitset<16ul>::operator[](unsigned long) const /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bitset:1168:16
#3 0x55a73474bba2 in std::__debug::bitset<16ul>::operator[](unsigned long) const /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/debug/bitset:282:16
#4 0x55a73474b2df in tesseract::WERD::flag(tesseract::WERD_FLAGS) const /tesseract/build/../src/ccstruct/werd.h:129:12
#5 0x55a7349c0280 in tesseract::Tesseract::recog_all_words(tesseract::PAGE_RES*, tesseract::ETEXT_DESC*, tesseract::TBOX const*, char const*, int) /tesseract/build/../src/ccmain/control.cpp:350:37
#6 0x55a7346a24af in tesseract::TessBaseAPI::Recognize(tesseract::ETEXT_DESC*) /tesseract/build/../src/api/baseapi.cpp:833:21
#7 0x55a7346a4b99 in tesseract::TessBaseAPI::ProcessPage(Pix*, int, char const*, char const*, int, tesseract::TessResultRenderer*) /tesseract/build/../src/api/baseapi.cpp:1218:14
#8 0x55a7346a92b8 in tesseract::TessBaseAPI::ProcessPagesInternal(char const*, char const*, int, tesseract::TessResultRenderer*) /tesseract/build/../src/api/baseapi.cpp:1181:16
#9 0x55a7346a61f1 in tesseract::TessBaseAPI::ProcessPages(char const*, char const*, int, tesseract::TessResultRenderer*) /tesseract/build/../src/api/baseapi.cpp:998:17
#10 0x55a7346262f2 in main /tesseract/build/../src/tesseract.cpp:867:24
#11 0x7f8a62f23249 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
#12 0x7f8a62f23304 in __libc_start_main csu/../csu/libc-start.c:360:3
#13 0x55a734563450 in _start (/tesseract/build/tesseract+0x17d9450) (BuildId: 76aacbbd0f98892a9872e3f978f3ed72519cf4ee)
0x6080000034b8 is located 24 bytes inside of 96-byte region [0x6080000034a0,0x608000003500)
freed by thread T0 here:
#0 0x55a7346218cd in operator delete(void*) (/tesseract/build/tesseract+0x18978cd) (BuildId: 76aacbbd0f98892a9872e3f978f3ed72519cf4ee)
#1 0x55a734fb773e in tesseract::WERD_RES::Clear() /tesseract/build/../src/ccstruct/pageres.cpp:1130:5
#2 0x55a734fcb438 in tesseract::WERD_RES::~WERD_RES() /tesseract/build/../src/ccstruct/pageres.cpp:1125:3
#3 0x55a734fd0bee in tesseract::PAGE_RES_IT::ReplaceCurrentWord(tesseract::PointerVector<tesseract::WERD_RES>*) /tesseract/build/../src/ccstruct/pageres.cpp:1483:3
#4 0x55a7349b840b in tesseract::Tesseract::classify_word_and_language(int, tesseract::PAGE_RES_IT*, tesseract::WordData*) /tesseract/build/../src/ccmain/control.cpp:1367:14
#5 0x55a7349bbe84 in tesseract::Tesseract::RecogAllWordsPassN(int, tesseract::ETEXT_DESC*, tesseract::PAGE_RES_IT*, std::__debug::vector<tesseract::WordData, std::allocator<tesseract::WordData> >*) /tesseract/build/../src/ccmain/control.cpp:255:5
#6 0x55a7349c0125 in tesseract::Tesseract::recog_all_words(tesseract::PAGE_RES*, tesseract::ETEXT_DESC*, tesseract::TBOX const*, char const*, int) /tesseract/build/../src/ccmain/control.cpp:345:10
#7 0x55a7346a24af in tesseract::TessBaseAPI::Recognize(tesseract::ETEXT_DESC*) /tesseract/build/../src/api/baseapi.cpp:833:21
#8 0x55a7346a4b99 in tesseract::TessBaseAPI::ProcessPage(Pix*, int, char const*, char const*, int, tesseract::TessResultRenderer*) /tesseract/build/../src/api/baseapi.cpp:1218:14
#9 0x55a7346a92b8 in tesseract::TessBaseAPI::ProcessPagesInternal(char const*, char const*, int, tesseract::TessResultRenderer*) /tesseract/build/../src/api/baseapi.cpp:1181:16
#10 0x55a7346a61f1 in tesseract::TessBaseAPI::ProcessPages(char const*, char const*, int, tesseract::TessResultRenderer*) /tesseract/build/../src/api/baseapi.cpp:998:17
#11 0x55a7346262f2 in main /tesseract/build/../src/tesseract.cpp:867:24
#12 0x7f8a62f23249 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
previously allocated by thread T0 here:
#0 0x55a73462106d in operator new(unsigned long) (/tesseract/build/tesseract+0x189706d) (BuildId: 76aacbbd0f98892a9872e3f978f3ed72519cf4ee)
#1 0x55a734fb5302 in tesseract::ROW_RES::ROW_RES(bool, tesseract::ROW*) /tesseract/build/../src/ccstruct/pageres.cpp:171:21
#2 0x55a734fb3c97 in tesseract::BLOCK_RES::BLOCK_RES(bool, tesseract::BLOCK*) /tesseract/build/../src/ccstruct/pageres.cpp:109:31
#3 0x55a734fb32aa in tesseract::PAGE_RES::PAGE_RES(bool, tesseract::BLOCK_LIST*, tesseract::WERD_CHOICE**) /tesseract/build/../src/ccstruct/pageres.cpp:84:13
#4 0x55a73469f93e in tesseract::TessBaseAPI::Recognize(tesseract::ETEXT_DESC*) /tesseract/build/../src/api/baseapi.cpp:783:13
#5 0x55a7346a4b99 in tesseract::TessBaseAPI::ProcessPage(Pix*, int, char const*, char const*, int, tesseract::TessResultRenderer*) /tesseract/build/../src/api/baseapi.cpp:1218:14
#6 0x55a7346a92b8 in tesseract::TessBaseAPI::ProcessPagesInternal(char const*, char const*, int, tesseract::TessResultRenderer*) /tesseract/build/../src/api/baseapi.cpp:1181:16
#7 0x55a7346a61f1 in tesseract::TessBaseAPI::ProcessPages(char const*, char const*, int, tesseract::TessResultRenderer*) /tesseract/build/../src/api/baseapi.cpp:998:17
#8 0x55a7346262f2 in main /tesseract/build/../src/tesseract.cpp:867:24
#9 0x7f8a62f23249 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
SUMMARY: AddressSanitizer: heap-use-after-free /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/bitset:415:16 in std::__cxx1998::_Base_bitset<1ul>::_M_getword(unsigned long) const
Shadow bytes around the buggy address:
0x0c107fff8640: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 fa
0x0c107fff8650: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
0x0c107fff8660: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 05
0x0c107fff8670: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
0x0c107fff8680: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 06
=>0x0c107fff8690: fa fa fa fa fd fd fd[fd]fd fd fd fd fd fd fd fd
0x0c107fff86a0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
0x0c107fff86b0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
0x0c107fff86c0: fa fa fa fa fd fd fd fd fd fd fd fd fd fd fd fd
0x0c107fff86d0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 02
0x0c107fff86e0: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==31201==ABORTING
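The report above shows a read in WERD::flag (control.cpp:350) of memory that WERD_RES::Clear had already freed via PAGE_RES_IT::ReplaceCurrentWord. As a minimal sketch of that general pattern, here is a pointer fetched before an entry is replaced versus one fetched after. The types and functions (Word, WordResult, ReplaceEntry, ReadFlagAfterReplace) are hypothetical stand-ins, not Tesseract code, and this is not a claim about where the actual bug lies:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins; these are NOT Tesseract's WERD / WERD_RES types.
struct Word {
  bool fuzzy_space = false;
};

struct WordResult {
  std::unique_ptr<Word> word;
  explicit WordResult(bool fuzzy) : word(std::make_unique<Word>()) {
    word->fuzzy_space = fuzzy;
  }
};

using ResultList = std::vector<std::unique_ptr<WordResult>>;

// Replaces the entry at `index`. The old WordResult and the Word it owns
// are destroyed here, so any raw pointer to them becomes dangling.
void ReplaceEntry(ResultList &results, std::size_t index, bool fuzzy) {
  assert(index < results.size());
  results[index] = std::make_unique<WordResult>(fuzzy);
}

// Reads the flag through a pointer fetched *after* the replacement.
bool ReadFlagAfterReplace(ResultList &results, std::size_t index) {
  // Unsafe ordering (left as a comment on purpose):
  //   const Word *stale = results[index]->word.get();
  //   ReplaceEntry(results, index, true);
  //   return stale->fuzzy_space;  // heap-use-after-free
  ReplaceEntry(results, index, true);
  const Word *fresh = results[index]->word.get();
  return fresh->fuzzy_space;
}
```

If the crash follows this shape, the fix would be to re-read the current word from the iterator after any call that may replace it, rather than keeping a pointer across the call.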
The current code uses random values to add noise outside of the image. Using a constant instead of the random values might work better (still to try with the other cases):
diff --git a/src/lstm/networkio.cpp b/src/lstm/networkio.cpp
index 3cb068c6..83347260 100644
--- a/src/lstm/networkio.cpp
+++ b/src/lstm/networkio.cpp
@@ -417,7 +417,7 @@ void NetworkIO::Randomize(int t, int offset, int num_features, TRand *randomizer
   if (int_mode_) {
     int8_t *line = i_[t] + offset;
     for (int i = 0; i < num_features; ++i) {
-      line[i] = IntCastRounded(randomizer->SignedRand(INT8_MAX));
+      line[i] = 0;
     }
   } else {
     // float mode.
Still, it would be better to understand what is wrong with the use of lists. I suspect a list is used incorrectly somewhere.
More generally, all the other issues around random values, and the crashes they expose, should be fixed.
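To compare the two padding strategies outside Tesseract, a standalone sketch along these lines might help. PadFill and FillPadding are hypothetical names, and std::mt19937 is only a stand-in for TRand, so this illustrates the idea in the diff rather than the actual NetworkIO::Randomize:

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical switch between the two strategies discussed above.
enum class PadFill { kZero, kNoise };

// Fills `num_features` values of `line`, starting at `offset`, with either
// a constant zero or signed noise in [-INT8_MAX, INT8_MAX].
void FillPadding(std::vector<int8_t> &line, int offset, int num_features,
                 PadFill mode, std::mt19937 &rng) {
  assert(offset >= 0 && num_features >= 0);
  assert(offset + num_features <= static_cast<int>(line.size()));
  std::uniform_int_distribution<int> dist(-INT8_MAX, INT8_MAX);
  for (int i = 0; i < num_features; ++i) {
    line[offset + i] = (mode == PadFill::kZero)
                           ? int8_t{0}
                           : static_cast<int8_t>(dist(rng));
  }
}
```

Seeding the generator with a fixed value keeps the noise reproducible from run to run, which makes crashes that depend on the padding values easier to chase.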
Current Behavior
On recent main (9f17a3fd) I receive a SIGABRT in Release (SIGILL in Debug) with the eng and chi_tra languages. Both are the official fast models.
Expected Behavior
No SIGABRT.
Suggested Fix
No response
tesseract -v
Operating System
Ubuntu 22.04 Jammy
Other Operating System
WSL
uname -a
Linux hostname 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Compiler
GCC 11.4
CPU
Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
Virtualization / Containers
No response
Other Information
I'm sure this is related to the series of random-generator issues (#4361, #4146, #4148, #4270). This one is also reproducible in 5.5.0, unlike #4361, which worked in 5.5.0.