Open nam-leduc opened 8 years ago
Please also provide the input files (test1.tif and config.txt).
Hi zdenop, thanks for your quick response. I would like to attach 2 files: config.txt
I cannot upload the tif file to this comment, so I uploaded it to my repository. You can get the tif file from the following link: https://github.com/nam-leduc/positioning/blob/master/test1.tif
Best regards, Le Duc. Nam
What OS are you using? Did you try installing tesseract and then running the installed version (from the screenshot I see you are running a tesseract that is not installed)? Do you have more than one version of tesseract installed?
I can reproduce this issue.
tesseract phototest.tif phototest config.txt
Tesseract Open Source OCR Engine v3.05.00dev-266-gb1c1382 with Leptonica
Page 1
Segmentation fault (core dumped)
But this one works...
tesseract phototest.tif phototest -c classify_enable_learning=0 -c classify_enable_adaptive_matcher=0
Tesseract Open Source OCR Engine v3.05.00dev-266-gb1c1382 with Leptonica
Page 1
Warning in pixReadMemTiff: tiff page 1 not found
gdb tesseract
(gdb) run phototest.tif phototest config.txt
Starting program: /usr/local/bin/tesseract phototest.tif phototest config.txt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Tesseract Open Source OCR Engine v3.05.00dev-266-gb1c1382 with Leptonica
Page 1
Program received signal SIGSEGV, Segmentation fault.
tesseract::Tesseract::recog_all_words (this=0x808c00, page_res=0x81cb90,
monitor=monitor@entry=0x0, target_word_box=target_word_box@entry=0x0,
word_config=word_config@entry=0x0, dopasses=dopasses@entry=0)
at control.cpp:320
320 } else if (!AdaptiveClassifierIsEmpty()) {
(gdb) backtrace
#0 tesseract::Tesseract::recog_all_words (this=0x808c00,
page_res=0x81cb90, monitor=monitor@entry=0x0,
target_word_box=target_word_box@entry=0x0,
word_config=word_config@entry=0x0, dopasses=dopasses@entry=0)
at control.cpp:320
#1 0x00007ffff769929d in tesseract::TessBaseAPI::Recognize (
this=this@entry=0x7fffffffdce0, monitor=0x0) at baseapi.cpp:902
#2 0x00007ffff76994e4 in tesseract::TessBaseAPI::ProcessPage (
this=this@entry=0x7fffffffdce0, pix=0x83f110,
page_index=page_index@entry=0,
filename=filename@entry=0x7fffffffe257 "phototest.tif",
retry_config=retry_config@entry=0x0,
timeout_millisec=timeout_millisec@entry=0, renderer=renderer@entry=
0x81cb50) at baseapi.cpp:1231
#3 0x00007ffff7699a5c in tesseract::TessBaseAPI::ProcessPagesMultipageTiff (this=this@entry=0x7fffffffdce0, data=data@entry=0xdde558 "II*",
size=38668, filename=filename@entry=0x7fffffffe257 "phototest.tif",
retry_config=retry_config@entry=0x0,
timeout_millisec=timeout_millisec@entry=0,
renderer=renderer@entry=0x81cb50, tessedit_page_number=-1)
at baseapi.cpp:1064
#4 0x00007ffff769a0c3 in tesseract::TessBaseAPI::ProcessPagesInternal (
this=this@entry=0x7fffffffdce0, filename=<optimized out>,
retry_config=retry_config@entry=0x0,
timeout_millisec=timeout_millisec@entry=0, renderer=0x81cb50)
at baseapi.cpp:1183
#5 0x00007ffff769a2f0 in tesseract::TessBaseAPI::ProcessPages (
this=this@entry=0x7fffffffdce0, filename=<optimized out>,
retry_config=retry_config@entry=0x0,
timeout_millisec=timeout_millisec@entry=0, renderer=<optimized out>)
at baseapi.cpp:1081
#6 0x0000000000401f2a in main (argc=<optimized out>, argv=0x7fffffffde78)
at tesseractmain.cpp:448
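For anyone who wants to compare the two invocations side by side, here is a minimal sketch of a helper script. It assumes `tesseract` is on the PATH and that the files are named as in the commands above; the function names are made up for illustration. On POSIX, Python's `subprocess` reports a process killed by signal N as return code -N, which is how the segfault would show up.

```python
import subprocess


def build_commands(image, outputbase, config_file, overrides):
    """Return the two argv lists compared in this thread:
    one passing a config file, one passing the same settings with -c."""
    with_config = ["tesseract", image, outputbase, config_file]
    with_cli = ["tesseract", image, outputbase]
    for name, value in overrides.items():
        with_cli += ["-c", f"{name}={value}"]
    return with_config, with_cli


def killed_by_signal(returncode):
    """On POSIX, subprocess reports death by signal N as return code -N."""
    return returncode < 0


def run_both(image, outputbase, config_file, overrides):
    """Run both variants and report (argv, return code, killed by signal)."""
    results = []
    for argv in build_commands(image, outputbase, config_file, overrides):
        completed = subprocess.run(argv, capture_output=True, text=True)
        code = completed.returncode
        results.append((argv, code, killed_by_signal(code)))
    return results
```

Calling `run_both("phototest.tif", "phototest", "config.txt", {"classify_enable_learning": 0, "classify_enable_adaptive_matcher": 0})` would run both variants; nothing is executed until that call.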
But I can not ;-):
tesseract test1.tif test1.tif config.txt
Tesseract Open Source OCR Engine v3.05.00dev with Leptonica
Page 1
Page 2
neither on Linux (3.05.00dev-266-gb1c1382) nor on Windows 7 (tesseract 3.04.01)
Use my config.txt ...
He attached his config.txt with only 1 line, but said:
But when I using tesseract with that options:
classify_enable_learning 0 classify_enable_adaptive_matcher 0
My system is Ubuntu 14.04.
Thanks! Now I am able to reproduce it (crash with config file and no crash with "-c").
classify_enable_adaptive_matcher 0 in the config file is causing the crash, not classify_enable_learning 0.
Updated file: config.txt
Hi @amitdo and @zdenop,
I'm sorry. I tried other config options to check which option causes the crash, but I forgot to restore the original config file.
classify_enable_learning 0
classify_enable_adaptive_matcher 0
It doesn't look to me like classify_enable_adaptive_matcher=0 is really supported any more. A bunch of the new code that's been added isn't conditionalized to check it.
The reason that it doesn't crash when the config variable is set on the command line is because that's done after the recognizer is initialized, so the necessary data structure has been created.
Even 3.03 crashes with this config file.
The reason that it doesn't crash when the config variable is set on the command line is because that's done after the recognizer is initialized, so the necessary data structure has been created.
Can you elaborate on this?
The config file is processed in the Init call here:
https://github.com/tesseract-ocr/tesseract/blob/master/api/tesseractmain.cpp#L372
while the command line config variables are processed in the call to SetVariablesFromCLArgs here:
https://github.com/tesseract-ocr/tesseract/blob/master/api/tesseractmain.cpp#L379
after the adaptive matcher has already been set up.
Even though the command line case doesn't crash, it is still using the adaptive matcher because the code that references it isn't guarded by the necessary config variable.
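To make the ordering argument concrete, here is a toy model in Python. It is not Tesseract code: `ToyRecognizer`, `AdaptiveMatcher` and the method names are invented, and the assumption that set-up skips creating the matcher when the flag is 0 is inferred from the explanation above, not read from the source. Python's `AttributeError` on `None` stands in for the segfault.

```python
class AdaptiveMatcher:
    """Stand-in for the data structure the recognizer expects to exist."""

    def is_empty(self):
        return True


class ToyRecognizer:
    def __init__(self):
        self.params = {"classify_enable_adaptive_matcher": 1}
        self.matcher = None  # created during init(), if enabled

    def init(self, config_lines=()):
        # Config-file values are applied before set-up ...
        for line in config_lines:
            name, value = line.split()
            self.params[name] = int(value)
        # ... so set-up sees the flag and may never create the matcher.
        if self.params["classify_enable_adaptive_matcher"]:
            self.matcher = AdaptiveMatcher()

    def set_from_command_line(self, assignments):
        # Applied after init(): the matcher already exists at this point.
        for item in assignments:
            name, value = item.split("=")
            self.params[name] = int(value)

    def recognize(self):
        # Unguarded use: never consults the flag, assumes the matcher exists.
        return self.matcher.is_empty()


# Flag set with -c after init: no crash, but the matcher is still used.
via_cli = ToyRecognizer()
via_cli.init()
via_cli.set_from_command_line(["classify_enable_adaptive_matcher=0"])
cli_result = via_cli.recognize()

# Flag set in the config file: the matcher was never created.
via_file = ToyRecognizer()
via_file.init(["classify_enable_adaptive_matcher 0"])
try:
    via_file.recognize()
    file_outcome = "ok"
except AttributeError:
    file_outcome = "crash (toy analogue of the segfault)"
```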
@tfmorris @amitdo: besides this issue, this behaviour should be documented: the "-c" option cannot be used for init-only parameters. Or should we change how "-c" params are parsed?
I think we should print a warning if someone tries to set an init parameter using '-c var=val' on the command line. The relevant function is SetParam in ccutil/params.cpp.
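A rough sketch of what such a warning could look like, written as a standalone Python toy and not as a patch to SetParam. The registry layout, the parameter names and the `after_init` argument are all hypothetical.

```python
import warnings


def default_params():
    """Hypothetical registry: name -> value plus an init-only marker."""
    return {
        "example_init_only_param": {"value": 0, "init_only": True},
        "example_runtime_param": {"value": 0, "init_only": False},
    }


def set_param_from_cli(params, name, value, after_init=True):
    """Apply '-c name=value'. Warn and refuse if the parameter is
    init-only and initialization has already happened."""
    entry = params.get(name)
    if entry is None:
        return False  # unknown parameter
    if entry["init_only"] and after_init:
        warnings.warn(
            f"'{name}' is an init-only parameter; "
            f"'-c {name}={value}' has no effect after initialization"
        )
        return False
    entry["value"] = value
    return True
```

With this shape the caller learns from the return value whether the assignment took effect, and the user sees why it did not.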
Good suggestions, but neither is relevant here because classify_enable_adaptive_matcher isn't an init only parameter.
The issue is that the code has evolved so that classify_enable_adaptive_matcher=0 is no longer supported. There are sections of code which don't check this config variable and which assume that the adaptive matcher is correctly initialized. We can either drop the config variable or fix the code so that the variable protects everything that needs to be protected. I don't know how much work that'll be, but it's more than just this one place, because I fixed it and it just died somewhere else. No idea how many places there are to fix or whether it makes sense from @theraysmith's point of view to continue supporting this case.
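As an illustration of the "variable protects everything" option, one common pattern is to route every access through a single guarded accessor so no caller can touch an uninitialized matcher. The Python below is a toy with invented names (`Recognizer`, `adaptive_matcher`, `should_run_adaptive_pass`); it only mirrors the shape of the check at control.cpp:320 and says nothing about how many call sites Tesseract really has.

```python
class Matcher:
    """Toy adaptive matcher holding learned samples."""

    def __init__(self, samples=()):
        self.samples = list(samples)

    def is_empty(self):
        return not self.samples


class Recognizer:
    def __init__(self, enable_adaptive_matcher, samples=()):
        self.enable_adaptive_matcher = enable_adaptive_matcher
        # The matcher is only created when the feature is enabled.
        self.matcher = Matcher(samples) if enable_adaptive_matcher else None

    def adaptive_matcher(self):
        """Single choke point: return the matcher only if it is both
        enabled and actually initialized, otherwise None."""
        if self.enable_adaptive_matcher and self.matcher is not None:
            return self.matcher
        return None

    def should_run_adaptive_pass(self):
        # Guarded version of an 'is the adaptive classifier empty?' check.
        matcher = self.adaptive_matcher()
        return matcher is not None and not matcher.is_empty()
```

The cost of this route is exactly what the comment describes: every place that currently assumes the matcher exists has to be found and converted.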
In my opinion, the current order of evaluation (config files, then command line) is correct because it allows the config file to be overridden by the command line.
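The override behaviour can be sketched as a two-step merge. This is a simplified Python model, assuming config files contain one "name value" pair per line (as in the config.txt from this thread) and that each "-c" is followed by one "name=value" argument. The function names are made up and real parsing in Tesseract is more involved.

```python
def parse_config_text(text):
    """Parse 'name value' lines, skipping blank lines."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition(" ")
        params[name] = value.strip()
    return params


def parse_cli_assignments(argv):
    """Collect the 'name=value' argument that follows each '-c'."""
    params = {}
    args = iter(argv)
    for arg in args:
        if arg == "-c":
            name, _, value = next(args, "").partition("=")
            if name:
                params[name] = value
    return params


def effective_params(config_text, argv):
    """Config file first, command line second, so the command line wins."""
    merged = parse_config_text(config_text)
    merged.update(parse_cli_assignments(argv))
    return merged
```

For example, a config file that sets classify_enable_adaptive_matcher to 0 combined with `-c classify_enable_adaptive_matcher=1` ends up with the value "1" in the merged result.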
About classify_enable_adaptive_matcher - Ray should handle it.
But the FAQ should be fixed.
Is this still an issue with latest Tesseract? We'd like to know that before releasing Tesseract 4.0.0.
Dear @stweil ,
I tested with the latest version of tesseract and this issue still happens.
Tesseract Version:
tesseract --version
tesseract 5.0.0-alpha-777-g162f3
leptonica-1.75.3
libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
Found AVX2
Found AVX
Found SSE
Commit: 162f3707e28102451338ced235c80ffa6f09ae40 (Merge pull request #3082 from bertsky/fix-line-detector)
Platform: Ubuntu 18.04.4 LTS
Tessdata: https://raw.githubusercontent.com/tesseract-ocr/tessdata/master/eng.traineddata
The input image is: https://user-images.githubusercontent.com/8704662/92300832-517da180-ef88-11ea-8ce3-149bd3e79bc3.PNG
My command:
namld@sunny:~/prjs/tesseract_build$ ./bin/tesseract ~/vmshare/have-image.PNG have-image-original config.txt
Tesseract Open Source OCR Engine v5.0.0-alpha-777-g162f3 with Leptonica
Segmentation fault (core dumped)
namld@sunny:~/prjs/tesseract_build$ cat config.txt
classify_enable_adaptive_matcher 0
namld@sunny:~/prjs/tesseract_build$
I don't know whether tesseract still maintains the functionality for the config "classify_enable_adaptive_matcher 0". However, I see that the recommendation for the above setting is no longer in the tesseract FAQ at https://tesseract-ocr.github.io/tessdoc/FAQ. Do you think that is enough to close this bug?
Best regards, Le Duc. Nam
Thank you for testing. A segmentation fault is always something which has to be fixed, so this issue should be kept open.
@tfmorris commented:
The issue is that the code has evolved so that classify_enable_adaptive_matcher=0 is no longer supported. There are sections of code which don't check this config variable and which assume that the adaptive matcher is correctly initialized. We can either drop the config variable or fix the code so that the variable protects everything that needs to be protected. I don't know how much work that'll be, but it's more than just this one place, because I fixed it and it just died somewhere else. No idea how many places there are to fix or whether it makes sense from @theraysmith's point of view to continue supporting this case.
This issue has been open since March 2016. I suggest removing the classify_enable_adaptive_matcher variable from classify.cpp and classify.h and fixing two conditions in adaptmatch.cpp.
@stweil, what about this issue?
I had a look at it, but have seen no quick solution so far. So I am afraid it will have to wait until after 5.0.0-rc1.
I had a look at it, but have seen no quick solution so far.
The quick solution is to disable the variable classify_enable_adaptive_matcher (with #if 0) or remove it to prevent a crash.
In the future, if you find a way to prevent the crash, you can undo this removal.
That's right. It now still exists, but has no effect.
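"Exists but has no effect" can be pictured as a parameter that is still accepted, so old config files keep loading, but is never applied. The Python below is only a sketch of that idea with invented names (`IGNORED_PARAMS`, `apply_params`). How Tesseract actually implements it is not shown in this thread.

```python
# Names that are still accepted for compatibility but deliberately ignored.
IGNORED_PARAMS = {"classify_enable_adaptive_matcher"}


def apply_params(requested, defaults):
    """Return the effective settings: unknown names are rejected,
    ignored names are accepted but leave the defaults untouched."""
    effective = dict(defaults)
    for name, value in requested.items():
        if name not in effective:
            raise KeyError(f"unknown parameter: {name}")
        if name in IGNORED_PARAMS:
            continue  # accepted, but has no effect
        effective[name] = value
    return effective
```

With defaults of 1 for both variables, requesting `classify_enable_adaptive_matcher 0` and `classify_enable_learning 0` would change only classify_enable_learning.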
I followed the comment in this link: FAQ There are inconsistent r....
But when I use tesseract with those options:
I receive a message like the following:
I think this is a bug, because putting settings in a config file is common for users. I searched all the forums but found no topic about this issue.