jacklicn / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

tesseract-OCR clogs up prompt by printing "Tesseract Open Source OCR Engine v3.01 with Leptonica" over and over #580

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I would like a different committer to read and deal with this. The one dealing 
with #579 is not reading my posts and marks the issue as invalid when it is not.

Tesseract-OCR prints the line "Tesseract Open Source OCR Engine v3.01 with 
Leptonica" every time I start the program, clogging up my command prompt (as I 
need to continuously read what is on my screen).

It does not matter whether I use the tesseract-OCR program itself or a wrapper, 
the program prints "Tesseract Open Source OCR Engine v3.01 with Leptonica" 
every time I use tesseract.

I am not interested in having anything printed at all; Python will deal with 
the output. However, Tesseract-OCR (not PyTesser) prints "Tesseract Open Source 
OCR Engine v3.01 with Leptonica".

Original issue reported on code.google.com by Zedlakit...@hotmail.com on 20 Nov 2011 at 6:25

GoogleCodeExporter commented 9 years ago
This is Tesseract-OCR 3.01, by the way - The issue did not exist in 
Tesseract-OCR 3.0

Original comment by Zedlakit...@hotmail.com on 20 Nov 2011 at 6:26

GoogleCodeExporter commented 9 years ago
The wrapper seems to be called python-tesseract and not PyTesser. However, this 
does not change the fact that it is tesseract.exe (Tesseract-OCR) that prints 
the line.

By the way, line 72 is what starts tesseract.exe (Tesseract-OCR): 
https://github.com/jbochi/python-tesseract/blob/master/tesseract.py

And as I have said several times, the line is printed regardless of whether I 
use tesseract-ocr directly or through a wrapper.

Original comment by Zedlakit...@hotmail.com on 20 Nov 2011 at 6:42

GoogleCodeExporter commented 9 years ago
I will leave it open for another committer, as you wish ;-) Anyway (to save 
others' time):
1. Do not report a problem with your wrapper as a tesseract-ocr problem ;-)
2. Please provide REAL information, not only your assumptions. I did it for you:
A. I installed tesseract 3.00 
(http://tesseract-ocr.googlecode.com/files/tesseract-ocr-setup-3.00.exe) and 
ran a test:
C:\Program Files\Tesseract-OCR3>tesseract.exe doc\eurotext.tif eurotext
Tesseract Open Source OCR Engine with Leptonica

B. I installed 3.00.1 
(http://tesseract-ocr.googlecode.com/files/tesseract-3.00.1.exe.zip) and tested 
it:
C:\Program Files\Tesseract-OCR3>tesseract.exe doc\eurotext.tif eurotext
Tesseract Open Source OCR Engine with Leptonica
Number of found pages: 1.

C. I installed version 2.04 and tested it:
c:\Program Files\Tesseract-OCR2>tesseract.exe eurotext.tif eurotext
Tesseract Open Source OCR Engine
Image has 1 * 1 bit  per pixel, and size (1024,800)
Resolution=300

D. I also tested 3.01, with these results:
C:\Program Files\Tesseract-OCR>tesseract.exe doc\eurotext.tif eurotext
Tesseract Open Source OCR Engine v3.01 with Leptonica
Page 0

As you can see, all versions print output to the console... This is not a bug; 
it is a feature. 

What does that mean? That YOUR wrapper was able to handle the message from 
version 3.00, but the WRAPPER IS NOT ABLE TO DO THE SAME with 3.01. So where is 
the problem?

Have a nice day. 

Original comment by zde...@gmail.com on 20 Nov 2011 at 8:15

GoogleCodeExporter commented 9 years ago
That makes more sense, actually. Still, an argument to tesseract.exe that 
suppresses the "Tesseract Open Source OCR ..." line would be nice.

Original comment by Zedlakit...@hotmail.com on 20 Nov 2011 at 9:14

GoogleCodeExporter commented 9 years ago
The problem is that this "feature" is present in several versions of tesseract 
(so removing it could cause problems for other users). I personally use it to 
detect version 3.00 or 2.04 ;-) ("-v" was implemented on Windows only in 
3.01).

IMHO your wrapper also needs this (version identification), to know which 
features can be used (hOCR, OSD).
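
A rough sketch of how a wrapper might use the banner for version detection, 
based only on the four console outputs quoted earlier in this thread. The 
function name `classify_banner` and the labels "3.00.x" and "2.x" are 
hypothetical, and other builds may print something different:

```python
import re

BANNER_PREFIX = "Tesseract Open Source OCR Engine"

def classify_banner(output):
    """Guess the tesseract generation from the first line it prints."""
    lines = output.strip().splitlines()
    first = lines[0] if lines else ""
    if not first.startswith(BANNER_PREFIX):
        return None  # not a banner we recognise
    # 3.01 embeds its version number, e.g. "... Engine v3.01 with Leptonica".
    match = re.search(r"\bv(\d+(?:\.\d+)+)", first)
    if match:
        return match.group(1)
    # 3.00 and 3.00.1 print the same unversioned "with Leptonica" banner.
    if "with Leptonica" in first:
        return "3.00.x"
    # 2.04 printed the bare banner without the Leptonica suffix.
    return "2.x"
```

Because 3.00 and 3.00.1 print an identical first line, this approach cannot 
tell them apart.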

Original comment by zde...@gmail.com on 21 Nov 2011 at 8:54

GoogleCodeExporter commented 9 years ago
I solved the issue by adding stdout=subprocess.PIPE as an argument to 
subprocess.Popen() in the wrapper, so the call became proc = 
subprocess.Popen(command, stderr=subprocess.PIPE, stdout=subprocess.PIPE). It 
no longer prints any output and just returns the OCR result as a string.
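
A minimal, self-contained sketch of that change. The helper name 
`run_quietly` and the stand-in child process are hypothetical; in the wrapper, 
`command` would be the tesseract invocation instead:

```python
import subprocess
import sys

def run_quietly(command):
    """Run command with both streams captured instead of inherited."""
    proc = subprocess.Popen(
        command,
        stderr=subprocess.PIPE,  # already present in the wrapper
        stdout=subprocess.PIPE,  # the added argument
    )
    # communicate() drains both pipes and waits for the process to exit,
    # so nothing the child writes reaches the console.
    out, err = proc.communicate()
    return (proc.returncode,
            out.decode(errors="replace"),
            err.decode(errors="replace"))

# Stand-in for tesseract (assumption): writes one line to each stream.
demo = [sys.executable, "-c",
        "import sys; print('banner'); sys.stderr.write('status\\n')"]
code, out, err = run_quietly(demo)
```

With only `stderr=subprocess.PIPE`, anything the child writes to stdout is 
still inherited from the parent and shows up in the prompt.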

Original comment by Zedlakit...@hotmail.com on 21 Nov 2011 at 9:54

GoogleCodeExporter commented 9 years ago
You could probably delete this version of the issue and leave #579. I apologize 
for getting a bit angry earlier.

Original comment by Zedlakit...@hotmail.com on 21 Nov 2011 at 9:55

GoogleCodeExporter commented 9 years ago

Original comment by zde...@gmail.com on 22 Nov 2011 at 7:57

GoogleCodeExporter commented 9 years ago
Quite a dumb feature if you ask me... 

I am on Linux, and if I specify "2>&1 1> /dev/null" I definitely do not want to 
see a stupid string like "Tesseract Open Source OCR Engine".
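
One thing worth checking here is the order of the redirections. A POSIX shell 
applies them left to right, so "2>&1 1> /dev/null" first points stderr at the 
current stdout and only then discards stdout, which leaves stderr visible. The 
sketch below shows this with a hypothetical stand-in program, not tesseract 
itself; which stream tesseract uses for its banner is not stated in this 
thread:

```python
import subprocess

# Stand-in (assumption): writes "result" to stdout and "banner" to stderr.
NOISY = "sh -c 'echo result; echo banner >&2'"

def run(redirect):
    """Run NOISY under the given redirection; return what still escapes."""
    proc = subprocess.run(
        f"{NOISY} {redirect}",
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    return proc.stdout, proc.stderr

# stderr is duplicated onto stdout BEFORE stdout goes to /dev/null,
# so the stderr text still comes through.
leaky = run("2>&1 1>/dev/null")

# Discarding stdout first, then pointing stderr at it, silences both.
quiet = run(">/dev/null 2>&1")
```

If the goal is to silence both streams, the second form is the one to use.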

Original comment by guido.va...@gmail.com on 18 Feb 2012 at 4:49