What a coincidence!
We ran into the same issue while trying to convert our project from the
previous Tesseract version to this new release. Unfortunately, we can't use
the 'Confidence' property on the Word class; it always returns 0.0. Previously,
we used it to verify each word after running text recognition with Tesseract.
Do we have to set 'Confidence' manually by calling
'UpdateConfidenceAndInsertTo()'? That method doesn't seem to affect the Confidence
property, because the CharList property has no items. So the problem seems to
be that CharList is never populated.
In short: why is the CharList property not set on a Word object? Because of
this, the Confidence of a Word can't be used.
Original comment by arn...@gmail.com
on 12 Jul 2011 at 1:07
The other properties (line, format, font, size, value) are also all 0; only the
coordinates are set. r552 works fine, although it didn't have the
doc/block/para/line structure. Sample code:
Dim sbText As New System.Text.StringBuilder()
Dim doc As DocumentLayout = _ocrProcessor.AnalyseLayout(bmp)
For Each blk As Block In doc.Blocks
    For Each para As Paragraph In blk.Paragraphs
        For Each ln As TextLine In para.Lines
            For Each wrd As Word In ln.Words
                ' Append printable characters only (skip control codes below 32)
                For Each ch As Character In wrd.CharList
                    If ch.Value > 31 Then
                        sbText.Append(Chr(ch.Value))
                    End If
                Next
                sbText.Append(" ")
            Next
            sbText.Append(vbCrLf)
        Next
    Next
Next
Original comment by hbeanl...@gmail.com
on 13 Jul 2011 at 2:42
AnalyseLayout only detects blob locations; I don't collect any other information.
In my opinion, AnalyseLayout should be a pre-processing step that identifies
the locations of blobs, so that we can apply additional image-processing
functions to improve them and then pass them to Tesseract for recognition,
in parallel, with other flags.
TesseractProcessor::Recognize lets the user recognize text with
these parameters:
- PageSegmentMode: single block/paragraph/text line/word, depending on your purpose
- OcrEngineMode: recognize with a specific engine
- ROI, UseROI: recognize only inside a ROI
Good luck!
Original comment by congnguy...@gmail.com
on 13 Jul 2011 at 6:29
Thank you for your quick response. I tried the settings you suggested,
but I haven't gotten any result with them. I'm wondering whether it is possible to
get the position of a character together with the character itself. If it is
possible, could you explain how I can do this, other than by parsing the text
I get from Recognize?
If this function is not implemented in the current version (590), I'm
curious whether it is being implemented in the next version.
Thank you in advance.
Original comment by tim.verv...@gmail.com
on 13 Jul 2011 at 8:18
Try searching the original tesseract-ocr engine for "chopper".
If you find any relevant variables, you can set them with the
"TesseractProcessor::SetVariable(..)" function.
Sorry, I have no plans for a next version right now. I will announce it as soon as I do!
Original comment by congnguy...@gmail.com
on 13 Jul 2011 at 11:01
I look forward to seeing this re-implemented; since it was in r552, it
should not be hard to add back.
This information is useful for reconstructing the document's layout (in RTF/HTML format).
Tesseract gives us the coordinates, font class, and font size, which can be useful.
For example, if a form is rubber-stamped, I don't know exactly where the stamp is, so I
should be able to look for text of a certain format and size and
confidently identify it.
Original comment by hbeanl...@gmail.com
on 26 Jul 2011 at 12:31
Yes, please add this back in; we desperately need confidence levels for license
plate recognition.
Original comment by ad...@developerinabox.com
on 2 Sep 2011 at 5:11
Using tesseractdotnetwrapper_r590.zip,
with every Sample*.cs and every input image I get only 1 block from
doc = processor.AnalyseLayout(bmp);
Is this a fault in tesseract.dll, or a missing configuration of the processor?
VS2010 Debug x86
Win7 32-bit
Original comment by povver...@gmail.com
on 14 Oct 2011 at 1:39
Original issue reported on code.google.com by
tim.verv...@gmail.com
on 12 Jul 2011 at 12:14