Sicos1977 closed this issue 2 years ago.
Please provide minimal code for reproduction, including a test/input image.
I'm using this image: [image not available]
The code is in C# because I'm writing a new C# wrapper (partly a copy of this one: https://github.com/charlesw/tesseract):
```csharp
// CreateEngine(), LoadTestPix() and TestImageFileColumn are helpers from the wrapper's test fixture.
// Requires: using System.IO; using System.Text;
const string resultPath = @"EngineTests\CanProcessPixUsingResultIterator.txt";
var result = new StringBuilder();
string actualResult;

using (var engine = CreateEngine())
{
    using (var img = LoadTestPix(TestImageFileColumn))
    {
        using (var page = engine.Process(img))
        {
            // Walk the layout hierarchy: block -> paragraph -> text line -> word.
            foreach (var block in page.Layout)
            {
                result.AppendLine($"Block text: {block.Text}");
                result.AppendLine($"Block confidence: {block.Confidence}");
                foreach (var paragraph in block.Paragraphs)
                {
                    result.AppendLine($"Paragraph text: {paragraph.Text}");
                    result.AppendLine($"Paragraph confidence: {paragraph.Confidence}");
                    foreach (var textLine in paragraph.TextLines)
                    {
                        result.AppendLine($"Text line text: {textLine.Text}");
                        result.AppendLine($"Text line confidence: {textLine.Confidence}");
                        foreach (var word in textLine.Words)
                        {
                            result.AppendLine($"Word text: {word.Text}");
                            result.AppendLine($"Word confidence: {word.Confidence}");
                            result.AppendLine($"Word is from dictionary: {word.IsFromDictionary}");
                            result.AppendLine($"Word is numeric: {word.IsNumeric}");
                            result.AppendLine($"Word language: {word.Language}");
                            //foreach (var symbol in word.Symbols)
                            //{
                            //    result.AppendLine($"Symbol text: {symbol.Text}");
                            //    result.AppendLine($"Symbol confidence: {symbol.Confidence}");
                            //    result.AppendLine($"Symbol is superscript: {symbol.IsSuperscript}");
                            //    result.AppendLine($"Symbol is dropcap: {symbol.IsDropcap}");
                            //}
                        }
                    }
                }
            }

            // TODO: Do some checking here
            actualResult = result.ToString();
            File.WriteAllText("d:\\result.txt", actualResult);
        }
    }
}
```
You can see how I implemented the enumerator in C# here: https://github.com/Sicos1977/TesseractOCR/tree/master/TesseractOCR/Layout
I start by creating the Blocks class and then iterate over it.
Once I'm done with one level, at the Paragraph level or lower, I can simply do this:
https://github.com/Sicos1977/TesseractOCR/blob/master/TesseractOCR/Layout/Paragraphs.cs
IsAtFinalElement expects two levels ("elements"), for example Paragraph and TextLine, but there is nothing above Block, so I have no idea what to pass as the first parameter. I tried -1, but that just gives an error.
Here's a hint: https://github.com/tesseract-ocr/tesseract/blob/d8d63fd71b8d56f73469f7db41864098f087599c/src/api/hocrrenderer.cpp#L190-L195
Please use our forum for asking questions.
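The linked hOCR renderer code appears to sidestep the problem: as far as I can tell, it walks the page word by word, opens a block, paragraph or line when IsAtBeginningOf says so, and closes it when IsAtFinalElement(level, RIL_WORD) is true, so it never needs a level above RIL_BLOCK. Below is a minimal, self-contained C# sketch of that pattern. It does not call Tesseract: the page is a hypothetical in-memory list of words, and Empty, Next, IsAtBeginningOf and IsAtFinalElement are simplified stand-ins that take an explicit word index instead of wrapping the real TessPageIterator functions. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Text;

// Hypothetical in-memory model of a page iterator; NOT the Tesseract API.
// Level numbers follow the order of TessPageIteratorLevel (RIL_BLOCK = 0 ... RIL_WORD = 3).
const int Block = 0, Para = 1, TextLine = 2, Word = 3;

// Each word records the globally unique ids of its block, paragraph and text line.
var words = new (int BlockId, int ParaId, int LineId, string Text)[]
{
    (0, 0, 0, "Hello"), (0, 0, 0, "world"), (0, 0, 1, "second"), // block 0, paragraph 0
    (0, 1, 2, "third"),                                          // block 0, paragraph 1
    (1, 2, 3, "last"),                                           // block 1
};

bool Empty(int pos) => pos >= words.Length;

// Id of the element at `level` that contains the word at `pos`.
int KeyAt(int pos, int level) => level switch
{
    Block => words[pos].BlockId,
    Para => words[pos].ParaId,
    TextLine => words[pos].LineId,
    Word => pos,
    _ => throw new ArgumentOutOfRangeException(nameof(level)),
};

// Stand-in for Next(level): index of the first word of the next element at `level`.
int Next(int pos, int level)
{
    int key = KeyAt(pos, level);
    do { pos++; } while (!Empty(pos) && KeyAt(pos, level) == key);
    return pos;
}

bool IsAtBeginningOf(int pos, int level) =>
    !Empty(pos) && (pos == 0 || KeyAt(pos - 1, level) != KeyAt(pos, level));

// Simplified stand-in for IsAtFinalElement(level, element).
bool IsAtFinalElement(int pos, int level, int element)
{
    if (Empty(pos)) return true;
    int next = Next(pos, element);
    if (Empty(next)) return true;
    while (element > level)
    {
        element--;
        if (!IsAtBeginningOf(next, element)) return false;
    }
    return true;
}

// Single pass over the words, opening and closing elements as we go.
var sb = new StringBuilder();
int cursor = 0;
while (!Empty(cursor))
{
    // Open any element that starts at this word.
    if (IsAtBeginningOf(cursor, Block)) sb.Append("<block>");
    if (IsAtBeginningOf(cursor, Para)) sb.Append("<para>");
    if (IsAtBeginningOf(cursor, TextLine)) sb.Append("<line>");
    else sb.Append(' '); // separate words within a line
    sb.Append(words[cursor].Text);

    // Read the "last word of ..." flags before stepping to the next word.
    bool lastInLine = IsAtFinalElement(cursor, TextLine, Word);
    bool lastInPara = IsAtFinalElement(cursor, Para, Word);
    bool lastInBlock = IsAtFinalElement(cursor, Block, Word);
    cursor = Next(cursor, Word);

    // Close elements innermost first.
    if (lastInLine) sb.Append("</line>");
    if (lastInPara) sb.Append("</para>");
    if (lastInBlock) sb.Append("</block>");
}

string markup = sb.ToString();
Console.WriteLine(markup);
```

For the five sample words this prints `<block><para><line>Hello world</line><line>second</line></para><para><line>third</line></para></block><block><para><line>last</line></para></block>`. With a real iterator the flags have to be read before calling Next(RIL_WORD), because afterwards the iterator already points at the following word.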
Thanks, I would not have figured that one out :-)
Environment
Current Behavior:
int IsAtFinalElement(Block, Block) always returns 1 (true).
Expected Behavior:
It should return 0 (false) when the iterator is not at the final element.
Suggested Fix:
Return 0 (false) when the iterator is not at the final element.
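Why IsAtFinalElement(Block, Block) always returns true: based on my reading of PageIterator::IsAtFinalElement in Tesseract's pageiterator.cpp, the function steps a copy of the iterator forward by `element` and then checks IsAtBeginningOf for every level from `element - 1` down to `level`. When `level == element` there is nothing to check, so the result is true at every position. The sketch below reproduces that logic on a hypothetical in-memory model (not the real API; all names are illustrative) and shows the alternative for the top level: step with Next(Block) until the page is exhausted, which the real TessPageIteratorNext signals by returning false. It uses top-level statements (C# 9 or later).

```csharp
using System;

// Hypothetical in-memory model of a page iterator; NOT the Tesseract API.
// Level numbers follow the order of TessPageIteratorLevel (RIL_BLOCK = 0 ... RIL_WORD = 3).
const int Block = 0, Para = 1, TextLine = 2, Word = 3;

// Each word records the globally unique ids of its block, paragraph and text line.
var words = new (int BlockId, int ParaId, int LineId, string Text)[]
{
    (0, 0, 0, "Hello"), (0, 0, 0, "world"), (0, 0, 1, "second"), // block 0, paragraph 0
    (0, 1, 2, "third"),                                          // block 0, paragraph 1
    (1, 2, 3, "last"),                                           // block 1
};

bool Empty(int pos) => pos >= words.Length;

// Id of the element at `level` that contains the word at `pos`.
int KeyAt(int pos, int level) => level switch
{
    Block => words[pos].BlockId,
    Para => words[pos].ParaId,
    TextLine => words[pos].LineId,
    Word => pos,
    _ => throw new ArgumentOutOfRangeException(nameof(level)),
};

// Stand-in for Next(level): index of the first word of the next element at `level`.
int Next(int pos, int level)
{
    int key = KeyAt(pos, level);
    do { pos++; } while (!Empty(pos) && KeyAt(pos, level) == key);
    return pos;
}

bool IsAtBeginningOf(int pos, int level) =>
    !Empty(pos) && (pos == 0 || KeyAt(pos - 1, level) != KeyAt(pos, level));

// Follows my reading of PageIterator::IsAtFinalElement in pageiterator.cpp.
bool IsAtFinalElement(int pos, int level, int element)
{
    if (Empty(pos)) return true;
    int next = Next(pos, element);
    if (Empty(next)) return true;        // Stepping by `element` reached the end of the page.
    while (element > level)              // Skipped entirely when level == element...
    {
        element--;
        if (!IsAtBeginningOf(next, element)) return false;
    }
    return true;                         // ...so (Block, Block) always returns true.
}

Console.WriteLine(IsAtFinalElement(0, Block, Block)); // True, although block 1 follows
Console.WriteLine(IsAtFinalElement(0, Block, Para));  // False: paragraph 1 is still in block 0
Console.WriteLine(IsAtFinalElement(3, Block, Para));  // True: last paragraph of block 0

// Top level: step with Next(Block) until the page is exhausted.
int blockCount = 0;
for (int cursor = 0; !Empty(cursor); cursor = Next(cursor, Block))
    blockCount++;
Console.WriteLine(blockCount); // 2
```

In other words, IsAtFinalElement is only meaningful when `element` is below `level`; for the Block level itself, the return value of Next is the end-of-page signal.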