henrivain / TesseractOcrMaui

Tesseract wrapper for Windows, Android and iOS for .NET MAUI
Apache License 2.0

Add iterators to have more control over recognition process #41

Closed henrivain closed 4 months ago

henrivain commented 4 months ago

Result iterator

The Tesseract result iterator gives more control over the recognition output for an image.

Output

Result iterator gives access to different recognition block sizes that are

Implementation order

Page iterator

The Tesseract page iterator gives access to the location of text in the image. This is a secondary milestone.

henrivain commented 4 months ago

Development progress

Follow progress in the add-recognizion-iterator-functionality branch.

The branch now runs RunAsync() from TesseractOcrMaui/TesseractTestClass.cs at startup for easier testing during development.

henrivain commented 4 months ago

Issue created from user request

Is it possible to get an array of lines rather than the whole text as a string?
The hierarchical structure Tesseract returns for OCR results is (like Azure and friends do…):

Page
  Block
    Paragraph
      Line
        Word

The most efficient way would be a new function returning an array of Pages,
where each page has an array of Blocks, each block an array of Paragraphs, and so on.

With this, we could do a real analysis of the content and extract identified values,
instead of getting only a "big string" where finding a word together with its "meaning" is not really possible.
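
A minimal sketch of what the requested structure could look like in C#. All type and method names here (OcrPage, OcrBlock, OcrParagraph, OcrLine, OcrWord, OcrFlatten) are hypothetical and are not part of TesseractOcrMaui; the sample data is built by hand only to show the shape, whereas a real implementation would fill it from the recognition engine.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hand-built sample result; a real implementation would populate this from the OCR engine.
var page = new OcrPage(new[]
{
    new OcrBlock(new[]
    {
        new OcrParagraph(new[]
        {
            new OcrLine(new[] { new OcrWord("Invoice"), new OcrWord("No."), new OcrWord("1234") }),
            new OcrLine(new[] { new OcrWord("Total:"), new OcrWord("42.00") }),
        }),
    }),
});

// "An array of lines rather than the whole text as a string".
foreach (string line in OcrFlatten.Lines(page))
{
    Console.WriteLine(line);
}

// Extract an identified value: the word that follows a known label.
Console.WriteLine(OcrFlatten.WordAfter(page, "Total:") ?? "(not found)");

// Hypothetical types mirroring the Page > Block > Paragraph > Line > Word hierarchy.
public sealed record OcrWord(string Text);

public sealed record OcrLine(IReadOnlyList<OcrWord> Words)
{
    // Line text is rebuilt by joining the words with single spaces.
    public string Text => string.Join(" ", Words.Select(w => w.Text));
}

public sealed record OcrParagraph(IReadOnlyList<OcrLine> Lines);
public sealed record OcrBlock(IReadOnlyList<OcrParagraph> Paragraphs);
public sealed record OcrPage(IReadOnlyList<OcrBlock> Blocks);

public static class OcrFlatten
{
    // Walks the hierarchy in reading order and yields every line.
    public static IEnumerable<OcrLine> AllLines(OcrPage page) =>
        page.Blocks
            .SelectMany(b => b.Paragraphs)
            .SelectMany(p => p.Lines);

    // Flattens the page to plain line strings.
    public static IEnumerable<string> Lines(OcrPage page) =>
        AllLines(page).Select(l => l.Text);

    // Returns the text of the word directly after the first occurrence of label
    // on the same line, or null if the label is missing or is the last word.
    public static string? WordAfter(OcrPage page, string label)
    {
        foreach (OcrLine line in AllLines(page))
        {
            for (int i = 0; i < line.Words.Count - 1; i++)
            {
                if (line.Words[i].Text == label)
                {
                    return line.Words[i + 1].Text;
                }
            }
        }
        return null;
    }
}
```

With the sample data above, this prints the two lines "Invoice No. 1234" and "Total: 42.00", followed by "42.00". Keeping words as separate objects inside their line is what makes this kind of lookup possible; a single flat string loses that grouping.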

henrivain commented 4 months ago

Uploaded a NuGet package for the iOS DLL imports: https://www.nuget.org/packages/TesseractOcrMaui.IOS/1.1.0