henrivain / TesseractOcrMaui

Tesseract wrapper for Windows, Android and iOS for .NET MAUI
Apache License 2.0

Add recognition iterator functionality #42

Closed henrivain closed 4 months ago

henrivain commented 4 months ago

Implement Tesseract Iterators

Following issue #41, implement iterators to enable better structural analysis of recognized text.

Added IEnumerators

ResultIterator

Iterates over the text blocks recognized in an image. It iterates at one given level at a time, for example TextLine or Word.

PageIterator

Iterates over the text layout of an image: text bounding boxes and paragraph layout at one given level. This makes it possible to draw boxes over the text on the recognized image.

Added IEnumerables
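The idea of iterating at one level at a time can be illustrated with a small sketch. The Python below is only an illustration and is not the library's C# API: the `Level` enum, the `iterate_text` function, and the whitespace-based splitting are hypothetical stand-ins for boundaries that the OCR engine actually derives from the image.

```python
from enum import Enum
from typing import Iterator


class Level(Enum):
    """Hypothetical iteration levels, named after those mentioned above."""
    TEXT_LINE = "TextLine"
    WORD = "Word"
    SYMBOL = "Symbol"


def iterate_text(text: str, level: Level) -> Iterator[str]:
    """Yield the units of `text` at exactly one level.

    Assumption: lines are split on newlines and words on whitespace.
    A real OCR engine finds these boundaries from the page layout.
    """
    for line in text.splitlines():
        if level is Level.TEXT_LINE:
            yield line
            continue
        for word in line.split():
            if level is Level.WORD:
                yield word
            else:  # Level.SYMBOL: one character at a time
                yield from word


sample = "This is an example image.\nThis is another paragraph."
words = list(iterate_text(sample, Level.WORD))
lines = list(iterate_text(sample, Level.TEXT_LINE))
```

Choosing the level up front keeps each pass simple: the caller sees a flat sequence of lines, words, or symbols and never has to handle mixed granularity.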

ResultIterable

IEnumerable implementation of ResultIterator.

PageIterable

IEnumerable implementation of PageIterator.

TextMetadataIterable

Links PageIterator and ResultIterator to achieve synchronized iteration over text layout and text value.

TextStructureIterable

Links PageIterator and ResultIterator to achieve more thorough text structure analysis. Returns the text structure as a tree-like data structure.
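Synchronized iteration amounts to advancing a layout source and a text source in lockstep and pairing their current items. Here is a minimal Python sketch of that idea, assuming a hypothetical `Box` type, a hypothetical `iterate_together` helper, and made-up coordinates. None of these names or values come from the library.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box in pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def iterate_together(
    boxes: Iterable[Box], texts: Iterable[str]
) -> Iterator[Tuple[Box, str]]:
    """Advance both sources one step at a time and yield (box, text) pairs.

    Raises ValueError if one source ends before the other, since the
    pairs would no longer describe the same text element.
    """
    box_iter, text_iter = iter(boxes), iter(texts)
    done = object()  # sentinel marking an exhausted iterator
    while True:
        box = next(box_iter, done)
        text = next(text_iter, done)
        if box is done and text is done:
            return
        if box is done or text is done:
            raise ValueError("layout and text iteration are out of sync")
        yield box, text


word_boxes = [Box(10, 12, 58, 30), Box(64, 12, 80, 30)]  # made-up values
word_texts = ["This", "is"]
pairs = list(iterate_together(word_boxes, word_texts))
```

Each pair then carries both where a piece of text is and what it says, which is what drawing a labeled box on the image needs.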

Image example

With the highest level set to TextLine and the lowest level set to Symbol, the image below produces the following tree structure.

(image: SmallImage2)

TextLine /
├─ Word /
│  ├─ Symbol /
│  │  ├─ T
│  │  ├─ h
│  │  ├─ i
│  │  ├─ s
│  ├─ Symbol /
│  │  ├─ i
│  │  ├─ s
│  ├─ Symbol /
│  │  ├─ A
│  ├─ Symbol /
│  │  ├─ e
│  │  ├─ x
│  │  ├─ a
│  │  ├─ m
│  │  ├─ p
│  │  ├─ l
│  │  ├─ e
│  ├─ Symbol /
│  │  ├─ i
│  │  ├─ m
│  │  ├─ a
│  │  ├─ g
│  │  ├─ e
│  │  ├─ .
TextLine /
├─ Word /
│  ├─ Symbol /
│  │  ├─ T
│  │  ├─ h
│  │  ├─ i
│  │  ├─ s
│  ├─ Symbol /
│  │  ├─ i
│  │  ├─ s
│  ├─ Symbol /
│  │  ├─ a
│  │  ├─ n
│  │  ├─ o
│  │  ├─ t
│  │  ├─ h
│  │  ├─ e
│  │  ├─ r
│  ├─ Symbol /
│  │  ├─ P
│  │  ├─ a
│  │  ├─ r
│  │  ├─ a
│  │  ├─ g
│  │  ├─ r
│  │  ├─ a
│  │  ├─ p
│  │  ├─ h
│  │  ├─ .
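A tree like the one above can be modeled as nested nodes, one layer per level between the highest and lowest configured levels. The Python sketch below is an illustration only: `Node`, `build_tree`, and `render` are hypothetical names, the boundaries come from plain string splitting rather than OCR, and the printed format is a simplified indentation scheme rather than the exact output shown above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One element of the text structure at a given level."""
    level: str
    text: str
    children: List["Node"] = field(default_factory=list)


def build_tree(text: str) -> List[Node]:
    """Build TextLine -> Word -> Symbol nodes from plain text.

    Assumption: newlines separate lines and whitespace separates words.
    """
    lines = []
    for line in text.splitlines():
        words = [
            Node("Word", word, [Node("Symbol", ch) for ch in word])
            for word in line.split()
        ]
        lines.append(Node("TextLine", line, words))
    return lines


def render(nodes: List[Node], depth: int = 0) -> List[str]:
    """Walk the tree depth-first and return one indented row per node."""
    rows = []
    for node in nodes:
        rows.append("  " * depth + f"{node.level}: {node.text}")
        rows.extend(render(node.children, depth + 1))
    return rows


tree = build_tree("This is\nan example")
rendered = render(tree)
```

Keeping the text on every node, not only on the leaves, lets a consumer stop at whichever depth it needs without reassembling words from symbols.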