robertknight / ocrs

Rust library and CLI tool for OCR (extracting text from images)
Apache License 2.0

Rectify text lines before recognition #33

Open robertknight opened 5 months ago

robertknight commented 5 months ago

Ocrs does not currently apply any perspective correction to extracted text lines before applying recognition. The recognition model is trained to handle skewed and rotated inputs, but this only works for moderate rotation. Text lines with significant rotation will have their characters squashed in the vertical direction during preprocessing, as recognition inputs have a fixed height of 64px. This harms recognition accuracy.

The library should rectify line images before recognition to better handle rotated/skewed inputs.
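To get a feel for how much detail is lost, here is a rough back-of-the-envelope sketch. It assumes the line is cropped to its axis-aligned bounding box and that crop is then scaled to the 64px input height; that is an assumption about the preprocessing made for illustration, and `squashed_char_height` is a hypothetical helper, not part of Ocrs.

```python
import math


def squashed_char_height(line_len, line_height, angle_deg, target_height=64):
    """Approximate character height (px) after a rotated line's axis-aligned
    bounding box is resized to `target_height`.

    Assumes the crop is the axis-aligned bounding box of a `line_len` x
    `line_height` rectangle rotated by `angle_deg` (illustrative model only).
    """
    a = math.radians(angle_deg)
    # Height of the bounding box around the rotated rectangle.
    bbox_height = line_len * abs(math.sin(a)) + line_height * abs(math.cos(a))
    # Scale factor applied when the crop is resized to the fixed input height.
    scale = target_height / bbox_height
    return line_height * scale


if __name__ == "__main__":
    for angle in (0, 5, 15, 30):
        h = squashed_char_height(line_len=500, line_height=40, angle_deg=angle)
        print(f"{angle:>2} degrees: characters are about {h:.1f}px tall")
```

Under this model an unrotated line uses the full 64px, while the same line at a steeper angle leaves far fewer pixels per character, which matches the squashing described above.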

robertknight commented 4 months ago

Example of an image where this comes up (source):

[Image: slide]

Text line images from the slides currently look like this when input to the recognition model (see output of ocrs image.jpeg --text-line-images):

If the line were rectified first, the accuracy should improve a lot.

robertknight commented 4 months ago

Reference implementation using OpenCV's image transform functions.

Usage:

python rectify.py slide.jpeg line.png '1105,316;1630,458;1622,498;1105,356' 517,64

Note that the corner coordinates are listed clockwise, starting from the top left. This produces line.png:

[Image: line.png]

From the rectified image, Ocrs is able to correctly extract the text, whereas from the original the output is garbage.
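For anyone who wants to see what the rectification step involves without OpenCV, here is a minimal dependency-free sketch. It is not the reference script: it solves for the perspective transform that maps an upright output rectangle onto the four corners of the line, then samples the source image with nearest-neighbour lookup. The names `parse_quad`, `homography`, `apply_homography` and `rectify` are hypothetical, and the image is assumed to be a plain list of rows of grayscale values.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def parse_quad(arg: str) -> List[Point]:
    """Parse 'x0,y0;x1,y1;x2,y2;x3,y3' into four (x, y) corners."""
    points = [tuple(float(v) for v in pair.split(",")) for pair in arg.split(";")]
    if len(points) != 4 or any(len(p) != 2 for p in points):
        raise ValueError("expected four 'x,y' pairs separated by ';'")
    return points


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate quad: corners do not define a transform")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src: List[Point], dst: List[Point]):
    """3x3 matrix (row-major, bottom-right entry 1) mapping src[i] to dst[i]."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, p: Point) -> Point:
    """Map point p through h, including the perspective divide."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def rectify(image, quad: List[Point], out_w: int, out_h: int):
    """Warp the region inside `quad` (clockwise from top left) to out_w x out_h."""
    rect = [(0, 0), (out_w, 0), (out_w, out_h), (0, out_h)]
    h = homography(rect, quad)  # inverse mapping: output pixel -> source position
    src_h, src_w = len(image), len(image[0])
    out = []
    for v in range(out_h):
        row = []
        for u in range(out_w):
            x, y = apply_homography(h, (u + 0.5, v + 0.5))  # sample at pixel centre
            xi = min(max(int(x), 0), src_w - 1)  # clamp to the image bounds
            yi = min(max(int(y), 0), src_h - 1)
            row.append(image[yi][xi])
        out.append(row)
    return out


if __name__ == "__main__":
    quad = parse_quad("1105,316;1630,458;1622,498;1105,356")
    h = homography([(0, 0), (517, 0), (517, 64), (0, 64)], quad)
    print(apply_homography(h, (258.5, 32)))  # source position of the output centre
```

Mapping from output pixels back to the source, rather than the other way round, means every output pixel gets exactly one sample and there are no holes. A real implementation would probably want bilinear interpolation instead of nearest-neighbour lookup.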