robertknight opened 5 months ago
Example of an image where this comes up (source):
Text line images from the slides currently look like this when input to the recognition model (see the output of `ocrs image.jpeg --text-line-images`):
If the line were rectified first, accuracy should improve significantly.
Reference implementation using OpenCV's image transform functions.
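The attached script is not reproduced in this issue, so the following is only a sketch of how such a rectification script might look, assuming OpenCV's `getPerspectiveTransform` and `warpPerspective` are the transform functions used. The helper names (`parse_points`, `rectify`) are hypothetical, not from the attachment.

```python
# Sketch only: an assumed implementation of a rectify.py-style script using
# OpenCV's perspective-transform API, matching the CLI shown in this issue.
import sys


def parse_points(spec):
    """Parse 'x,y;x,y;...' into a list of (x, y) float pairs."""
    return [tuple(float(v) for v in p.split(",")) for p in spec.split(";")]


def rectify(image, quad, width, height):
    """Warp `quad` (corners clockwise from top left) to a width x height image."""
    # Imported lazily so the pure-Python helpers work without OpenCV installed.
    import cv2
    import numpy as np

    src = np.array(quad, dtype=np.float32)
    dst = np.array(
        [(0, 0), (width, 0), (width, height), (0, height)], dtype=np.float32
    )
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(image, matrix, (width, height))


def main():
    import cv2

    in_path, out_path, quad_spec, size_spec = sys.argv[1:5]
    quad = parse_points(quad_spec)
    width, height = (int(v) for v in size_spec.split(","))
    cv2.imwrite(out_path, rectify(cv2.imread(in_path), quad, width, height))


if __name__ == "__main__":
    main()
```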
Usage:

```shell
python rectify.py slide.jpeg line.png '1105,316;1630,458;1622,498;1105,356' 517,64
```

Note the coordinate order is clockwise from the top left. This produces line.png:
From the rectified image, Ocrs is able to correctly extract the text, whereas from the original the output is garbage.
Ocrs does not currently apply any perspective correction to extracted text lines before applying recognition. The recognition model is trained to handle skewed and rotated inputs, but this only works for moderate rotation. Text lines with significant rotation will have their characters squashed in the vertical direction during preprocessing, as recognition inputs have a fixed height of 64px. This harms recognition accuracy.
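As a rough illustration with the quad coordinates from the example invocation above, and assuming preprocessing scales the line's axis-aligned bounding box to the fixed 64px input height, the glyphs end up occupying only a small fraction of the input:

```python
# The example quad, clockwise from top left (from the rectify.py invocation).
quad = [(1105, 316), (1630, 458), (1622, 498), (1105, 356)]

ys = [y for _, y in quad]
bbox_height = max(ys) - min(ys)           # 182px tall axis-aligned box
glyph_height = quad[3][1] - quad[0][1]    # ~40px actual line thickness (left edge)

# Assumption: preprocessing scales the whole box to a fixed 64px height,
# so the glyphs shrink by the same factor.
scale = 64 / bbox_height
print(f"{glyph_height * scale:.1f}px")    # roughly 14px of the 64px input height
```

A rectified line, by contrast, would use the full 64px height for the glyphs.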
The library should rectify line images before recognition to better handle rotated/skewed inputs.