robertknight / ocrs

Rust library and CLI tool for OCR (extracting text from images)
Apache License 2.0
1.1k stars, 46 forks

Fix panic in layout analysis when average word spacing in a line is negative #20

Closed robertknight closed 8 months ago

robertknight commented 8 months ago

Word detection can produce words which have a small overlap. This is because the detection model predicts boxes which are slightly shrunk from their true size, and post-processing then pads the boxes to recover the "true" size.

This means the average inter-word spacing in a line can be negative. When that happened, converting the spacing from a signed to an unsigned value caused a panic.

The fix is just to clamp the computed average spacing to be >= 0.
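
As a rough illustration of the idea (not the actual ocrs code), here is a minimal sketch of clamping before a checked signed-to-unsigned conversion. The `WordBox` type and the `average_spacing` and `clamped_spacing` functions are hypothetical names invented for this example, and the checked `u32::try_from` conversion is an assumption about how such a panic could arise.

```rust
use std::convert::TryFrom;

/// Hypothetical word box, reduced to its horizontal extent in pixels.
#[derive(Clone, Copy)]
struct WordBox {
    left: i32,
    right: i32,
}

/// Mean gap between consecutive words in a line.
/// The result is negative when padded boxes overlap on average.
fn average_spacing(words: &[WordBox]) -> i32 {
    if words.len() < 2 {
        return 0;
    }
    let total: i32 = words
        .windows(2)
        .map(|pair| pair[1].left - pair[0].right)
        .sum();
    total / (words.len() as i32 - 1)
}

/// Clamp the average to >= 0 before converting to an unsigned value,
/// so the checked conversion can no longer fail.
fn clamped_spacing(words: &[WordBox]) -> u32 {
    let spacing = average_spacing(words).max(0);
    u32::try_from(spacing).expect("spacing is non-negative after clamping")
}
```

With overlapping boxes such as `[0, 10]`, `[8, 20]` and `[19, 30]`, `average_spacing` returns a negative value and `clamped_spacing` returns 0 instead of panicking. Without the `.max(0)` call, the `expect` in this sketch would panic on the same input.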

Fixes https://github.com/robertknight/ocrs/issues/19