robertknight / ocrs

Rust library and CLI tool for OCR (extracting text from images)
Apache License 2.0

OCR of rotated image #121

Open josef821 opened 1 month ago

josef821 commented 1 month ago

Hi, thanks for your useful OCR engine. It works well, but when I give it a rotated image it returns bad results. [Image: Screenshot 2024-09-30 161933]

Are there any tips for fixing this?

robertknight commented 1 month ago

Currently, the recognition model and layout logic assume that the image is approximately upright (some amount of rotation or skew is OK) and that the text is read left to right. To work with rotated or severely skewed images, the images need to be rotated / de-skewed as a preprocessing step. Eventually this should be integrated into this library, but in the meantime you could try something like:

  1. Call the OcrEngine::detect_words method to detect bounding boxes of connected areas (the white regions in the top-left image)
  2. Infer the orientation from the positions and aspect ratios of the boxes (e.g. if most boxes are tall rather than wide, that means the text is probably rotated by about 90 degrees)
  3. Use functions in the imageproc crate to rotate the image based on the inferred orientation
  4. Perform OCR on the rotated image
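
As a rough illustration of steps 2 and 3, here is a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: `WordBox`, `Orientation`, `infer_orientation` and `rotate90_cw` are hypothetical names, not part of ocrs, and the 1.5 aspect-ratio threshold is arbitrary. In a real pipeline the box sizes would come from the rects returned by `OcrEngine::detect_words`, and the rotation would more likely be done with an image-processing crate than by hand.

```rust
/// Hypothetical word box size. In practice the width and height would be
/// derived from the rects returned by `OcrEngine::detect_words`.
#[derive(Clone, Copy, Debug)]
pub struct WordBox {
    pub width: f32,
    pub height: f32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Orientation {
    /// Most boxes are wider than tall. This also matches upside-down text.
    Horizontal,
    /// Most boxes are taller than wide: text is probably rotated ~90 degrees.
    Sideways,
    /// No informative boxes, or a tie.
    Unknown,
}

/// Majority vote over box aspect ratios (assumed threshold: 1.5).
pub fn infer_orientation(boxes: &[WordBox]) -> Orientation {
    let (mut wide, mut tall) = (0usize, 0usize);
    for b in boxes {
        // Skip degenerate boxes.
        if b.width <= 0.0 || b.height <= 0.0 {
            continue;
        }
        let ratio = b.width / b.height;
        // Near-square boxes (short words, punctuation) carry no signal.
        if ratio > 1.5 {
            wide += 1;
        } else if ratio < 1.0 / 1.5 {
            tall += 1;
        }
    }
    if tall > wide {
        Orientation::Sideways
    } else if wide > tall {
        Orientation::Horizontal
    } else {
        Orientation::Unknown
    }
}

/// Rotate a row-major, single-channel image 90 degrees clockwise.
/// The result has width `height` and height `width`.
pub fn rotate90_cw(pixels: &[u8], width: usize, height: usize) -> Vec<u8> {
    assert_eq!(pixels.len(), width * height, "buffer size mismatch");
    let mut out = vec![0u8; pixels.len()];
    for y in 0..height {
        for x in 0..width {
            // Source (x, y) lands at column (height - 1 - y), row x.
            out[x * height + (height - 1 - y)] = pixels[y * width + x];
        }
    }
    out
}
```

Note that box shapes alone cannot tell a clockwise rotation from a counter-clockwise one, or upright text from upside-down text. One option is to run OCR on each candidate rotation and keep whichever result looks most plausible.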

A more sophisticated approach would be to use an image classification model to infer the orientation of each word, or of a sample of words. If a suitable model were created in, e.g., PyTorch and exported to ONNX, it could then be converted to RTen and used in the above preprocessing pipeline instead of heuristics.

josef821 commented 1 month ago

Thanks for the reply. I will do that. I check all the masks to see whether the lines are rotated or not. Your layout analysis is not good enough. I read your replies to others: you want to create a model that clusters and sorts words to get line bounding boxes. How long do you think it will take to be able to publish the layout analysis model with its training code?

robertknight commented 1 month ago

How long do you think it will take to be able to publish the layout analysis model with its training code?

I don't know. All the code that exists is in the ocrs-models repository, but for layout analysis that only includes some non-functional prototypes.

In the meantime, if you happen to be working with documents that have a predictable layout, you can always substitute the find_text_lines step with custom code.
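
For example, if the documents are simple single-column pages, a custom line-grouping step could look something like the sketch below. This is only an illustration under stated assumptions: `Rect` and `group_into_lines` are hypothetical, standard-library-only stand-ins, the boxes are treated as axis-aligned, and the "same line" rule is a simple heuristic rather than what ocrs does internally. Real code would convert to and from the rect type that ocrs uses for word boxes.

```rust
/// Hypothetical axis-aligned word box (stand-in for the library's rect type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Rect {
    pub left: f32,
    pub top: f32,
    pub right: f32,
    pub bottom: f32,
}

fn center_y(r: &Rect) -> f32 {
    (r.top + r.bottom) / 2.0
}

/// Heuristic: `next` is on the same line as `prev` if its vertical center
/// falls within `prev`'s vertical extent.
fn same_line(prev: &Rect, next: &Rect) -> bool {
    let cy = center_y(next);
    cy >= prev.top && cy <= prev.bottom
}

/// Group word boxes into lines (top to bottom), then order the words in
/// each line left to right.
pub fn group_into_lines(words: &[Rect]) -> Vec<Vec<Rect>> {
    let mut sorted = words.to_vec();
    sorted.sort_by(|a, b| center_y(a).total_cmp(&center_y(b)));

    let mut lines: Vec<Vec<Rect>> = Vec::new();
    for word in sorted {
        // Compare against the most recently added word of the current line.
        let joins = lines
            .last()
            .and_then(|line| line.last())
            .map_or(false, |prev| same_line(prev, &word));
        if joins {
            lines.last_mut().unwrap().push(word);
        } else {
            lines.push(vec![word]);
        }
    }

    for line in &mut lines {
        line.sort_by(|a, b| a.left.total_cmp(&b.left));
    }
    lines
}
```

A rule this simple will break on multi-column or curved layouts, but for forms, receipts or similar documents with a known structure it may be enough.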

josef821 commented 1 month ago

The existing layout analysis does not work well for curved layouts or complex images. I am waiting for your layout analysis. Thanks.