Open josef821 opened 1 month ago
Currently the recognition model and layout logic assume that the image is approximately upright (some amount of rotation or skew is OK) and that the text reads left to right. To work with rotated or severely skewed images, they need to be rotated / de-skewed as a preprocessing step. Eventually this should be integrated into this library, but in the meantime you could try something like:
- Use the OcrEngine::detect_words method to detect the bounding boxes of connected areas (the white regions in the top-left image)
- Use the imageproc crate to rotate the image based on the orientation inferred from those boxes

A more sophisticated approach would be to use an image classification model to infer the orientation of each word, or of a sample of words. If a suitable model were created in eg. PyTorch and exported to ONNX, it could then be converted to RTen and used in the above preprocessing pipeline instead of heuristics.
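To make the heuristic step concrete, here is a minimal sketch of estimating a dominant skew angle from word bounding boxes. The `WordBox` type and its fields are assumptions for illustration, not part of the ocrs API; the estimated angle could then be negated and passed to an image-rotation routine such as the one in the imageproc crate.

```rust
// Hypothetical sketch: estimate the dominant skew angle from word bounding
// boxes (eg. as produced by a word-detection step) by taking the median of
// each box's baseline angle relative to the horizontal.

#[derive(Clone, Copy)]
struct WordBox {
    /// Midpoints of the left and right edges of the (possibly rotated) box.
    left: (f32, f32),
    right: (f32, f32),
}

/// Return the median baseline angle (in radians) of `boxes`,
/// or `None` if there are no boxes.
fn estimate_skew(boxes: &[WordBox]) -> Option<f32> {
    if boxes.is_empty() {
        return None;
    }
    let mut angles: Vec<f32> = boxes
        .iter()
        .map(|b| {
            let dx = b.right.0 - b.left.0;
            let dy = b.right.1 - b.left.1;
            dy.atan2(dx)
        })
        .collect();
    // The median is more robust to a few mis-detected boxes than the mean.
    angles.sort_by(|a, b| a.partial_cmp(b).unwrap());
    Some(angles[angles.len() / 2])
}

fn main() {
    // Three word boxes, all tilted by ~5 degrees.
    let theta = 5f32.to_radians();
    let boxes: Vec<WordBox> = (0..3)
        .map(|i| {
            let x0 = i as f32 * 120.0;
            WordBox {
                left: (x0, 0.0),
                right: (x0 + 100.0 * theta.cos(), 100.0 * theta.sin()),
            }
        })
        .collect();
    let skew = estimate_skew(&boxes).unwrap();
    println!("estimated skew: {:.2} degrees", skew.to_degrees());
}
```

The image would then be rotated by the negative of this angle before being passed to the OCR engine.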
Thanks for the reply, I will do that. I check all the masks to determine whether the lines are rotated or not. Your layout analysis is not good enough. I read your replies to others: you want to create a model to cluster and sort words to get line bounding boxes. How long do you think it will take to be able to publish the layout analysis model with its training code?
> How long do you think it will take to be able to publish the layout analysis model with its training code?
I don't know. All the code that exists is in the ocrs-models repository, but for layout analysis that only includes some non-functional prototypes.
In the meantime, if you happen to be working with documents that have a predictable layout, you can always replace the find_text_lines step with custom code.
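As an illustration of what such custom code might look like, here is a minimal sketch of grouping word boxes into lines by vertical overlap, suitable for simple, predictable layouts. The `Rect` type here is an assumption for the example, not the ocrs type, and the grouping rule is a simple heuristic rather than the library's algorithm.

```rust
// Hypothetical sketch of a custom line-grouping step: assign a word box to
// the current line when its vertical centre falls within the vertical span
// of the line's last word; otherwise start a new line.

#[derive(Clone, Copy, Debug, PartialEq)]
struct Rect {
    left: f32,
    top: f32,
    right: f32,
    bottom: f32,
}

impl Rect {
    fn v_center(&self) -> f32 {
        (self.top + self.bottom) / 2.0
    }
}

/// Group word boxes into lines. Each inner Vec is one line ordered
/// left-to-right; lines are ordered roughly top-to-bottom.
fn group_into_lines(mut words: Vec<Rect>) -> Vec<Vec<Rect>> {
    // Sort by vertical centre, then by left edge.
    words.sort_by(|a, b| {
        (a.v_center(), a.left)
            .partial_cmp(&(b.v_center(), b.left))
            .unwrap()
    });
    let mut lines: Vec<Vec<Rect>> = Vec::new();
    for word in words {
        match lines.last_mut() {
            Some(line) => {
                let last = *line.last().unwrap();
                // Same line if the word's centre overlaps the last word's span.
                if word.v_center() >= last.top && word.v_center() <= last.bottom {
                    line.push(word);
                } else {
                    lines.push(vec![word]);
                }
            }
            None => lines.push(vec![word]),
        }
    }
    // Ensure reading order within each line.
    for line in &mut lines {
        line.sort_by(|a, b| a.left.partial_cmp(&b.left).unwrap());
    }
    lines
}

fn main() {
    // Two lines of two words each, with slight vertical jitter.
    let words = vec![
        Rect { left: 0.0, top: 0.0, right: 50.0, bottom: 20.0 },
        Rect { left: 60.0, top: 2.0, right: 110.0, bottom: 22.0 },
        Rect { left: 0.0, top: 40.0, right: 50.0, bottom: 60.0 },
        Rect { left: 60.0, top: 41.0, right: 110.0, bottom: 61.0 },
    ];
    let lines = group_into_lines(words);
    println!("grouped into {} lines", lines.len()); // prints "grouped into 2 lines"
}
```

This obviously assumes roughly horizontal text; for curved or rotated layouts the grouping rule would need to follow the per-word orientation instead.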
The existing layout analysis does not work well for curved layouts or complex images. I will wait for your layout analysis. Thanks.
Hi, thanks for your useful OCR engine. It works well, but when I give it a rotated image it returns bad results.
Are there any tips for fixing this?