Hello!
First of all, thank you for your work, this library is very good. I am having issues with the detection of skewed documents (French ID cards). I can share my thoughts on this matter, maybe we can discuss them.
`get_bitmap_angle` is not very robust in estimating the general skew angle of the image. By using a fixed `n_ct` of 20, you get a lot of noise from small boxes that are not very relevant. These artefacts artificially increase the std and eventually result in the angle not being changed. I suggest we use a smaller `n_ct`, an `n_ct` based on a ratio (e.g. the top 5% biggest boxes), or even use all the contours altogether.
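A rough sketch of that filtering idea (this is not the library's actual `get_bitmap_angle`; the function name, thresholds, and angle handling below are illustrative and assume a binarized segmentation map):

```python
import cv2
import numpy as np

def estimate_skew_angle(bitmap: np.ndarray, top_ratio: float = 0.05) -> float:
    """Median angle of the boxes fitted on the biggest contours of a binarized map."""
    contours, _ = cv2.findContours(
        bitmap.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE
    )
    if not contours:
        return 0.0
    # Keep only the top `top_ratio` contours by area (e.g. the 5% biggest boxes)
    contours = sorted(contours, key=cv2.contourArea, reverse=True)
    n_keep = max(1, int(len(contours) * top_ratio))
    angles = []
    for cnt in contours[:n_keep]:
        angle = cv2.minAreaRect(cnt)[-1]
        # OpenCV's angle convention varies across versions; this fold into (-45, 45]
        # assumes the [0, 90) convention of recent releases and mostly-horizontal text
        if angle > 45:
            angle -= 90
        angles.append(angle)
    # The median is more robust to small noisy boxes than the mean
    return float(np.median(angles))
```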
When passing the output of the `detection_predictor` to the `recognition_predictor`, I found that the results are significantly less reliable compared with the results obtained when first rotating the document by the estimated angle and then running the complete pipeline. For my use case, I will stay for now with the method I have for the French ID card, which is: keypoint matching with a `DescriptorMatcher` from OpenCV.
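A rough sketch of that kind of keypoint-based alignment (purely illustrative: `align_to_template`, the ORB feature choice, and the parameters below are assumptions, not docTR code; it assumes you have a straight reference image of the card):

```python
import cv2
import numpy as np

def _to_gray(img: np.ndarray) -> np.ndarray:
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) if img.ndim == 3 else img

def align_to_template(image: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Warp a photo of the card onto a straight reference template."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(_to_gray(image), None)
    kp2, des2 = orb.detectAndCompute(_to_gray(template), None)

    matcher = cv2.DescriptorMatcher_create(cv2.DescriptorMatcher_BRUTEFORCE_HAMMING)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]

    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # The warped image can then be fed to the regular detection + recognition pipeline
    h, w = template.shape[:2]
    return cv2.warpPerspective(image, H, (w, h))
```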
Hi there @Rob192 :wave:
Glad you found the library useful! About both your points, I fully agree that this is still not optimal. We're aiming at stabilizing the rotation feature for our next release. To give you more information on this topic:
Having given this some thought earlier on, I would argue that box size per se would not be an ideal filter. However, the aspect ratio of unsquashed boxes (with the shortest side bigger than a threshold) would qualify as a good filter. The thinking behind this is the following: in a document with huge letters separated by significant gaps, we might end up with a segment for a single letter (the "R" of "République française" in your use case). The box of that letter is big, but since its aspect ratio is close to 1, it would not provide any useful information for the orientation prediction.
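For illustration, a minimal version of that filter could look like the following (the function name and thresholds are hypothetical, not docTR code):

```python
import cv2
import numpy as np

def is_informative(contour: np.ndarray, min_side: float = 5.0, min_ratio: float = 2.0) -> bool:
    """Keep boxes whose shortest side is big enough and whose aspect ratio is far from 1."""
    _, (w, h), _ = cv2.minAreaRect(contour)
    short_side, long_side = min(w, h), max(w, h)
    # Near-square boxes (e.g. a single big letter) carry little orientation information
    return short_side > min_side and long_side / max(short_side, 1e-6) >= min_ratio
```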
I would agree if we were only covering cases where there is a single orientation for all the text elements in the image. But here we aim at supporting multiple orientations. To touch on this, as of now, our latest pretrained models haven't yet been trained to handle rotation. But we'll get fresher model releases soon enough :smiley:
That being said, I agree that first detecting the edges of the document (ID card or anything else) and warping it to get something rather straight would help with performance!
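For reference, one common way to prototype that edge detection + warp (purely illustrative, not docTR code) is to pick the largest 4-point contour and apply a perspective transform:

```python
import cv2
import numpy as np

def warp_to_largest_quad(image: np.ndarray) -> np.ndarray:
    """Find the biggest quadrilateral contour and warp it to a straight rectangle."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.GaussianBlur(gray, (5, 5), 0), 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    quads = []
    for cnt in contours:
        approx = cv2.approxPolyDP(cnt, 0.02 * cv2.arcLength(cnt, True), True)
        if len(approx) == 4:
            quads.append(approx)
    if not quads:
        return image  # no document-like quadrilateral found
    quad = max(quads, key=cv2.contourArea).reshape(4, 2).astype(np.float32)
    # Order the corners (top-left, top-right, bottom-right, bottom-left)
    s, d = quad.sum(axis=1), np.diff(quad, axis=1).ravel()
    ordered = np.float32([quad[s.argmin()], quad[d.argmin()], quad[s.argmax()], quad[d.argmax()]])
    w = int(max(np.linalg.norm(ordered[0] - ordered[1]), np.linalg.norm(ordered[3] - ordered[2])))
    h = int(max(np.linalg.norm(ordered[0] - ordered[3]), np.linalg.norm(ordered[1] - ordered[2])))
    dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    M = cv2.getPerspectiveTransform(ordered, dst)
    return cv2.warpPerspective(image, M, (w, h))
```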
For now, if you have a short-term improvement suggestion for `get_bitmap_angle` or about document edge detection, happy to discuss this further :pray:
Thanks again for this first step @Rob192! For reference in this issue tracker, here are the different steps we are considering:
- Page-level orientation
- Box/line-level orientation
Do you guys still want to do a benchmark comparison before closing this one?
Hello guys, could you please provide instructions on how to use the new page skew correction util in version 0.5? Also, I wasn't able to get the OCR to output rotated bounding boxes even after upgrading to v0.5. Could you please help?
Hey everyone :wave:
My apologies, we've been quite busy over the past weeks!
@Rob192 yes that might be a good idea actually :+1: Did you have something specific in mind?
@mahmoudaljan actually, you're right: we haven't documented this well! I just opened #807, and we'll get back to you shortly!
We should be able to detect the rotation of a page (angle from 0 to 359°) in order to straighten it before sending it to the OCR. This would greatly improve our predictions on "tricky" datasets where pages are often rotated.
Some documents are very complex and have areas of text with different orientations, but even for these documents we can define a main orientation for the page (most of the lines would be oriented this way).
This leads us to define 2 levels of orientation:
- page-level orientation
- box/line-level orientation
Any suggestion is welcome! I think we should first implement page orientation, which should be relatively easy, and then focus on the other part, which is far trickier.
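As a starting point for the page-level part, once the page angle has been estimated (by a classifier, or a skew estimator like the one sketched earlier), straightening the page before OCR can look like this (illustrative only; `straighten_page` is not a docTR function):

```python
import cv2
import numpy as np

def straighten_page(page: np.ndarray, angle: float) -> np.ndarray:
    """Rotate the page by `angle` degrees (counter-clockwise) without cropping content."""
    h, w = page.shape[:2]
    center = (w / 2, h / 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)
    # Expand the output size so the rotated corners stay inside the frame
    cos, sin = abs(M[0, 0]), abs(M[0, 1])
    new_w, new_h = int(h * sin + w * cos), int(h * cos + w * sin)
    M[0, 2] += new_w / 2 - center[0]
    M[1, 2] += new_h / 2 - center[1]
    return cv2.warpAffine(page, M, (new_w, new_h), borderValue=(255, 255, 255))
```

Depending on the convention of the angle estimator, the value passed in may need to be negated so that text lines end up horizontal.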
@fg-mindee