mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
https://mindee.github.io/doctr/
Apache License 2.0

[models] detect page orientation #225

Closed charlesmindee closed 2 years ago

charlesmindee commented 3 years ago

We should be able to detect the rotation of a page (an angle from 0 to 359°) so that we can straighten the page before sending it to the OCR. This would greatly improve our predictor on "tricky" datasets where pages are often rotated.

Some documents are very complex and have areas of text with different orientations, but even for these documents we can define a main orientation for the page (most of the lines would be oriented this way).

This leads us to define 2 levels of orientation:

Any suggestion is welcome! I think we should first implement page orientation, which should be relatively easy, and then focus on the other part, which is far trickier.

@fg-mindee

Rob192 commented 3 years ago

Hello !

First of all, thank you for your work, this library is very good. I am having issues with the detection of skewed documents (French ID cards). I can share my thoughts on this matter, and maybe we can discuss them.

  1. I found that the get_bitmap_angle method is not very robust at estimating the general skew angle of the image. With a fixed n_ct of 20, you get a lot of noise from small boxes that are not very relevant. These artefacts artificially increase the std, with the eventual result that the angle is left unchanged. I suggest we use a smaller n_ct, an n_ct based on a ratio (the largest 5% of boxes), or even all the contours together.
  2. When passing the boxes and the associated angle resulting from the detection_predictor to the recognition_predictor, I found that the results are significantly less reliable compared with the results obtained when first rotating the document with the angle and then using the complete pipeline.
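
The ratio-based selection suggested in point 1 could be sketched roughly as follows. This is only an illustration, not docTR's get_bitmap_angle: the helper name estimate_page_angle, the top_ratio parameter, and the (width, height, angle) box format (similar to what cv2.minAreaRect yields) are all assumptions made for the example.

```python
import math
from statistics import median


def estimate_page_angle(boxes, top_ratio=0.05):
    """Hypothetical helper: median angle of the largest boxes by area.

    `boxes` is an iterable of (width, height, angle_in_degrees) tuples.
    Only the top `top_ratio` fraction of boxes (at least one) is kept,
    so that small, noisy boxes do not influence the estimate.
    """
    boxes = list(boxes)
    if not boxes:
        return 0.0
    # Rank boxes from largest to smallest area
    ranked = sorted(boxes, key=lambda b: b[0] * b[1], reverse=True)
    # Keep a number of boxes proportional to the total, never fewer than one
    keep = max(1, math.ceil(len(ranked) * top_ratio))
    return float(median(angle for _, _, angle in ranked[:keep]))
```

Using the median rather than the mean is a design choice for this sketch: it limits the influence of a few outlier boxes among those that are kept.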

For my use case, I will stay for now with the method I have for the French ID card, which is:

  1. Detect the side of the ID card
  2. Rotate the card based on a canvas with a DescriptorMatcher from OpenCV
  3. Apply docTR on the rotated document.
  4. Enjoy

fg-mindee commented 3 years ago

Hi there @Rob192 :wave:

Glad you found the library useful! About both your points, I fully agree that this is still not optimal. We're aiming at stabilizing the rotation feature for our next release. To give you more information on this topic:

  1. Having given this some thought earlier, I would argue that box size per se is not an ideal filter. However, the aspect ratio of unsquashed boxes (those whose shortest side exceeds a threshold) would qualify as a good filter. The reasoning is the following: in a document with huge, widely separated letters, we might first end up with a segment for a single letter (the "R" of "République française" in your use case). The box of that letter is big, but since its aspect ratio is close to 1, it would not provide any useful information for the orientation prediction.

  2. I would agree if we were only covering cases where we assume a single orientation for all the text elements in the image. But here we aim to support multi-orientation. On that note, as of now, our latest pretrained models haven't yet been trained to handle rotation. But we'll get fresher model releases soon enough :smiley:
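
To make the aspect-ratio filter from point 1 concrete, here is a minimal sketch. It is not code from docTR: the function name filter_informative_boxes, both threshold values, and the (width, height, angle) box format are assumptions chosen for illustration.

```python
def filter_informative_boxes(boxes, min_short_side=10.0, min_aspect_ratio=2.0):
    """Hypothetical filter: keep boxes that are not squashed and are elongated.

    `boxes` is an iterable of (width, height, angle_in_degrees) tuples.
    A box is kept when its shortest side is at least `min_short_side`
    and its long/short aspect ratio is at least `min_aspect_ratio`.
    """
    kept = []
    for width, height, angle in boxes:
        short_side, long_side = min(width, height), max(width, height)
        # Discard degenerate or squashed boxes
        if short_side <= 0 or short_side < min_short_side:
            continue
        # Near-square boxes (e.g. a single large letter) carry no orientation cue
        if long_side / short_side >= min_aspect_ratio:
            kept.append((width, height, angle))
    return kept
```

With these assumed thresholds, a large near-square box such as a lone capital letter is dropped regardless of its area, while a full text line is kept.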

That being said, I agree that first detecting the edges of the document (ID card or anything else) and warping it to get something reasonably straight would help with performance! For now, if you have a short-term improvement suggestion for get_bitmap_angle or for document edge detection, I'm happy to discuss this further :pray:

fg-mindee commented 3 years ago

Thanks again for this first step @Rob192 ! For reference in this issue tracker, here are the different steps we consider taking:

Page-level orientation

Box/line-level orientation

Rob192 commented 2 years ago

Do you guys still want to do a benchmark comparison before closing this one?

mahmoudaljan commented 2 years ago

Hello guys, could you please provide instructions on how to use the new page skew correction util in version 0.5? Also, I wasn't able to get the OCR to output rotated bounding boxes even after upgrading to v0.5. Could you please help?

fg-mindee commented 2 years ago

Hey everyone :wave:

My apologies, we've been quite busy over the past weeks!

@Rob192 yes that might be a good idea actually :+1: Did you have something specific in mind?

@mahmoudaljan actually, you're right: we haven't documented well how to use this! I just opened #807, we'll get back to you shortly!