Words Detection Merged Bboxes

mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

https://mindee.github.io/doctr/

Apache License 2.0

3.31k stars, 397 forks

Words Detection Merged Bboxes #1622

Closed. ArsalanYounus007 closed this issue 1 month ago

ArsalanYounus007 commented 1 month ago

Bug description

Hello, I hope you are having a good day. I am having a problem where multiple words are detected as a single word, and since the recognition model is not trained on spaces, we get a lot of bad results (please see the attached screenshot).

I have tried db_resnet and fast, each with both the PyTorch and TensorFlow backends.

Any solution is greatly appreciated.

Code snippet to reproduce the bug

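from tensorflow import keras  # assumed import; the original snippet did not show it
from doctr.models import detection_predictor, recognition_predictor

# Module-level caches so each model is loaded only once
crop_classifier = det_model = rec_model = None


def _init_if_needed():
    global crop_classifier, det_model, rec_model
    if crop_classifier is None:
        crop_classifier = keras.models.load_model("crop_classifier/")
    if det_model is None:
        det_model = detection_predictor(arch='db_resnet50', assume_straight_pages=True, pretrained=True)
        # Set the binarization and box-score thresholds on the detection postprocessor
        det_model.model.postprocessor.bin_thresh = 0.3
        det_model.model.postprocessor.box_thresh = 0.3
    if rec_model is None:
        rec_model = recognition_predictor(arch='crnn_vgg16_bn', pretrained=True)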

Error traceback

No error message (N/A).

Environment

Conda environment built with the latest doctr commit.

Deep Learning backend

felixdittrich92 commented 1 month ago

Hi @ArsalanYounus007 👋 Could you try to make the light gray text darker (https://answers.opencv.org/question/237967/how-to-darken-faintdim-gray-text/) ?

ArsalanYounus007 commented 1 month ago

I apologize for the confusion. I created that image myself with PIL's blend function, to see the results overlaid on the original image.

This is a crop of the original image.

felixdittrich92 commented 1 month ago

Hi @ArsalanYounus007 👋 I tested both PyTorch models (fast_base and db_resnet50) on the main branch, but it looks like you are not using the ocr_predictor. Could you please check the steps in between again? :) [Screenshot from 2024-05-31 13-21-37]

ArsalanYounus007 commented 1 month ago

You mean to say that det_predictor + rec_predictor != ocr_predictor? Let me check.

felixdittrich92 commented 1 month ago

@ArsalanYounus007 Correct, some steps in the middle are missing, such as the padding removal from the detection results. I agree we should adjust this.

ArsalanYounus007 commented 1 month ago

I have used ocr_predictor and it gives better words. Is there an option to call the detection post-processing steps manually, so that I can keep the det_predictor + rec_predictor approach?

felixdittrich92 commented 1 month ago

Hmm, what you need to do is:

loc_preds = [list(det_out.values())[0] for det_out in det_preds]
# Rectify crops if aspect ratio
loc_preds = self._remove_padding(pages, loc_preds)  # type: ignore[arg-type]

_remove_padding comes from: https://github.com/mindee/doctr/blob/9c92f5cbd0bf64706672c29bf2d43117815c1794/doctr/models/predictor/base.py#L107

I will try to find a time slot next week to move it directly to the detection predictor
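
For readers who want to see what the padding removal amounts to, here is a minimal standalone sketch. It is not the doctr implementation. It assumes the detector pads the page to a square while preserving the aspect ratio, and that boxes are straight and given as relative (xmin, ymin, xmax, ymax, score) tuples. The function name remove_padding, its parameters, and the symmetric_pad flag as used here are hypothetical; the linked method remains the authoritative version.

```python
from typing import List, Sequence, Tuple

# Relative coordinates in [0, 1] on the padded (square) image, plus a score.
Box = Tuple[float, float, float, float, float]


def _clip(value: float) -> float:
    """Clamp a relative coordinate to the [0, 1] range."""
    return min(max(value, 0.0), 1.0)


def remove_padding(
    page_hw: Tuple[int, int],
    boxes: Sequence[Box],
    symmetric_pad: bool = True,
) -> List[Box]:
    """Map boxes from the padded square back onto the original page.

    page_hw is the (height, width) of the original, unpadded page.
    Only the axis that received padding (the shorter side) is rescaled.
    """
    h, w = page_hw
    if h == w:
        return list(boxes)  # square page: no padding was added

    scale = max(h, w) / min(h, w)

    def fix(v: float) -> float:
        if symmetric_pad:
            # Padding on both sides: stretch around the image center.
            return _clip((v - 0.5) * scale + 0.5)
        # Padding on one side only: stretch from the origin.
        return _clip(v * scale)

    out: List[Box] = []
    for xmin, ymin, xmax, ymax, score in boxes:
        if h > w:
            # Tall page: padding was added along x, y is unchanged.
            out.append((fix(xmin), ymin, fix(xmax), ymax, score))
        else:
            # Wide page: padding was added along y, x is unchanged.
            out.append((xmin, fix(ymin), xmax, fix(ymax), score))
    return out
```

With a 200 x 100 (height x width) page and symmetric padding, a box that spans the full page width sits between x = 0.25 and x = 0.75 on the padded square; after rescaling it spans 0.0 to 1.0 again. Skipping this step leaves boxes expressed in padded coordinates, so crops taken from the original page are misaligned.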

ArsalanYounus007 commented 1 month ago

Thank you @felixdittrich92 for all the help. :)