mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
https://mindee.github.io/doctr/
Apache License 2.0

Margin in Crops #1582

Closed: ArsalanYounus007 closed this issue 7 months ago

ArsalanYounus007 commented 7 months ago

Bug description

Hello,

I hope you are having a good day. I am having a problem with db_resnet50 (TensorFlow): the output bounding boxes contain a lot of margin and overlap with the neighboring boxes.

I was able to fix this by increasing bin_thresh to 0.5-0.6 and applying some further post-processing.

That solves the bounding-box overlap issue for me, but it introduced another problem: now some words are missing.
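Why raising bin_thresh both trims the margin and loses words can be sketched with a toy version of DB-style post-processing on a 1-D slice through a single word. This is a simplified illustration, not docTR's implementation: the real post-processor works on 2-D contours and also expands (unclips) the boxes it finds, and the probability values below are made up.

```python
import numpy as np

def simplified_db_width(prob_row: np.ndarray, bin_thresh: float, box_thresh: float) -> int:
    """Toy DB-style post-processing on a 1-D slice through one word.

    1. Binarize: keep pixels whose probability exceeds bin_thresh.
    2. Score: average probability over the kept pixels.
    3. Filter: drop the "box" if its score is below box_thresh.
    Returns the box width in pixels (0 means the word was dropped).
    Assumes one word per slice.
    """
    mask = prob_row > bin_thresh
    if not mask.any():
        return 0  # nothing survives binarization: the word disappears
    score = float(prob_row[mask].mean())
    return int(mask.sum()) if score >= box_thresh else 0

# Hypothetical probabilities: a confident word with a soft halo at its edges,
# and a faint word (e.g. light print) that never reaches a high probability.
strong_word = np.array([0.1, 0.35, 0.45, 0.8, 0.9, 0.9, 0.8, 0.45, 0.35, 0.1])
faint_word = np.array([0.1, 0.3, 0.42, 0.48, 0.5, 0.48, 0.42, 0.3, 0.1, 0.05])

for t in (0.3, 0.4, 0.55):
    print(t, simplified_db_width(strong_word, t, 0.3), simplified_db_width(faint_word, t, 0.3))
# 0.3 8 5
# 0.4 6 5
# 0.55 4 0
```

Going from 0.3 to 0.55 halves the width of the confident word (less margin, so less overlap with its neighbors) but removes the faint word entirely, which is the same trade-off reported in this issue.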

I was able to mitigate the missing words by splitting each image (the originals are large: 2550 px wide × 3300 px high) into two parts at the emptiest location near the middle of the page. Running OCR on each part improves detection, but some words are still missed occasionally.
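The splitting workaround can be sketched with NumPy. split_at_emptiest_row is a hypothetical helper, not part of docTR; it assumes dark text on a light background and searches only a band around the vertical middle of the page for the row with the fewest ink pixels.

```python
import numpy as np

def split_at_emptiest_row(img: np.ndarray, search_frac: float = 0.2, ink_thresh: int = 128):
    """Cut a page into top/bottom parts at the row with the least ink near the middle.

    img: H x W (grayscale) or H x W x C array, dark text on a light background.
    search_frac: fraction of the page height, centred on the middle, searched for the cut.
    ink_thresh: pixels darker than this value count as ink.
    Returns (top, bottom, cut_row).
    """
    gray = img.mean(axis=2) if img.ndim == 3 else img
    h = gray.shape[0]
    half_band = max(1, int(h * search_frac / 2))
    lo, hi = max(0, h // 2 - half_band), min(h, h // 2 + half_band)
    ink_per_row = (gray[lo:hi] < ink_thresh).sum(axis=1)
    cut = lo + int(np.argmin(ink_per_row))  # first row with the fewest ink pixels
    return img[:cut], img[cut:], cut

# Synthetic 100 x 50 white page with two dark text blocks separated by a gap (rows 46-51).
page = np.full((100, 50), 255, dtype=np.uint8)
page[20:46] = 0
page[52:80] = 0
top, bottom, cut = split_at_emptiest_row(page)
print(cut, top.shape, bottom.shape)  # 46 (46, 50) (54, 50)
```

Each part can then be passed to the predictor as its own page. docTR reports geometry relative to the image it was given, so boxes from the bottom part need their y-coordinates rescaled and offset by cut_row before they map back onto the full page. If no truly blank row exists in the band, the cut still goes through the least-inked row, which may clip a line of text.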

What would you recommend, aside from training my own detection model? 😆 (I will do that later.)

Code snippet to reproduce the bug

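from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(det_arch='db_resnet50', reco_arch='crnn_vgg16_bn', assume_straight_pages=True, pretrained=True)
# Detection post-processing: bin_thresh binarizes the probability map, box_thresh drops low-score boxes
model.det_predictor.model.postprocessor.bin_thresh = 0.4
model.det_predictor.model.postprocessor.box_thresh = 0.4
single_img_doc = DocumentFile.from_images("page.jpg")[0]  # hypothetical path; one page as a NumPy array
json_output = model([single_img_doc]).export()  # export() turns the result into a dict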

Error traceback

NA

Environment

Conda environment, Python 3.10, Windows 11

Deep Learning backend

TensorFlow

felixdittrich92 commented 7 months ago

Hey @ArsalanYounus007 👋 A short suggestion: if possible, install from the main branch and try fast_base or db_resnet50 (PyTorch) :)

ArsalanYounus007 commented 7 months ago

I have already tried db_resnet50 with PyTorch. It's better, but it still misses words. I will try fast_base with PyTorch and see the results.

felixdittrich92 commented 7 months ago


Keep in mind that fast_base is currently only available on the main branch :)

ArsalanYounus007 commented 7 months ago

Yep, it's better at detecting words. However, I am back to bounding boxes that overlap their neighbors on the left (unless it's the first word in the line) and on the right.
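If the overlaps only need to be removed from the final output, a post-processing pass over the exported result can pull neighboring words apart. trim_word_overlaps is a hypothetical helper, not a docTR function. It assumes straight pages, where each word's "geometry" in the export() output is ((xmin, ymin), (xmax, ymax)) in relative coordinates. It only shrinks boxes where they overlap, so it does not make them fit the glyphs more tightly.

```python
from typing import Any, Dict

def trim_word_overlaps(export: Dict[str, Any]) -> int:
    """Make horizontally overlapping neighbours in each line meet at the midpoint
    of their overlap. Edits `export` in place; returns the number of pairs fixed.

    Assumes straight pages: word["geometry"] == ((xmin, ymin), (xmax, ymax)),
    relative to the page size.
    """
    fixed = 0
    for page in export["pages"]:
        for block in page["blocks"]:
            for line in block["lines"]:
                # Order words left to right by xmin before comparing neighbours.
                words = sorted(line["words"], key=lambda w: w["geometry"][0][0])
                for left, right in zip(words, words[1:]):
                    (lx0, ly0), (lx1, ly1) = left["geometry"]
                    (rx0, ry0), (rx1, ry1) = right["geometry"]
                    # Partial overlap only; a box nested inside its neighbour is left alone.
                    if rx0 < lx1 < rx1:
                        mid = (lx1 + rx0) / 2
                        left["geometry"] = ((lx0, ly0), (mid, ly1))
                        right["geometry"] = ((mid, ry0), (rx1, ry1))
                        fixed += 1
    return fixed

# Hypothetical exported result with two overlapping words on one line.
data = {"pages": [{"blocks": [{"lines": [{"words": [
    {"value": "Hello", "confidence": 0.9, "geometry": ((0.10, 0.10), (0.32, 0.12))},
    {"value": "world", "confidence": 0.9, "geometry": ((0.28, 0.10), (0.50, 0.12))},
    {"value": "again", "confidence": 0.9, "geometry": ((0.55, 0.10), (0.70, 0.12))},
]}]}]}]}
print(trim_word_overlaps(data))  # 1
```

With a real predictor this would run on the dict returned by model([page]).export(). Splitting at the midpoint is an arbitrary choice; a gap-aware split based on the image content would be more accurate but needs the pixels, not just the boxes.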