Detection model performs poorly when image is scaled (e.g. 1.5x in both dims)

mindee / doctr

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

https://mindee.github.io/doctr/

Apache License 2.0


Detection model performs poorly when image is scaled (e.g. 1.5x in both dims) #1535

Closed. ajkdrag closed this issue 3 months ago

ajkdrag commented 5 months ago

Bug description

Detection model performs poorly when image is scaled (e.g. 1.5x in both dims)

Code snippet to reproduce the bug

img = cv2.resize(img, None, fx=1.5, fy=1.5, interpolation=cv2.INTER_CUBIC)

If I do something like this to my dataset, the detection model performs poorly. I am using:

"preserve_aspect_ratio": True,
"symmetric_pad": True

Error traceback

No error, but poor bboxes.

Environment

DocTR version: 0.8.1
TensorFlow version: N/A
PyTorch version: N/A (torchvision N/A)
OpenCV version: N/A
OS: Debian GNU/Linux 11 (bullseye)
Python version: 3.8.18
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): N/A
CUDA runtime version: 11.8.89
GPU models and configuration: GPU 0: Tesla T4
Nvidia driver version: 525.105.17
cuDNN version: Could not collect

Deep Learning backend

is_tf_available: False
is_torch_available: True

felixdittrich92 commented 5 months ago

Hey @ajkdrag 👋,

Thanks for the feedback. :) In general, upscaling is not a good idea because interpolation degrades the image quality. If you really need to do this, you could try super resolution instead (for example: https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/latent_upscale).

But yeah, with the last training runs we have already extended the applied augmentations, though there is still room for additions like zoom (in/out), quality compression, etc.

Just out of interest, have you also tried the newly trained FAST models from the main branch? :)

ajkdrag commented 5 months ago

I have yet to try the FAST models. I saw the links were updated and will give it a shot today. The problem I am facing (when testing with DB models) is this: for one set of documents with dimensions of, say, 1176 x 762, if I upscale and then run the pipeline, the detection output is still fine and the recognition output improves significantly, while for another set of images the detection output degrades and fewer boxes are captured.
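
As a rough illustration of the sizes involved, here is a minimal sketch of what an aspect-preserving resize into a 1024x1024 detector input does to the original and the 1.5x upscaled image. It assumes a fixed 1024x1024 input as mentioned in this thread; `fit_within` is a hypothetical helper, not a docTR function, and the library's exact rounding may differ.

```python
def fit_within(width, height, target=1024):
    """Return (new_width, new_height) after an aspect-preserving resize
    into a target x target square. Integer math, rounding down (assumption)."""
    if width >= height:
        return target, max(height * target // width, 1)
    return max(width * target // height, 1), target


original = (1176, 762)
# 1.5x upscale in both dimensions, as in the cv2.resize call above
upscaled = (original[0] * 3 // 2, original[1] * 3 // 2)

print("original :", original, "->", fit_within(*original))
print("upscaled :", upscaled, "->", fit_within(*upscaled))
```

Under these assumptions both images reach the detector at the same size, so the 1.5x upscale mainly changes how the pixels were interpolated on the way there, not how large the text appears to the detection model.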

ajkdrag commented 5 months ago

Also, in DBNet, the preprocessing produces an image of size 1024x1024. Is it possible that for large rectangular docs like bank checks, resizing to a square will mess things up?

felixdittrich92 commented 5 months ago

@ajkdrag with preserve_aspect_ratio=True (default) and symmetric_pad=True (default) it shouldn't. But feel free to play around with both parameters. You can also try lowering the bin_thresh for DB models:

https://mindee.github.io/doctr/using_doctr/using_models.html#advanced-options
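
To make the effect of the two options concrete, here is a small sketch of the geometry for a wide document such as a 1176 x 256 check: the image is scaled to fit the square and the leftover space is split evenly on both sides. `resize_and_pad_geometry` is a hypothetical helper written for this illustration, and the rounding is an assumption rather than docTR's exact behaviour.

```python
def resize_and_pad_geometry(width, height, target=1024):
    """Describe an aspect-preserving resize followed by symmetric padding
    into a target x target canvas (integer math, rounding down)."""
    if width >= height:
        new_w, new_h = target, max(height * target // width, 1)
    else:
        new_w, new_h = max(width * target // height, 1), target
    return {
        "resized": (new_w, new_h),
        # symmetric padding: leftover space is split between both sides
        "pad_left": (target - new_w) // 2,
        "pad_top": (target - new_h) // 2,
        # share of the canvas that actually contains document pixels
        "content_fraction": (new_w * new_h) / (target * target),
    }


geometry = resize_and_pad_geometry(1176, 256)
print(geometry)
```

With these assumptions the aspect ratio is preserved, so the text is not distorted, but only about a fifth of the 1024x1024 canvas carries content and the rest is padding.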

ajkdrag commented 5 months ago

I tried the FAST model and it works pretty well, but I expected the FAST model to be "fast" :D It takes about a second per image, whereas Papers with Code mentions it as being almost real-time. I am using the main branch with reparameterize.

felixdittrich92 commented 5 months ago

Hey, yeah 😃 That's not really comparable, because we work on larger images (1024x1024) and the model needs to detect/segment many more text instances. Additionally, we had to modify the postprocessing so that it also works well with text-rich documents (maybe we could track how much time is spent in the postprocessing, but I don't think it will be too much).

Both papers (DB / FAST) were written for scene text detection in the wild and tested on datasets like IC15, etc.
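
One way to check how the time splits between the model and the postprocessing is to time each stage separately. The sketch below uses only the standard library; `forward_pass` and `postprocess` are hypothetical stand-ins (not docTR functions) that would need to be replaced by the real calls.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, timings):
    """Accumulate the wall-clock time spent inside the block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + time.perf_counter() - start


# Hypothetical stand-ins for the real stages; swap in the actual calls.
def forward_pass(image):
    return [[0.5 for _ in row] for row in image]


def postprocess(prob_map):
    return [(x, y) for y, row in enumerate(prob_map)
            for x, p in enumerate(row) if p >= 0.3]


timings = {}
image = [[0] * 64 for _ in range(64)]  # dummy 64x64 input

with timed("forward", timings):
    prob_map = forward_pass(image)
with timed("postprocess", timings):
    boxes = postprocess(prob_map)

for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.2f} ms")
```

When the model runs on a GPU, CUDA kernels execute asynchronously, so the device should be synchronized before each timer stops; otherwise part of the forward time can be attributed to the next stage.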

ajkdrag commented 5 months ago

Got it. FAST works well, and I think I understand the issue now. For images that are "long", i.e. with dimensions of, say, 1176 x 256, the bin_thresh is really tricky to work with. In my use case (scanned bank checks), I get images of this aspect ratio, and for a few batches setting bin_thresh to 0.2 works well, while for others I have to go down to 0.08. Could you suggest some tricks/workarounds for such use cases?
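
For context on why the threshold is so sensitive, here is a toy sketch of the binarization step: pixels of the predicted probability map at or above bin_thresh are kept as text. The probability values are made up for illustration, and `binarize` / `foreground_pixels` are hypothetical helpers rather than docTR code.

```python
def binarize(prob_map, bin_thresh):
    """Mark a pixel as text (1) when its probability reaches bin_thresh."""
    return [[1 if p >= bin_thresh else 0 for p in row] for row in prob_map]


def foreground_pixels(prob_map, bin_thresh):
    """Count how many pixels survive the binarization."""
    return sum(sum(row) for row in binarize(prob_map, bin_thresh))


# Made-up low-confidence probability map, as might occur for faint text
prob_map = [
    [0.02, 0.05, 0.09, 0.05],
    [0.10, 0.15, 0.25, 0.12],
    [0.03, 0.09, 0.30, 0.07],
]

for thresh in (0.2, 0.08):
    print(f"bin_thresh={thresh}: {foreground_pixels(prob_map, thresh)} text pixels")
```

When the model's confidence on a batch is low overall, most text pixels sit between the two thresholds, which is consistent with one batch needing 0.2 and another needing 0.08.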

felixdittrich92 commented 4 months ago

Hey, sorry, I totally missed your message. Have you found a way to handle it?

felixdittrich92 commented 3 months ago

Moved to #1604