open-mmlab / mmocr

OpenMMLab Text Detection, Recognition and Understanding Toolbox
https://mmocr.readthedocs.io/en/dev-1.x/
Apache License 2.0

Poor detection of words with pretrained models #636

Closed: rmdmohan20 closed this issue 2 years ago

rmdmohan20 commented 2 years ago

Hi, I have been working on detecting words in images of invoices. Word detection works for images with smaller dimensions, but it fails on images with larger dimensions. I read in #354 and #274 that detection works fine if the image is cropped into smaller pieces, but I want to detect words on the whole image. If training with more images is needed, what would be an ideal number to choose, and on what basis should the training be done to get good detection results?
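
For reference, the cropping workaround mentioned above can be sketched as follows. This is a minimal illustration, not MMOCR code: the function names, the 512-pixel tile size, and the 64-pixel overlap are all assumptions, and the detector call itself is left out. The sketch only computes overlapping tile windows and shifts boxes found inside a tile back to full-image coordinates.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def tile_windows(width: int, height: int, tile: int = 512, overlap: int = 64) -> List[Box]:
    """Return overlapping tile windows that cover the whole image.

    The tile and overlap defaults are hypothetical, not MMOCR settings.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than tile")
    step = tile - overlap

    def starts(length: int) -> List[int]:
        # A single tile is enough when the image side fits inside one tile.
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # last tile sits flush with the border
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


def to_full_image(box: Box, window: Box) -> Box:
    """Shift a box detected inside a tile back to full-image coordinates."""
    ox, oy = window[0], window[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)
```

Words that fall in the overlap between two tiles will be detected twice, so a real pipeline would still need to merge duplicate boxes after mapping them back; that step is not shown here.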

cuhk-hbsun commented 2 years ago

  1. What image size do you use in the config file? Is it similar to the size of your input images?
  2. I suggest fine-tuning the model on your own data. The more images you include in the training set, the better the results will be.

rmdmohan20 commented 2 years ago

Thanks for the reply.

  1. We used the default image size in the config.
  2. I looked at the output for images across a range of dimensions and found that detection is good for images with a resolution of around 500 pixels. As the resolution increases, detection gets worse. I am attaching two images that might be useful: 907x791 pixels (526_70%_fcenet_pretrain) and 389x339 pixels (526_30%_fcenet_pretrain). Here we can see that the detections are better for the second image. To improve detection for all dimensions, would 10000 images be a good start?
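
Given the observation above that detection works best at around 500 pixels, one thing that could be tried before fine-tuning is downscaling larger images and mapping the detected polygons back to the original size. The sketch below is only an illustration under that assumption: the function names and the 500-pixel target are hypothetical, and the resizing and detection calls themselves are left out.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def scale_for_long_side(width: int, height: int, target_long_side: int = 500) -> float:
    """Return the factor that shrinks the longer side to the target.

    Images that are already small enough are left unchanged (factor 1.0).
    The 500-pixel default comes from the observation in this thread.
    """
    long_side = max(width, height)
    if long_side <= target_long_side:
        return 1.0
    return target_long_side / long_side


def resized_size(width: int, height: int, scale: float) -> Tuple[int, int]:
    """Return the (width, height) to resize to for a given scale factor."""
    return (max(1, round(width * scale)), max(1, round(height * scale)))


def rescale_polygon(polygon: List[Point], scale: float) -> List[Point]:
    """Map polygon points detected on the resized image back to the original."""
    return [(x / scale, y / scale) for x, y in polygon]
```

For the 907x791 image, `scale_for_long_side` returns 500/907 and `resized_size` gives a 500x436 input; the 389x339 image is left at its original size.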

cuhk-hbsun commented 2 years ago

Yes, you can first fine-tune the model with your own data (around 2000 images, for example) to see the results.

rmdmohan20 commented 2 years ago

Thank you so much @cuhk-hbsun !!