zylo117 / Yet-Another-EfficientDet-Pytorch

The pytorch re-implement of the official efficientdet with SOTA performance in real time and pretrained weights.
GNU Lesser General Public License v3.0
5.2k stars 1.27k forks

Anchor's settings for text detection? #294

Open ndcuong91 opened 4 years ago

ndcuong91 commented 4 years ago

Hello @zylo117, thank you for your great work! It's also a nice surprise that you understand the limitations of other PyTorch implementations :) I'm working on text detection for invoices. My goal is to recognize some important fields in them, as in the picture below.

[image]

After resizing, the boxes are about 8 to 25 pixels high, and their width is about 1x to 16x their height. As a first step, I only need to detect the 3 fields in the green boxes ('form', 'serial', 'tax_code'). I tried D2 with the default anchor settings and the results were not good. After that, I changed the settings to:

anchors_scales: '[2 ** 0]'
anchors_ratios: '[(0.25, 0.25), (0.5, 0.25), (0.8, 0.25), (1.2, 0.25), (1.6, 0.25), (2.1, 0.25), (2.7, 0.25), (3.3, 0.25), (4, 0.25)]'

The result was still not good, as shown here (the final loss is about 0.3x): [image: sample_invoice1_visualized]

I read some threads about this, but I still don't know how to modify the anchors correctly. Could you give me a hint? And do you think EfficientDet is a good fit for this type of text detection?
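Editorial note: to see what the settings quoted in this comment mean in pixels, here is a minimal sketch. It assumes that the base anchor size is `anchor_scale * stride * scale`, that each ratio tuple multiplies the base size into (width, height), that `anchor_scale` is 4.0, and that the pyramid strides are 8 to 128. None of these values are stated in the thread, so check them against the repository code before relying on the numbers.

```python
from itertools import product

# Assumptions (not stated in the thread): anchor_scale = 4.0 and
# strides 8..128 for five pyramid levels.
ANCHOR_SCALE = 4.0
STRIDES = [8, 16, 32, 64, 128]


def anchor_shapes(scales, ratios, strides=STRIDES, anchor_scale=ANCHOR_SCALE):
    """Return {stride: [(width, height), ...]} in pixels of the resized input."""
    shapes = {}
    for stride in strides:
        level = []
        for scale, (rx, ry) in product(scales, ratios):
            base = anchor_scale * stride * scale  # assumed base anchor size
            level.append((base * rx, base * ry))
        shapes[stride] = level
    return shapes


# Settings quoted from the comment.
scales = [2 ** 0]
ratios = [(0.25, 0.25), (0.5, 0.25), (0.8, 0.25), (1.2, 0.25), (1.6, 0.25),
          (2.1, 0.25), (2.7, 0.25), (3.3, 0.25), (4, 0.25)]

for stride, level in anchor_shapes(scales, ratios).items():
    heights = sorted({h for _, h in level})
    widths = [w for w, _ in level]
    print(f"stride {stride:3d}: heights {heights}, "
          f"widths {min(widths):.0f}..{max(widths):.0f}")
```

Under these assumptions, a single scale gives exactly one anchor height per pyramid level, so box heights that fall between two levels have no closely matching anchor.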

zylo117 commented 4 years ago

The loss will be deceptively low if the anchors don't match the boxes. Maybe try this repo to recalculate the anchors: https://github.com/Cli98/anchor_computation_tool

zylo117 commented 4 years ago

And it may not be a good idea to keep only one anchor scale.
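Editorial note: one rough way to check whether a set of anchors can match the labeled boxes is to compute the best possible IoU between each box size and every anchor size. The sketch below is not the linked tool's method. It assumes the same anchor geometry as the repository is believed to use (base size = `anchor_scale * stride * scale`, with `anchor_scale` 4.0 and strides 8 to 128), and the sample box sizes are hypothetical. The centered IoU is an upper bound, since real anchors sit on a grid and are rarely centered on a box.

```python
from itertools import product


# Assumed anchor geometry (not confirmed in the thread):
# base = anchor_scale * stride * scale, width = base * rx, height = base * ry.
def make_anchors(scales, ratios, strides=(8, 16, 32, 64, 128), anchor_scale=4.0):
    return [(anchor_scale * s * sc * rx, anchor_scale * s * sc * ry)
            for s in strides
            for sc, (rx, ry) in product(scales, ratios)]


def centered_iou(box_wh, anchor_wh):
    """IoU of two boxes sharing the same center (upper bound on the real IoU)."""
    bw, bh = box_wh
    aw, ah = anchor_wh
    inter = min(bw, aw) * min(bh, ah)
    return inter / (bw * bh + aw * ah - inter)


def best_iou(box_wh, anchors):
    """Highest centered IoU between one box size and any anchor size."""
    return max(centered_iou(box_wh, a) for a in anchors)


ratios = [(0.25, 0.25), (0.5, 0.25), (0.8, 0.25), (1.2, 0.25), (1.6, 0.25),
          (2.1, 0.25), (2.7, 0.25), (3.3, 0.25), (4, 0.25)]
one_scale = make_anchors([2 ** 0], ratios)
# Three scales per octave is a common choice, assumed here for comparison.
three_scales = make_anchors([2 ** 0, 2 ** (1 / 3), 2 ** (2 / 3)], ratios)

# Hypothetical text boxes (width, height) in pixels after resizing.
sample_boxes = [(40, 8), (100, 12), (200, 20), (300, 25)]
for box in sample_boxes:
    print(box,
          round(best_iou(box, one_scale), 2),
          round(best_iou(box, three_scales), 2))
```

Boxes whose best IoU stays below the positive-match threshold used during training (0.5 is a common value, assumed here) would never be assigned an anchor, which is one way the loss can look low while detections stay poor.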

ndcuong91 commented 4 years ago

> And it may not be a good idea to keep only one anchor scale.

Thanks. At least I found my mistake: I had not applied the modified anchor settings at inference. The training result does look better, like this: [image: sample_invoice1_visualized]
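Editorial note: the mistake described here is using different anchor settings at inference than at training. A minimal sketch of one way to avoid it is to parse the setting strings once and reuse the resulting lists in both places. The helper name `parse_anchor_setting` is hypothetical, and how the lists are passed to the model is repository-specific and not shown.

```python
def parse_anchor_setting(text):
    """Turn a setting string such as '[2 ** 0]' into a Python list.

    eval is used because the strings contain arithmetic (2 ** 0), which
    ast.literal_eval rejects. Builtins are disabled to limit what the
    expression can do; only use this on configuration you trust.
    """
    return eval(text, {"__builtins__": {}}, {})


# Setting strings as quoted in the thread.
settings = {
    "anchors_scales": "[2 ** 0]",
    "anchors_ratios": "[(0.25, 0.25), (0.5, 0.25), (0.8, 0.25), (1.2, 0.25), "
                      "(1.6, 0.25), (2.1, 0.25), (2.7, 0.25), (3.3, 0.25), (4, 0.25)]",
}

scales = parse_anchor_setting(settings["anchors_scales"])
ratios = parse_anchor_setting(settings["anchors_ratios"])

# Use these same two lists when building the model for training and for
# inference; a model built with other anchors decodes boxes incorrectly.
print(scales)
print(ratios)
```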

wenjun90 commented 4 years ago

> > And it may not be a good idea to keep only one anchor scale.
>
> Thanks. At least I found my mistake: I had not applied the modified anchor settings at inference. The training result does look better, like this: [image: sample_invoice1_visualized]

Hi @titikid, could you please share the final values of anchors_scales and anchors_ratios in your case?

Thank you very much.

ndcuong91 commented 4 years ago

@wenjun90 These are the ones I used:

anchors_scales: '[2 ** 0]'
anchors_ratios: '[(0.25, 0.25), (0.5, 0.25), (0.8, 0.25), (1.2, 0.25), (1.6, 0.25), (2.1, 0.25), (2.7, 0.25), (3.3, 0.25), (4, 0.25)]'

However, they are not good enough, and I'm looking for better ones.

wenjun90 commented 4 years ago

@titikid Thank you for your answer! Could I ask about your dataset? How many images do you use for training and validation, and what are the batch size and learning rate? Have you tried D1 and D2? Thank you again!

ndcuong91 commented 4 years ago

> @titikid Thank you for your answer! Could I ask about your dataset? How many images do you use for training and validation, and what are the batch size and learning rate? Have you tried D1 and D2? Thank you again!

@wenjun90 I tried D0 and D2. My dataset has 340 training images and 20 validation images. I didn't change the batch size or learning rate from the default settings. What's your application?

wenjun90 commented 4 years ago

@titikid I am working on text block detection too. I tried Faster R-CNN, which performs well on mAP but is slow at prediction on CPU: it needs about 4 s per image. That makes it difficult to use in a real-time web app.

ndcuong91 commented 4 years ago

> @titikid I am working on text block detection too. I tried Faster R-CNN, which performs well on mAP but is slow at prediction on CPU: it needs about 4 s per image. That makes it difficult to use in a real-time web app.

If you want to detect all text blocks in real time, I recommend a dedicated text detection model such as Differentiable Binarization (DB). I tried it, and it is accurate and fast.

wenjun90 commented 4 years ago

@titikid I am working on this project. I just wanted to try this algorithm to see how it performs at prediction.

https://miro.medium.com/max/1200/1*gAx3-sIpo09bPDCZ2fI_kw.png

ndcuong91 commented 4 years ago

@wenjun90 I'm also interested in document layout recognition and want to apply it in the future. Which models do you use?

wenjun90 commented 4 years ago

@titikid Cascade R-CNN. It gives the best metrics, but its FPS is low compared with other models.

SubramanianKrish commented 4 years ago

@titikid Could you tell me the size of your images before resizing? As a follow-up, do you use the author's preprocess function to resize the images?