yjh0410 / CenterNet-plus

A Simple Baseline for Object Detection

Detection Head explanation #7

Closed YashRunwal closed 3 years ago

YashRunwal commented 3 years ago

Hi,

I want to train this model on grayscale images of size (512, 1536), so I have built my own ResNet18 backbone, to which I had to add a few more layers. While reading the README, I came across something called the Detection Head. It mentions that, during the training stage, the way the labels for offset and size are obtained is different from CenterNet.

  1. I would like to understand how it is different from CenterNet.
  2. CenterNet follows an anchor-free approach and generates corner point heatmaps and center keypoint heatmaps using Cascade Corner Pooling. However, I don't see the Cascade Corner Pooling module in this repo. Is there a reason why it hasn't been used?
  3. One last question: what are the offset (tx and ty) branch and the size (tw and th) branch? I think I can deduce that the size branch predicts the size of the box (please correct me if I'm wrong), but I was wondering what the offset branch means.

Thank You.

yjh0410 commented 3 years ago

Hi,

  1. The CenterNet I follow is the one proposed in the paper Objects as Points. It doesn't use the Cascade Corner Pooling module.
  2. The offset (tx and ty) corresponds to the center of a bbox. Since the heatmap is H/4 x W/4, we first take the center point (x, y) of an object instance as the location with the highest score in the heatmap, then use the offset to refine it and get precise coordinates on the original image. For more details about the offset, refer to the CenterNet paper Objects as Points. The size, on the other hand, corresponds to the width and height of a bbox. The official CenterNet regresses w and h directly. In contrast, I first use the log function to compress w and h into tw and th, which are set as the targets of the size branch, and then use the exp function to map the predicted tw and th back to w and h. I found that my log-exp method works better.
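
To make the offset and log-exp size description above concrete, here is a minimal sketch in plain Python. It is not the repository's code: the function names `encode_box` and `decode_box` are hypothetical, and it assumes that the offset is the fractional part of the box center on the stride-4 heatmap and that w and h are measured in heatmap cells before the log is taken. The actual implementation may differ on both points.

```python
import math

STRIDE = 4  # the heatmap is H/4 x W/4, as stated in the reply above


def encode_box(x1, y1, x2, y2, stride=STRIDE):
    """Turn a box in image pixels into training targets (hypothetical helper).

    Returns (grid_x, grid_y, tx, ty, tw, th).
    """
    # Box center and size expressed in heatmap cells (assumption).
    cx = (x1 + x2) / 2 / stride
    cy = (y1 + y2) / 2 / stride
    w = (x2 - x1) / stride
    h = (y2 - y1) / stride

    # Integer heatmap cell that holds the center.
    grid_x, grid_y = int(cx), int(cy)

    # Offset: the sub-cell part lost when the center is snapped to the grid.
    tx, ty = cx - grid_x, cy - grid_y

    # Size: compress w and h with log to get the regression targets.
    tw, th = math.log(w), math.log(h)
    return grid_x, grid_y, tx, ty, tw, th


def decode_box(grid_x, grid_y, tx, ty, tw, th, stride=STRIDE):
    """Invert encode_box: map targets back to a box in image pixels."""
    # Refine the heatmap peak with the offset, then scale to the image.
    cx = (grid_x + tx) * stride
    cy = (grid_y + ty) * stride

    # Map tw and th back to w and h with exp, then scale to the image.
    w = math.exp(tw) * stride
    h = math.exp(th) * stride
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2


if __name__ == "__main__":
    targets = encode_box(100, 50, 230, 170)
    print(targets)
    print(decode_box(*targets))
```

Under these assumptions, a (512, 1536) input gives a 128 x 384 heatmap, and decoding the encoded targets recovers the original box up to floating-point error.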

YashRunwal commented 3 years ago

Hi,

Thank you for replying so quickly. I can't believe I was reading a different CenterNet paper. Unbelievable. I will read Objects as Points and get back to you if need be.

I will try to train the network on grayscale images and let you know the results. Do you think it is better to train from scratch or to use a pre-trained model?