xingyizhou / CenterNet

Object detection, 3D detection, and pose estimation using center point detection:
MIT License

Dimension of heads #849

Open ash2703 opened 3 years ago

ash2703 commented 3 years ago

What should the shape of all 3 heads be?

hm output:  ([1, 1, 128, 128])  #Single class
wh output:  ([1, 2, 128, 128])
reg output: ([1, 2, 128, 128])

But when the dataloader runs, self.opt.dense_wh is False, so the wh target has size ([1, 128, 2]).

Can anyone explain what the actual dimensions of the heads should be during training? @xingyizhou @chengzhengxin

xingyizhou commented 3 years ago

Hi, thank you for your interest. The 128 in 1x128x2 is the maximum number of objects in an image. The actual number is usually much smaller than 128, and the unused slots are masked out by a mask. This is different from the head dimension: we use inds to extract the corresponding locations from the dense outputs.
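To make the reply above concrete, here is a minimal NumPy sketch of how a dense head output of shape [1, 2, 128, 128] can be reduced to [1, 128, 2] with flat indices and then compared against the dataloader target under a mask. The helper names `gather_at_indices` and `masked_l1` are hypothetical and chosen only for illustration, and the assumption that each index equals `y * W + x` on the output map is labeled in the code. The repository itself works on PyTorch tensors, so treat this as an illustration of the idea rather than the project's code.

```python
import numpy as np

def gather_at_indices(dense, ind):
    """Pick per-object predictions out of a dense head output.

    dense: (B, C, H, W) head output.
    ind:   (B, K) flat indices, assumed to be y * W + x on the output map.
    Returns an array of shape (B, K, C).
    """
    b, c, h, w = dense.shape
    # Move channels last and flatten the spatial grid: (B, H*W, C).
    flat = dense.transpose(0, 2, 3, 1).reshape(b, h * w, c)
    # Repeat each index across the channel axis so it selects all C values.
    idx = np.broadcast_to(ind[:, :, None], (b, ind.shape[1], c))
    return np.take_along_axis(flat, idx, axis=1)

def masked_l1(pred, target, mask):
    """L1 loss over the valid object slots only (mask is 1 for real objects)."""
    m = np.broadcast_to(mask[:, :, None], pred.shape).astype(pred.dtype)
    return np.abs(pred * m - target * m).sum() / (m.sum() + 1e-4)

B, C, H, W, K = 1, 2, 128, 128, 128            # K plays the role of max_objs
wh_head = np.zeros((B, C, H, W), dtype=np.float32)
wh_head[0, :, 40, 30] = [12.0, 20.0]           # dense prediction at y=40, x=30

ind = np.zeros((B, K), dtype=np.int64)
ind[0, 0] = 40 * W + 30                        # one real object, the rest is padding
mask = np.zeros((B, K), dtype=np.float32)
mask[0, 0] = 1.0
target = np.zeros((B, K, C), dtype=np.float32)
target[0, 0] = [10.0, 20.0]

pred = gather_at_indices(wh_head, ind)         # dense head reduced to (B, K, C)
loss = masked_l1(pred, target, mask)
print(pred.shape, float(loss))
```

In this sketch the head stays dense at 2 x 128 x 128. Only the loss looks at the 128 gathered slots, and the padded slots contribute nothing because their mask is 0.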

ash2703 commented 3 years ago

Thank you for the answer, but what should the dimensions of the 3 heads be during training? As I mentioned, self.opt.dense_wh = False. Is the network output then reduced to [num_classes * num_obj * 2]?

ash2703 commented 3 years ago
input         (3, 512, 512)  
hm            (1, 128, 128)   
reg_mask      (128,)       
ind           (128,)  
wh            (128, 2) 
reg           (128, 2) 

These are the dimensions returned by my dataloader, where 128 is max_objs. How should my head be structured?
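For reference, the sparse shapes listed above can be reproduced with a small sketch that pads per-object targets up to max_objs. The function `build_targets` is hypothetical and written only for illustration. It assumes the boxes are already scaled to the 128 x 128 output map, that `ind` stores `y * W + x` of the integer center, and that `reg` stores the sub-pixel offset of the center. The Gaussian heatmap `hm` is left out.

```python
import numpy as np

def build_targets(boxes, out_w=128, max_objs=128):
    """Pad per-object targets to a fixed length of max_objs.

    boxes: list of (x1, y1, x2, y2), assumed to be in output-map coordinates.
    """
    wh = np.zeros((max_objs, 2), dtype=np.float32)
    reg = np.zeros((max_objs, 2), dtype=np.float32)
    ind = np.zeros((max_objs,), dtype=np.int64)
    reg_mask = np.zeros((max_objs,), dtype=np.uint8)
    for k, (x1, y1, x2, y2) in enumerate(boxes[:max_objs]):
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        cx_int, cy_int = int(cx), int(cy)
        wh[k] = (x2 - x1, y2 - y1)            # box width and height
        reg[k] = (cx - cx_int, cy - cy_int)   # sub-pixel center offset
        ind[k] = cy_int * out_w + cx_int      # flat location on the output map
        reg_mask[k] = 1                       # marks this slot as a real object
    return {"wh": wh, "reg": reg, "ind": ind, "reg_mask": reg_mask}

t = build_targets([(10.0, 20.0, 31.0, 41.0)])
for name, arr in t.items():
    print(name, arr.shape)
```

Under these assumptions the targets are sparse, at one row per object slot, while each row's `ind` points back to a single pixel of a dense 128 x 128 head output.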