Open ash2703 opened 3 years ago
Hi,
Thank you for your interest. The 128 in 1x128x2 is the maximum number of objects in an image. The actual number is usually much smaller than 128, and all unused entries are masked out. This is different from the head dimension: we use inds to extract the object locations from the dense outputs.
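To make the gather step concrete, here is a minimal NumPy sketch of what the reply describes: the head output stays dense, and the per-object values are picked out with the indices. It is not the repository's code. The helper names `gather_at_indices` and `masked_l1`, the flat-index convention `y * W + x`, and the example object centers are assumptions made for illustration.

```python
import numpy as np

def gather_at_indices(dense, ind):
    """Pick per-object vectors out of a dense head output.

    dense: (C, H, W) head output, e.g. C = 2 for wh.
    ind:   (K,) flat indices, assumed here to be y * W + x.
    Returns an array of shape (K, C).
    """
    c, h, w = dense.shape
    flat = dense.reshape(c, h * w).T  # (H*W, C)
    return flat[ind]                  # (K, C)

def masked_l1(pred, target, mask):
    """L1 loss over the real objects only; padded slots contribute nothing."""
    m = np.broadcast_to(mask[:, None].astype(pred.dtype), pred.shape)
    return np.abs(pred * m - target * m).sum() / (m.sum() + 1e-4)

max_objs, out_h, out_w = 128, 128, 128
rng = np.random.default_rng(0)

# Dense wh head output: 2 channels at every output location.
wh_dense = rng.standard_normal((2, out_h, out_w)).astype(np.float32)

# Dataloader-style targets, padded up to max_objs.
ind = np.zeros(max_objs, dtype=np.int64)
reg_mask = np.zeros(max_objs, dtype=np.uint8)
wh_target = np.zeros((max_objs, 2), dtype=np.float32)

# Two hypothetical objects with centers given as (x, y) on the output map.
centers = [(10, 20), (64, 64)]
for k, (x, y) in enumerate(centers):
    ind[k] = y * out_w + x
    reg_mask[k] = 1
    wh_target[k] = (30.0, 40.0)

wh_pred = gather_at_indices(wh_dense, ind)       # (128, 2), same shape as the target
loss = masked_l1(wh_pred, wh_target, reg_mask)   # only the first two rows count
```

The gathered prediction has the same (128, 2) shape as the wh target from the dataloader, so the two can be compared directly while the head itself keeps its dense shape.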
Thank you for the answer,
but what should the dimensions of the 3 heads be while training?
As I mentioned, self.opt.dense_wh = False. Is the network output then reduced to [num_classes * num_obj * 2]?
input (3, 512, 512)
hm (1, 128, 128)
reg_mask (128,)
ind (128,)
wh (128, 2)
reg (128, 2)
These are the dimensions returned by my dataloader, where 128 is max_objs. How should my head be structured?
What should the shape of all 3 heads be?
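As a rough sketch of the shape bookkeeping, going by the reply above that the head outputs are dense: each head would keep the spatial size of the output map, and only the targets are padded to max_objs. The function `expected_shapes` is hypothetical, the down ratio of 4 is inferred from the 512 input and the 128x128 hm listed above, and the channel counts (num_classes for hm, 2 for wh and reg) are assumptions taken from the target shapes.

```python
def expected_shapes(input_hw=(512, 512), down_ratio=4, num_classes=1, max_objs=128):
    """Return (head output shapes, dataloader target shapes), batch dim omitted."""
    out_h = input_hw[0] // down_ratio
    out_w = input_hw[1] // down_ratio

    # Assumed dense head outputs: one prediction per output-map location.
    heads = {
        "hm": (num_classes, out_h, out_w),
        "wh": (2, out_h, out_w),
        "reg": (2, out_h, out_w),
    }

    # Targets as listed in this comment: padded to max_objs.
    targets = {
        "hm": (num_classes, out_h, out_w),
        "reg_mask": (max_objs,),
        "ind": (max_objs,),
        "wh": (max_objs, 2),
        "reg": (max_objs, 2),
    }
    return heads, targets

heads, targets = expected_shapes()
for name, shape in heads.items():
    print(name, "head:", shape, "target:", targets[name])
```

Under these assumptions the wh and reg heads differ in shape from their targets, and ind is what maps one to the other before the loss is computed.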
But when the dataloader is run, self.opt.dense_wh is False, hence the wh size is ([1, 128, 2]).
Can anyone explain what the actual dimensions of the heads should be while training? @xingyizhou @chengzhengxin