yjh0410 / CenterNet-plus

A Simple Baseline for Object Detection

Understanding training losses and CenterNet model script #10

Closed YashRunwal closed 3 years ago

YashRunwal commented 3 years ago

Hi,

Using grayscale images of size (512, 1536), if I evaluate the model via the train.py script with the model argument trainable=False, I get the following shapes for the outputs:

topk_bbox_pred: (100, 4)
topk_scores: (100,)
topk_cls_inds: (100,)

Clearly the number 100 is the top 100 predictions, set by the class argument topk=100.
What do these 100 predictions indicate, though? Are they per-pixel predictions, i.e. the pixels with the highest values? And how do we evaluate these predictions, i.e. how do we plot them on the images?
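For reference, here is a minimal sketch of how top-k decoding typically works in CenterNet-style detectors (this is an illustration with assumed shapes and class count, not the repo's exact code): the class heatmap is flattened over all pixels, the 100 highest-scoring locations are kept, and each prediction therefore corresponds to one pixel of the P2 feature map.

```python
import torch

B, C, H, W = 1, 20, 128, 384          # batch, classes, P2 height/width (assumed)
topk = 100

heatmap = torch.rand(B, C, H, W)       # per-pixel class scores (e.g. after sigmoid)
scores, inds = torch.topk(heatmap.view(B, -1), topk)  # top-k over all C*H*W entries
cls_inds = inds // (H * W)             # which class each prediction belongs to
pix_inds = inds % (H * W)              # which pixel on the feature map
ys, xs = pix_inds // W, pix_inds % W   # pixel coordinates; multiply by the
                                       # stride (4 for P2) to map back to image coords
```

So the 100 predictions are the 100 highest-scoring (class, pixel) pairs; to draw them you map the pixel coordinates back to the input image via the stride and drop low-score boxes with a threshold.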

Similarly, in training mode: the shape of p2 is [batch, 128, 384, channels], which is flattened to [batch, 128*384, channels]. In the create_gt script, the gt_creator function then creates a tensor of size `[128*384]` in this case. Does that mean a target is created for every pixel, i.e. the loss is calculated for each pixel?
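The arithmetic behind those shapes can be checked directly (assuming P2 has the usual stride of 4):

```python
# Input image size and the P2 stride (4 is the standard stride for P2).
H_img, W_img = 512, 1536
stride = 4

# P2 feature-map size: one target cell per feature-map pixel.
H, W = H_img // stride, W_img // stride  # 128, 384
num_cells = H * W                        # 49152 = 128*384 target entries
```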

Please clear these doubts if possible.

Thank You.

yjh0410 commented 3 years ago

You can use the test.py script to plot these predictions on the images. I create targets for every pixel in the P2 feature map. The heatmap loss is computed over all pixels, but the txty loss, twth loss and iou loss are computed only on the positive samples (the pixels corresponding to the center points).
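The positive-sample masking described above can be sketched as follows (a generic illustration with assumed names and a plain L1 regression loss, not necessarily the repo's exact loss functions): a binary mask marks the center-point pixels, the per-pixel regression loss is zeroed everywhere else, and the result is normalized by the number of positives.

```python
import torch
import torch.nn.functional as F

N = 128 * 384                       # all P2 pixels for a (512, 1536) input
pred_txty = torch.rand(1, N, 2)     # predicted center offsets per pixel
gt_txty = torch.rand(1, N, 2)       # target offsets (only valid at positives)

pos_mask = torch.zeros(1, N)        # 1 only at ground-truth center pixels
pos_mask[0, [100, 2048]] = 1.0      # e.g. two objects in the image

# Per-pixel loss, zeroed outside the positive pixels,
# then normalized by the number of positives.
per_pixel = F.l1_loss(pred_txty, gt_txty, reduction='none').sum(-1)
txty_loss = (per_pixel * pos_mask).sum() / pos_mask.sum()
```

The heatmap (focal) loss, by contrast, sums over all N pixels, since every pixel carries a heatmap target.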

YashRunwal commented 3 years ago

Okay! So that's why it has the shape (128*384) in my case; I understand that now. @yjh0410, can you please take a look here (https://github.com/yjh0410/CenterNet-plus/issues/9#issue-947032433)? I have a doubt there about the gt_creator, which is explained in that question. I think this is important, as there will be many scenarios in which we cannot resize the image.