xingyizhou / CenterNet

Object detection, 3D detection, and pose estimation using center point detection:

Ignore objects in loss calculation #964

Open r3krut opened 2 years ago

r3krut commented 2 years ago

Hi! Thanks for your outstanding work! I have a question about loss calculation. Suppose I have some target objects in my dataset that should be ignored during training. Is there a right way to do this during heatmap loss calculation based on Gaussians?

Should I generate a binary training mask for the pixels that belong to an "ignore" object over the whole Gaussian (positive + negative), or only for the center (positive) pixel?

This is the original image with the targets visualized (red: targets, cyan: ignored objects): [oimg]

These are the source heatmaps for all objects: [ohm]

This is the mask for the ignored objects (drawn as circles with the same radius as used in heatmap generation): [m]

These are the masked source heatmaps: [mhm]

My question is: what is the proper way to generate a training mask for the ignored objects?
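
For reference, the heatmap loss in question is the penalty-reduced pixel-wise focal loss; a minimal sketch (my paraphrase of _neg_loss from this repo, with pred assumed to be a sigmoided heatmap in (0, 1)):

import torch

def neg_loss(pred, gt):
    # Only the exact center pixel (gt == 1) is a positive; every other
    # pixel, including the rest of the Gaussian, is a negative whose
    # penalty is down-weighted by (1 - gt)^4.
    pos_inds = gt.eq(1).float()
    neg_inds = gt.lt(1).float()
    pos_loss = torch.log(pred) * torch.pow(1 - pred, 2) * pos_inds
    neg_loss = torch.log(1 - pred) * torch.pow(pred, 2) * torch.pow(1 - gt, 4) * neg_inds
    num_pos = pos_inds.sum()
    if num_pos == 0:
        return -neg_loss.sum()
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos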

janaoravcova commented 2 years ago

@r3krut did you find a working way to do it? So far I have been dealing with this by not calculating the loss at the positions where I know the ignored objects are, but I am not sure about the correctness of this solution.
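
A minimal sketch of that approach, assuming the focal loss sketched above and a valid_mask that is 0 at ignored positions (the names are mine, not from the repo):

import torch

def masked_neg_loss(pred, gt, valid_mask):
    # Exclude pixels where valid_mask == 0 from both the positive and the
    # negative terms, so no gradient is produced there at all.
    pos_inds = gt.eq(1).float() * valid_mask
    neg_inds = gt.lt(1).float() * valid_mask
    pos_loss = torch.log(pred) * torch.pow(1 - pred, 2) * pos_inds
    neg_loss = torch.log(1 - pred) * torch.pow(pred, 2) * torch.pow(1 - gt, 4) * neg_inds
    num_pos = pos_inds.sum().clamp(min=1)
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos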

r3krut commented 2 years ago

@janaoravcova I found a way to ignore these bboxes similar to what you described. I have only checked this approach on a single-class detection problem; it may not work for a multi-class detection problem.

import cv2
import numpy as np

# gaussian_radius and draw_gaussian are the standard CenterNet heatmap
# utilities; numpy_to_tensor and cfg are my own helpers.

def _cn_gen_targets(annots, img_h, img_w, r=cfg.r, num_classes=cfg.num_classes):
    """
        Generate targets from annots for the CenterNet model (heatmaps + center offsets + object sizes).
        Args:
            annots      :   ndarray of annotations in the form [x1, y1, x2, y2, class_id, ignore]
            img_h       :   image H
            img_w       :   image W
            r           :   stride
            num_classes :   number of classes
        Return:
            bboxes_heatmaps - heatmaps with Gaussian-splatted bbox centers
            bboxes_offsets  - sub-pixel offsets of the centers
            bboxes_sizes    - log-scaled bbox sizes (direct regression targets)
            training_mask   - 1 for pixels that contribute to the loss, 0 inside ignore regions
    """
    th = img_h // r                 #   size of target for H
    tw = img_w // r                 #   size of target for W

    bboxes_heatmaps = np.zeros((th, tw, num_classes), dtype=np.float32)
    centers_mask = np.zeros((th, tw), dtype=np.uint8)
    ignore_mask = np.zeros((th, tw), dtype=np.uint8)
    bboxes_offsets = np.zeros((th, tw, 2), dtype=np.float32)
    bboxes_sizes = np.zeros((th, tw, 2), dtype=np.float32)

    for annot in annots:
        bbox_x1 = annot[0]          #   x1
        bbox_y1 = annot[1]          #   y1
        bbox_x2 = annot[2]          #   x2
        bbox_y2 = annot[3]          #   y2
        cls_id = int(annot[4])      #   1-based class id
        ignore = bool(annot[5])     #   1 if ignore, else 0

        aw = bbox_x2 - bbox_x1      #   width of bbox
        ah = bbox_y2 - bbox_y1      #   height of bbox
        bbox_center_x = bbox_x1 + aw / 2.0
        bbox_center_y = bbox_y1 + ah / 2.0

        aw_hm = aw / r
        ah_hm = ah / r
        gr = gaussian_radius((ah_hm, aw_hm), min_overlap=0.7)
        hm_pos = np.array([bbox_center_x // r, bbox_center_y // r], np.int32)   #   center position of bbox in HEATMAP
        hm_pos[0] = np.clip(hm_pos[0], 0, tw-1)
        hm_pos[1] = np.clip(hm_pos[1], 0, th-1)
        if not ignore:
            draw_gaussian(bboxes_heatmaps[:, :, cls_id-1], hm_pos, int(gr))
            bboxes_offsets[hm_pos[1], hm_pos[0], :] = np.array([bbox_center_x / r, bbox_center_y / r]) - hm_pos
            bboxes_sizes[hm_pos[1], hm_pos[0], :] = np.log(np.array([aw / r, ah / r]))  #   direct regression
            rr = int(gr)
            #   remember a window around each kept center so it stays trainable
            #   even if an ignore box overlaps it
            centers_mask = cv2.rectangle(centers_mask, (hm_pos[0]-rr, hm_pos[1]-rr), (hm_pos[0]+rr-1, hm_pos[1]+rr-1), 1, -1)
        else:
            ignore_mask = cv2.rectangle(ignore_mask, (int(bbox_x1//r), int(bbox_y1//r)), (int(bbox_x2//r), int(bbox_y2//r)), 1, -1)
    #   drop ignore pixels from training, except where they overlap a kept
    #   center window (np.bool was removed in NumPy 1.24+, so plain bool is used)
    training_mask = (~(ignore_mask - (centers_mask & ignore_mask)).astype(bool)).astype(np.uint8)
    return numpy_to_tensor(bboxes_heatmaps).float(), \
            numpy_to_tensor(bboxes_offsets).float(), \
            numpy_to_tensor(bboxes_sizes).float(), \
            numpy_to_tensor(np.expand_dims(training_mask, axis=-1)).float()
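
The mask combination on the last line is the subtle part: an ignore pixel is dropped from training unless it overlaps the center window of a kept object. A self-contained toy example of that expression on a single 6-pixel row:

import numpy as np

ignore_mask  = np.array([0, 1, 1, 1, 0, 0], dtype=np.uint8)  # ignored bbox covers px 1..3
centers_mask = np.array([0, 0, 1, 1, 1, 0], dtype=np.uint8)  # kept center window covers px 2..4
training_mask = (~(ignore_mask - (centers_mask & ignore_mask)).astype(bool)).astype(np.uint8)
print(training_mask)  # [1 0 1 1 1 1] -> only px 1 (ignore without overlap) is excluded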

In loss calculation:

def forward(self, gt_hms, gt_offsets, gt_sizes,
            pred_hms, pred_offsets, pred_sizes,
            training_mask):
    # Zero out ignored pixels in both targets and predictions so they
    # contribute nothing to the loss (training_mask broadcasts over channels).
    gt_hms = gt_hms * training_mask
    pred_hms = pred_hms * training_mask
    gt_offsets = gt_offsets * training_mask
    pred_offsets = pred_offsets * training_mask
    gt_sizes = gt_sizes * training_mask
    pred_sizes = pred_sizes * training_mask
    .....
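
Note that if the criterion is the focal loss sketched earlier, multiplying both gt_hms and pred_hms by the mask already zeroes the masked pixels' contribution: their positive term vanishes because gt is no longer 1 there, and their negative term vanishes because pred is forced to 0.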
Runist commented 2 years ago

I think maybe you should generate a mask that ignores the image region as well as the label. Ignoring only the label will influence network performance.
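
A sketch of that idea as I understand it (my own interpretation, not from the repo; the image is assumed to be an HWC array): fill the ignored region in the input image as well, e.g. with the per-channel mean, so the network never sees an object it receives no supervision for:

import numpy as np

def erase_ignored_regions(image, annots):
    # image: (H, W, C); annots rows: [x1, y1, x2, y2, class_id, ignore]
    out = image.astype(np.float32).copy()
    fill = out.mean(axis=(0, 1))                     # per-channel mean fill
    for x1, y1, x2, y2, _, ignore in annots:
        if ignore:
            out[int(y1):int(y2), int(x1):int(x2)] = fill
    return out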