kuangliu / pytorch-retinanet

RetinaNet in PyTorch

Focal loss is very low #31

Open prakashjayy opened 6 years ago

prakashjayy commented 6 years ago

Hi,

I have implemented your code and it works properly, but I have the following concerns.

My pseudo-code works like this:

cls_preds = [batch_size, anchor_boxes, classes]   # classes is 21 (VOC labels + background), e.g. [16, 67995, 21]
cls_targets = [batch_size, anchor_boxes]          # per-anchor labels ranging from -1 to 20

Now I remove all the anchor boxes with label -1 (ignored boxes):

cls_targets = [batch_size * valid_anchor_boxes, classes]  # [54933, 21], one-hot encoded
cls_preds   = [batch_size * valid_anchor_boxes, classes]  # [54933, 21]
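For concreteness, here is a tiny sketch of that masking step with made-up numbers (the tensor names follow the description above; the shapes are only illustrative):

import torch

# Hypothetical mini-batch: 2 images, 5 anchors, 21 classes (20 VOC classes + background).
cls_preds = torch.randn(2, 5, 21)                      # raw per-anchor logits
cls_targets = torch.tensor([[3, -1, 0, 7, -1],
                            [0, 0, 12, -1, 5]])        # -1 marks ignored anchors

keep = cls_targets > -1                                # drop the ignored anchors
masked_preds = cls_preds[keep.unsqueeze(2).expand_as(cls_preds)].view(-1, 21)
masked_targets = cls_targets[keep]                     # 1-D labels; one-hot encoding happens inside the loss

print(masked_preds.shape, masked_targets.shape)        # torch.Size([7, 21]) torch.Size([7])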

Now, I followed your code and implemented the focal loss as is, but my loss values come out very low. For example, at random initialization the loss is about 0.12, and it quickly drops to around 0.0012 and keeps getting smaller.

Is there something I am missing?

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class FocalLoss_tensorflow(nn.Module):
    def __init__(self, num_classes=20,
                 focusing_param=2.0,
                 balance_param=0.25):
        super(FocalLoss_tensorflow, self).__init__()
        self.num_classes = num_classes
        self.focusing_param = focusing_param
        self.balance_param = balance_param

    def focal_loss(self, x, y):
        """Sigmoid focal loss, following the TensorFlow object-detection implementation.

        x: [N, C] class logits, where column 0 is the background class.
        y: [N] integer class labels, with 0 = background.
        """
        x = x[:, 1:]                      # drop the background column from the logits
        sigmoid_p = torch.sigmoid(x)
        anchors, classes = x.shape

        # Build one-hot targets over all columns, then drop the background column.
        t = torch.FloatTensor(anchors, classes + 1)
        t.zero_()
        t.scatter_(1, y.data.cpu().view(-1, 1), 1)
        t = Variable(t[:, 1:]).cuda()

        zeros = Variable(torch.zeros(sigmoid_p.size())).cuda()
        # (t - p) on positive entries, 0 elsewhere; p on negative entries, 0 elsewhere.
        pos_p_sub = ((t >= sigmoid_p).float() * (t - sigmoid_p)) + ((t < sigmoid_p).float() * zeros)
        neg_p_sub = ((t >= zeros).float() * zeros) + ((t <= zeros).float() * sigmoid_p)

        # FL = -alpha * (1 - p)^gamma * log(p) - (1 - alpha) * p^gamma * log(1 - p)
        per_entry_cross_ent = (
            -self.balance_param * (pos_p_sub ** self.focusing_param)
            * torch.log(torch.clamp(sigmoid_p, 1e-8, 1.0))
            - (1 - self.balance_param) * (neg_p_sub ** self.focusing_param)
            * torch.log(torch.clamp(1.0 - sigmoid_p, 1e-8, 1.0))
        )
        return per_entry_cross_ent.mean()

    def forward(self, loc_preds, loc_targets, cls_preds, cls_targets):
        batch_size, num_boxes = cls_targets.size()
        pos = cls_targets > 0                      # positive (foreground) anchors
        num_pos = pos.data.long().sum()

        # Location loss: smooth L1 over positive anchors, normalized by their count.
        mask = pos.unsqueeze(2).expand_as(loc_preds)
        masked_loc_preds = loc_preds[mask].view(-1, 4)
        masked_loc_targets = loc_targets[mask].view(-1, 4)
        loc_loss = F.smooth_l1_loss(masked_loc_preds, masked_loc_targets, size_average=False)
        loc_loss = loc_loss / num_pos

        # Classification loss: focal loss over all non-ignored anchors (label > -1).
        pos_neg = cls_targets > -1
        mask = pos_neg.unsqueeze(2).expand_as(cls_preds)
        masked_cls_preds = cls_preds[mask].view(-1, self.num_classes)
        cls_loss = self.focal_loss(masked_cls_preds, cls_targets[pos_neg])
        return loc_loss, cls_loss
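For what it's worth, a quick smoke test of the module above with random tensors; the shapes follow the description earlier, the names are only illustrative, the module hard-codes .cuda() so this assumes a GPU, and num_classes has to match the channel count of cls_preds (21 here):

import torch
from torch.autograd import Variable

criterion = FocalLoss_tensorflow(num_classes=21)
loc_preds   = Variable(torch.randn(2, 100, 4).cuda())
loc_targets = Variable(torch.randn(2, 100, 4).cuda())
cls_preds   = Variable(torch.randn(2, 100, 21).cuda())
cls_targets = Variable(torch.randint(-1, 21, (2, 100)).cuda())  # -1 = ignore, 0 = background, 1..20 = classes

loc_loss, cls_loss = criterion(loc_preds, loc_targets, cls_preds, cls_targets)
print(loc_loss, cls_loss)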

Question 1: I am still not quite sure whether I should use 0 as my background class, and how normalization is done when the focal loss is applied.

jeong-tae commented 6 years ago

Is the test loss also very low? Focal loss can become very small because it suppresses the contribution of easy examples, but that doesn't mean it isn't working. I am also trying to reproduce this with the VOC dataset.
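To put a rough number on that down-weighting: with the usual sigmoid focal-loss weighting alpha * (1 - p)^gamma on top of cross entropy, confident ("easy") predictions contribute almost nothing, so the mean over tens of thousands of mostly easy anchors naturally looks tiny. A back-of-the-envelope check:

import torch

gamma, alpha = 2.0, 0.25
p = torch.tensor([0.5, 0.9, 0.99])        # predicted probability of the true class
ce = -torch.log(p)                        # plain cross entropy
fl = alpha * (1 - p) ** gamma * ce        # focal-loss term: -alpha * (1 - p)^gamma * log(p)

print(ce)   # roughly 0.69, 0.11, 0.010
print(fl)   # roughly 0.043, 0.00026, 0.00000025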

Q1. => I think you should count the number of instances, including the background class, to normalize.
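As a sketch of what that normalization could look like, here is a comparison of the two obvious choices: the RetinaNet paper normalizes the classification loss by the number of positive anchors, whereas the suggestion above counts all non-ignored anchors. Names and numbers are only illustrative:

import torch

# cls_targets: [batch, anchors] with -1 = ignore, 0 = background, 1..20 = VOC classes
cls_targets = torch.tensor([[3, -1, 0, 7, -1],
                            [0, 0, 12, -1, 5]])

num_pos = (cls_targets > 0).sum().clamp(min=1)     # positive anchors only (RetinaNet paper)
num_valid = (cls_targets > -1).sum().clamp(min=1)  # positives + background, as suggested above

summed_cls_loss = torch.tensor(12.3)               # placeholder for a summed (un-averaged) focal loss
print(summed_cls_loss / num_pos, summed_cls_loss / num_valid)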

jeong-tae commented 4 years ago

Of course, I tried that a long time ago. I remember the performance was poor on the VOC dataset.

I can't remember the exact numbers... it was around 40 mAP or 60 mAP, haha. I know the gap between the two is large, lol. It was almost a year ago, please understand.

jeong-tae commented 4 years ago

@EvanAlbee x = x[:, 1:] doesn't mean that all negative matches are excluded. Here, a negative can be defined as "predicted as positive but the true label is background". Well, we could still consider x[:, 0] as negative samples, but it may be useless for reducing false positives.
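To make that concrete, here is what the scheme in focal_loss above does to a background anchor: the one-hot target is built over 21 columns and the background column is then dropped, so a background anchor ends up with an all-zero target row and only contributes through the negative term. A small illustration (labels are arbitrary):

import torch

y = torch.tensor([0, 3, 20])          # 0 = background, 1..20 = VOC classes
t = torch.zeros(3, 21)
t.scatter_(1, y.view(-1, 1), 1)       # one-hot over 21 columns (background + 20 classes)
t = t[:, 1:]                          # drop the background column, as x[:, 1:] does for the logits

print(t[0])   # all zeros: background anchor, handled purely as a negative
print(t[1])   # single 1 at index 2, i.e. class 3
print(t[2])   # single 1 at index 19, i.e. class 20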