Loss is coming nan - Githubissues

ridhajuneja123 commented 5 years ago

both loc_loss and cls_loss are coming nan can u suggest the solution

ResearchingDexter commented 5 years ago

the condition may be caused by the size of anchors that anchors'size can't match your detected objects

xhtian95 commented 5 years ago

I also meet this problem......

ResearchingDexter commented 5 years ago

both loc_loss and cls_loss are coming nan can u suggest the solution

ResearchingDexter commented 5 years ago

I suppose u can print the number of positive example and u can adjust the ratios of the anchor according the number

liqima commented 5 years ago

No description provided.

did you solved this problem?

anotherother commented 5 years ago

Who have such problem? May someone recommend the solution? In my case it seems like this: `loc_loss: 0.086 | cls_loss: 673.821 | Train_loss: 673.90753 | avg_loss: 673.90753

loc_loss: 0.088 | cls_loss: 540.022 | Train_loss: 540.11029 | avg_loss: 607.00891

loc_loss: 0.081 | cls_loss: 589.325 | Train_loss: 589.40613 | avg_loss: 601.14132

loc_loss: 0.081 | cls_loss: 418.840 | Train_loss: 418.92139 | avg_loss: 555.58633

loc_loss: 0.083 | cls_loss: 268.827 | Train_loss: 268.90982 | avg_loss: 498.25103

loc_loss: 0.086 | cls_loss: 211.607 | Train_loss: 211.69376 | avg_loss: 450.49149

loc_loss: 0.106 | cls_loss: 71.394 | Train_loss: 71.49988 | avg_loss: 396.34983

loc_loss: 0.075 | cls_loss: 28.076 | Train_loss: 28.15103 | avg_loss: 350.32498

loc_loss: 0.088 | cls_loss: 19.801 | Train_loss: 19.88938 | avg_loss: 313.60991

loc_loss: 0.086 | cls_loss: 12.623 | Train_loss: 12.70911 | avg_loss: 283.51983

loc_loss: 0.092 | cls_loss: inf | Train_loss: inf | avg_loss: inf

loc_loss: nan | cls_loss: nan | Train_loss: nan | avg_loss: nan

loc_loss: nan | cls_loss: nan | Train_loss: nan | avg_loss: nan`

anotherother commented 5 years ago

Problem was solved. This is rewritten code



from __future__ import print_function

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class FocalLoss(nn.Module):
    def __init__(self, num_classes):
        super(FocalLoss, self).__init__()
        self.num_classes = num_classes

    def _one_hot_embeding(self, labels):
        """Embeding labels to one-hot form.
        Args:
            labels(LongTensor): class labels
            num_classes(int): number of classes
        Returns:
            encoded labels, sized[N, #classes]
        """

        y = torch.eye(self.num_classes+1)  # [D, D]
        return y[labels]  # [N, D]

    def focal_loss(self, x, y):
        """Focal loss
        Args:
            x(tensor): size [N, D]
            y(tensor): size [N, ]
        Returns:
            (tensor): focal loss
        """

        alpha = 0.25
        gamma = 2

        t = self._one_hot_embeding(y.data.cpu())  # [N,21]
        t = t[:, 1:]  # exclude background
        t = Variable(t).cuda()  # [N,20]

        logit = F.softmax(x)
        logit = logit.clamp(1e-7, 1.-1e-7)
        conf_loss_tmp = -1 * t.float() * torch.log(logit)
        conf_loss_tmp = alpha * conf_loss_tmp * (1-logit)**gamma
        conf_loss = conf_loss_tmp.sum()

        return conf_loss

    def forward(self, loc_preds, loc_targets, cls_preds, cls_targets):
        """Compute loss between (loc_preds, loc_targets) and (cls_preds, cls_targets).
        Args:
          loc_preds(tensor): predicted locations, sized [batch_size, #anchors, 4].
          loc_targets(tensor): encoded target locations, sized [batch_size, #anchors, 4].
          cls_preds(tensor): predicted class confidences, sized [batch_size, #anchors, #classes].
          cls_targets(tensor): encoded target labels, sized [batch_size, #anchors].
        Returns:
          (tensor) loss = SmoothL1Loss(loc_preds, loc_targets) + FocalLoss(cls_preds, cls_targets).
        """

        pos = cls_targets > 0  # [N,#anchors]
        num_pos = pos.data.long().sum()

        # loc_loss = SmoothL1Loss(pos_loc_preds, pos_loc_targets)
        mask = pos.unsqueeze(2).expand_as(loc_preds)  # [N,#anchors,4]
        masked_loc_preds = loc_preds[mask].view(-1, 4)  # [#pos,4]
        masked_loc_targets = loc_targets[mask].view(-1, 4)  # [#pos,4]
        loc_loss = F.smooth_l1_loss(masked_loc_preds, masked_loc_targets, size_average=False)

        # cls_loss = FocalLoss(loc_preds, loc_targets)
        pos_neg = cls_targets > -1  # exclude ignored anchors
        # num_pos_neg = pos_neg.data.long().sum()
        mask = pos_neg.unsqueeze(2).expand_as(cls_preds)
        masked_cls_preds = cls_preds[mask].view(-1, self.num_classes)

        cls_loss = self.focal_loss(masked_cls_preds, cls_targets[pos_neg])
        num_pos = max(1.0, num_pos.item())

        print('loc_loss: %.3f | cls_loss: %.3f' % (loc_loss.item() / num_pos, cls_loss.item() / num_pos), end=' | ')

        loss = loc_loss / num_pos + cls_loss / num_pos

        return loss
`

heartInsert commented 5 years ago

Thanks, It is worked. But can you tell me which statement did you change? I try to but can't find it. Thank you @miramind