SIOU caused Nan result

lucasjinreal commented 2 years ago

Hi, I applied SIoU in YOLOv6, but after about 700 epochs I get NaN values in the IoU loss. Is there any possible reason?

        if self.iou_type == "giou":
            c_area = cw * ch + self.eps  # convex area
            iou = iou - (c_area - union) / c_area
        elif self.iou_type in ["diou", "ciou"]:
            c2 = cw**2 + ch**2 + self.eps  # convex diagonal squared
            rho2 = (
                (b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2
                + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2
            ) / 4  # center distance squared
            if self.iou_type == "diou":
                iou = iou - rho2 / c2
            elif self.iou_type == "ciou":
                v = (4 / math.pi**2) * torch.pow(
                    torch.atan(w2 / h2) - torch.atan(w1 / h1), 2
                )
                with torch.no_grad():
                    alpha = v / (v - iou + (1 + self.eps))
                iou = iou - (rho2 / c2 + v * alpha)
        elif self.iou_type == "siou":
            # SIoU Loss https://arxiv.org/pdf/2205.12740.pdf
            s_cw = (b2_x1 + b2_x2 - b1_x1 - b1_x2) * 0.5
            s_ch = (b2_y1 + b2_y2 - b1_y1 - b1_y2) * 0.5
            sigma = torch.pow(s_cw**2 + s_ch**2, 0.5)
            sin_alpha_1 = torch.abs(s_cw) / sigma
            sin_alpha_2 = torch.abs(s_ch) / sigma
            threshold = pow(2, 0.5) / 2
            sin_alpha = torch.where(sin_alpha_1 > threshold, sin_alpha_2, sin_alpha_1)
            angle_cost = torch.cos(torch.arcsin(sin_alpha) * 2 - math.pi / 2)
            rho_x = (s_cw / cw) ** 2
            rho_y = (s_ch / ch) ** 2
            gamma = angle_cost - 2
            distance_cost = 2 - torch.exp(gamma * rho_x) - torch.exp(gamma * rho_y)
            omiga_w = torch.abs(w1 - w2) / torch.max(w1, w2)
            omiga_h = torch.abs(h1 - h2) / torch.max(h1, h2)
            shape_cost = torch.pow(1 - torch.exp(-1 * omiga_w), 4) + torch.pow(
                1 - torch.exp(-1 * omiga_h), 4
            )
            iou = iou - 0.5 * (distance_cost + shape_cost)
        loss = 1.0 - iou

My boxes are also in cxcywh format.
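One thing that stands out in the snippet above: the giou and diou/ciou branches add self.eps to their denominators, while the siou branch divides by sigma, cw, ch, and torch.max(w1, w2) with no epsilon. If the two box centers coincide, sigma is 0 and sin_alpha becomes 0/0. The sketch below isolates just the angle-cost lines to show that case. It is a possible cause, not a confirmed diagnosis of this issue; the function names and the eps value are assumptions made for illustration.

```python
import math

import torch


def angle_cost_unguarded(s_cw, s_ch):
    # Same arithmetic as the siou branch above: sigma is 0 when centers coincide.
    sigma = torch.pow(s_cw**2 + s_ch**2, 0.5)
    sin_alpha_1 = torch.abs(s_cw) / sigma
    sin_alpha_2 = torch.abs(s_ch) / sigma
    threshold = pow(2, 0.5) / 2
    sin_alpha = torch.where(sin_alpha_1 > threshold, sin_alpha_2, sin_alpha_1)
    return torch.cos(torch.arcsin(sin_alpha) * 2 - math.pi / 2)


def angle_cost_guarded(s_cw, s_ch, eps=1e-7):
    # Assumption: an eps inside the square root keeps sigma strictly positive,
    # so the two divisions below can no longer evaluate to 0/0.
    sigma = torch.sqrt(s_cw**2 + s_ch**2 + eps)
    sin_alpha_1 = torch.abs(s_cw) / sigma
    sin_alpha_2 = torch.abs(s_ch) / sigma
    threshold = pow(2, 0.5) / 2
    sin_alpha = torch.where(sin_alpha_1 > threshold, sin_alpha_2, sin_alpha_1)
    return torch.cos(torch.arcsin(sin_alpha) * 2 - math.pi / 2)


# First pair: coincident centers (offset 0, 0). Second pair: offset (3, 4).
s_cw = torch.tensor([0.0, 3.0])
s_ch = torch.tensor([0.0, 4.0])
unguarded = angle_cost_unguarded(s_cw, s_ch)
guarded = angle_cost_guarded(s_cw, s_ch)
print(unguarded, guarded)
```

The first entry of the unguarded result is NaN, while the guarded version stays finite for both pairs. The same kind of guard would presumably be needed for the divisions by cw, ch, and torch.max(w1, w2) if degenerate boxes can occur.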

serser commented 2 years ago

Hi, it would be useful to save the intermediate inputs so you can reproduce what led to the NaN results (a try/except will do). I ran into a similar problem with CIoU loss when the GT box of some samples was oversized. My workaround (only suitable when NaNs are occasional) was:

loss = torch.nan_to_num(loss)

Just curious, what gain do you expect from training for 700 epochs?
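The two suggestions in this comment (dump the inputs that produced the NaN, then mask it with torch.nan_to_num) could be combined roughly as follows. This is a minimal sketch: checked_loss, toy_loss, and the file name nan_batch.pt are hypothetical, and it assumes the loss function returns one value per box.

```python
import os
import tempfile

import torch


def checked_loss(loss_fn, pred, target, dump_dir):
    """Hypothetical helper: dump the offending boxes, then mask non-finite losses."""
    loss = loss_fn(pred, target)
    bad = ~torch.isfinite(loss)
    if bad.any():
        # Save only the boxes whose loss is NaN/inf so they can be replayed offline.
        torch.save(
            {"pred": pred[bad].detach().cpu(), "target": target[bad].detach().cpu()},
            os.path.join(dump_dir, "nan_batch.pt"),
        )
        # Workaround from the comment; zeroing infinities as well is an added assumption.
        loss = torch.nan_to_num(loss, nan=0.0, posinf=0.0, neginf=0.0)
    return loss


def toy_loss(pred, target):
    # Stand-in for an IoU loss: evaluates to 0/0 when both widths are zero.
    return (pred[:, 2] - target[:, 2]).abs() / target[:, 2]


pred = torch.tensor([[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 2.0, 2.0]])
target = torch.tensor([[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 4.0, 2.0]])
with tempfile.TemporaryDirectory() as dump_dir:
    loss = checked_loss(toy_loss, pred, target, dump_dir)
    saved = torch.load(os.path.join(dump_dir, "nan_batch.pt"))
print(loss, saved["pred"].shape)
```

Note that masking the loss value does not by itself guarantee finite gradients: the backward pass of the operation that produced the NaN (for example a 0/0 division) can still yield NaN, so the dumped inputs are the more useful half of this sketch.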

shensheng272 commented 2 years ago

The v0.2.0 code uses TAL and a different IoU loss strategy, so training should be more stable.

lucasjinreal commented 2 years ago

@shensheng272 Hi, what specifically is new in v0.2.0?

shensheng272 commented 2 years ago

@jinfagang In short: see "What's New". Also, the tech report has been released.

meituan / YOLOv6

SIOU caused Nan result #417