xiao-hua-sheng / YOLOX-Distill

YOLOX object-based knowledge distillation
Apache License 2.0
5 stars, 1 fork

Loss distill NaN #5

Open. John1231983 opened this issue 2 years ago

John1231983 commented 2 years ago

I trained the code with --fp16 True and the distillation loss becomes NaN. Any suggestions on how to fix it? I am using PyTorch 1.12.

xiao-hua-sheng commented 2 years ago

--fp16 = True is not currently supported.
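As a hedged illustration (not this repository's code) of one plausible way half precision can turn a large loss into NaN: the largest finite IEEE 754 binary16 value is 65504, so a loss term larger than that becomes inf when cast to fp16, and arithmetic such as inf - inf then yields NaN. The sketch below emulates an fp16 cast with Python's struct 'e' (half-precision) format; the helper name to_fp16 is hypothetical, and whether the distillation term here is actually computed in fp16 is an assumption.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 binary16 value


def to_fp16(x: float) -> float:
    """Round a Python float to half precision (hypothetical helper).

    struct's 'e' format packs binary16 but raises OverflowError for values
    out of range, whereas a tensor cast to fp16 yields +/-inf, so that case
    is mapped to a signed infinity here.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


# Small loss values survive (with rounding); a very large one overflows to inf.
for value in (1.6, 39.4, FP16_MAX, 290979552.0):
    print(value, "->", to_fp16(value))

overflowed = to_fp16(290979552.0)
print("inf - inf =", overflowed - overflowed)  # nan
```

Under this assumption, keeping the distillation loss in fp32 (or bounding its magnitude) would avoid the overflow, which is consistent with fp16 being unsupported for now.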

John1231983 commented 2 years ago

This is the log when I disable fp16:

2022-08-13 09:30:53 | INFO     | yolox.core.trainer:280 - epoch: 1/300, iter: 10/171, mem: 4942Mb, iter_time: 0.415s, data_time: 0.002s, total_loss: 290979552.0, iou_loss: 4.3, l1_loss: 0.0, conf_loss: 39.4, cls_loss: 1.6, lr: 6.840e-07, size: 416, ETA: 5:54:54
2022-08-13 09:30:58 | INFO     | yolox.core.trainer:280 - epoch: 1/300, iter: 20/171, mem: 4942Mb, iter_time: 0.468s, data_time: 0.004s, total_loss: 1984544256.0, iou_loss: 4.1, l1_loss: 0.0, conf_loss: 15.3, cls_loss: 2.0, lr: 2.736e-06, size: 352, ETA: 6:17:12
2022-08-13 09:31:01 | INFO     | yolox.core.trainer:280 - epoch: 1/300, iter: 30/171, mem: 4942Mb, iter_time: 0.358s, data_time: 0.005s, total_loss: 223795872.0, iou_loss: 4.5, l1_loss: 0.0, conf_loss: 28.7, cls_loss: 1.2, lr: 6.156e-06, size: 352, ETA: 5:53:21
2022-08-13 09:31:09 | INFO     | yolox.core.trainer:280 - epoch: 1/300, iter: 40/171, mem: 18032Mb, iter_time: 0.819s, data_time: 0.003s, total_loss: 1283537536.0, iou_loss: 4.5, l1_loss: 0.0, conf_loss: 46.6, cls_loss: 1.4, lr: 1.094e-05, size: 576, ETA: 7:19:58
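In the log above, total_loss is in the hundreds of millions, while the logged iou_loss, l1_loss, conf_loss and cls_loss add up to only a few dozen, which suggests that an unlogged term, presumably the distillation loss, dominates. The sketch below parses such log lines and reports total_loss minus the sum of the logged components. It assumes total_loss is a plain sum of those components plus the distillation term, which this thread does not confirm; the helper name residual_loss is hypothetical.

```python
import re

# Matches fields such as "total_loss: 290979552.0" or "cls_loss: 1.6".
LOSS_FIELD = re.compile(r"(\w+_loss): ([0-9.eE+-]+)")
COMPONENTS = ("iou_loss", "l1_loss", "conf_loss", "cls_loss")


def residual_loss(line: str) -> float:
    """Return total_loss minus the logged component losses (hypothetical helper)."""
    fields = {name: float(value) for name, value in LOSS_FIELD.findall(line)}
    return fields["total_loss"] - sum(fields[name] for name in COMPONENTS)


# Loss fields copied from the first two log lines above.
log_lines = [
    "iter: 10/171, total_loss: 290979552.0, iou_loss: 4.3, l1_loss: 0.0, conf_loss: 39.4, cls_loss: 1.6",
    "iter: 20/171, total_loss: 1984544256.0, iou_loss: 4.1, l1_loss: 0.0, conf_loss: 15.3, cls_loss: 2.0",
]
for line in log_lines:
    print(line.split(",")[0], "unlogged remainder ~", round(residual_loss(line)))
```

If the assumption holds, nearly all of total_loss comes from the unlogged term, so logging the distillation loss separately would make this easier to diagnose.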

Note that I commented out the line https://github.com/xiao-hua-sheng/YOLOX-Distill/blob/f6907979daed2683076a3bc55770bc06c411f70f/yolox/models/yolo_head.py#L292

John1231983 commented 2 years ago

@xiao-hua-sheng, could you please comment on this?