Open John1231983 opened 2 years ago
`--fp16 True` is not currently supported.
This is the log when I disable fp16:
2022-08-13 09:30:53 | INFO | yolox.core.trainer:280 - epoch: 1/300, iter: 10/171, mem: 4942Mb, iter_time: 0.415s, data_time: 0.002s, total_loss: 290979552.0, iou_loss: 4.3, l1_loss: 0.0, conf_loss: 39.4, cls_loss: 1.6, lr: 6.840e-07, size: 416, ETA: 5:54:54
2022-08-13 09:30:58 | INFO | yolox.core.trainer:280 - epoch: 1/300, iter: 20/171, mem: 4942Mb, iter_time: 0.468s, data_time: 0.004s, total_loss: 1984544256.0, iou_loss: 4.1, l1_loss: 0.0, conf_loss: 15.3, cls_loss: 2.0, lr: 2.736e-06, size: 352, ETA: 6:17:12
2022-08-13 09:31:01 | INFO | yolox.core.trainer:280 - epoch: 1/300, iter: 30/171, mem: 4942Mb, iter_time: 0.358s, data_time: 0.005s, total_loss: 223795872.0, iou_loss: 4.5, l1_loss: 0.0, conf_loss: 28.7, cls_loss: 1.2, lr: 6.156e-06, size: 352, ETA: 5:53:21
2022-08-13 09:31:09 | INFO | yolox.core.trainer:280 - epoch: 1/300, iter: 40/171, mem: 18032Mb, iter_time: 0.819s, data_time: 0.003s, total_loss: 1283537536.0, iou_loss: 4.5, l1_loss: 0.0, conf_loss: 46.6, cls_loss: 1.4, lr: 1.094e-05, size: 576, ETA: 7:19:58
Note that I commented out the line at https://github.com/xiao-hua-sheng/YOLOX-Distill/blob/f6907979daed2683076a3bc55770bc06c411f70f/yolox/models/yolo_head.py#L292
@xiao-hua-sheng could you please comment on this?
I trained the code with `--fp16 True` and the distill loss becomes NaN. Any suggestions on how to fix it? I am using PyTorch 1.12.
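One possible explanation, going only by the fp32 log in this thread: `total_loss` is in the 1e8 to 1e9 range while `iou_loss`, `conf_loss` and `cls_loss` stay small, so the remaining (presumably distillation) term is huge. The largest finite half-precision value is 65504, so a value of that size would overflow to `inf` under fp16, and `inf` can then turn into NaN. The snippet below is a minimal standard-library sketch that checks the logged values against that limit. `fits_in_fp16` is a hypothetical helper, not part of YOLOX or PyTorch, and it ignores rounding right at the boundary.

```python
import math

# Largest finite IEEE 754 half-precision (fp16) value.
FP16_MAX = 65504.0


def fits_in_fp16(value: float) -> bool:
    """Hypothetical helper: True if value is finite and within the fp16 range."""
    return math.isfinite(value) and abs(value) <= FP16_MAX


# total_loss values copied from the fp32 log in this thread.
logged_total_losses = [290979552.0, 1984544256.0, 223795872.0, 1283537536.0]

# Collect every logged value that could not be represented in fp16.
overflowing = [v for v in logged_total_losses if not fits_in_fp16(v)]

for v in overflowing:
    print(f"total_loss={v:.1f} exceeds the fp16 maximum of {FP16_MAX}")
```

If this is indeed the cause, a common workaround is to compute the large loss term in float32, for example by casting its inputs with `.float()` inside a `torch.cuda.amp.autocast(enabled=False)` region, or to rescale the term so it stays well below 65504. Neither has been verified against this repository.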