Closed: khg2478 closed this issue 3 years ago
The NaN loss problem has been fixed. I tested it on yolov4-p5 and yolov4-tiny; I haven't tested it on yolov4-p7 because my GPU doesn't have enough memory. You can try it and let me know if there is a problem.
@wangermeng2021 I appreciate your well-refined source code and prompt reply; it has saved me a lot of time. :) I know that NaN loss can be caused by various issues and was wondering whether I should fix some parts of the model itself. Could you elaborate on how you determined the 1e-07 addition to the atan2 function? Did you print out which specific cases cause the NaN and choose the value from that?
When I see a NaN error, my first guess is "division by zero", so I checked the loss output and found that the prediction (pred_wh) is occasionally exactly zero. That is why I added 1e-07 inside the atan2 call.

Debug output:

pred_wh: tf.Tensor(
[[0.  0.2800796 ]
 [0.  3.9574223 ]
 [0.  0.25130844]
 ...
 [0.  2.1104202 ]
 [0.  3.933052  ]
 [0.  1.6214828 ]], shape=(983, 2), dtype=float32)
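For reference, here is a minimal TensorFlow 2 sketch of that kind of epsilon guard. The function name ciou_aspect_term and the exact placement of the epsilon are my own illustration of the idea, not necessarily the repo's actual code:

```python
import math
import tensorflow as tf

EPS = 1e-07  # tiny constant so atan2 never sees an exact (0, 0) pair

def ciou_aspect_term(pred_wh, true_wh):
    # pred_wh / true_wh: (N, 2) tensors holding (width, height).
    # If a predicted width and height are both exactly 0, the gradient
    # of atan2 at (0, 0) is NaN and propagates through the whole loss.
    pred_angle = tf.atan2(pred_wh[..., 0] + EPS, pred_wh[..., 1] + EPS)
    true_angle = tf.atan2(true_wh[..., 0], true_wh[..., 1])
    # "v" term of the CIoU loss: penalizes aspect-ratio mismatch.
    return 4.0 / (math.pi ** 2) * tf.square(true_angle - pred_angle)
```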
I see. That sounds right. :) Thanks a lot! I'll close the issue. Have a great weekend!
Hello.
I've tried to train your YOLOv4-P7 model and was previously getting a NaN loss error.
I've noticed that you updated the loss function slightly to fix this, and I'm wondering whether the issue has been fully resolved, since the mAP scores for P6 and P7 haven't been updated yet.
Has the issue been solved?