wangermeng2021 / Scaled-YOLOv4-tensorflow2

A TensorFlow 2.x implementation of Scaled-YOLOv4 as described in "Scaled-YOLOv4: Scaling Cross Stage Partial Network"
Apache License 2.0
47 stars, 18 forks

YOLOv4-P7 Nan error #11

Closed (khg2478 closed 3 years ago)

khg2478 commented 3 years ago

Hello.

I previously tried to train your YOLOv4-P7 model and ran into a NaN error.

I see that you have slightly updated the loss function to fix the issue, and I wonder whether it has been resolved, since the mAP scores for P6 and P7 haven't been updated yet.

Has the issue been solved?

wangermeng2021 commented 3 years ago

The NaN loss problem has been fixed. I tested it on yolov4-p5 and yolov4-tiny, but I haven't tested it on yolov4-p7 because my GPU doesn't have enough memory. You can try it; let me know if there is a problem.

khg2478 commented 3 years ago

@wangermeng2021 I appreciate your well-refined source code and prompt reply; they have saved me a lot of time. :) I know that NaN loss can have various causes, and I was wondering whether I should fix some parts of the model itself. Could you elaborate on how you decided to add 1e-07 in the atan2 function? Did you print out the specific cases that cause NaN, and how did you choose the magnitude?

wangermeng2021 commented 3 years ago

When I see a NaN error, my first guess is "division by zero", so I checked the loss output. I found that the prediction (pred_wh) is occasionally equal to zero, so I decided to add 1e-07 in the atan2 call. Debug output:

pred_wh: tf.Tensor(
[[0. 0.2800796 ]
 [0. 3.9574223 ]
 [0. 0.25130844]
 ...
 [0. 2.1104202 ]
 [0. 3.933052  ]
 [0. 1.6214828 ]], shape=(983, 2), dtype=float32)
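
For readers following along, here is a minimal sketch of the failure mode described above. It is an illustration under stated assumptions, not the repository's actual loss code: it uses NumPy and a plain arctangent of the width/height ratio as a stand-in for the aspect-ratio term. The helper name `aspect_angle` and the second and third sample rows are hypothetical; only the first row and the 1e-07 constant come from the thread.

```python
import numpy as np

EPS = 1e-7  # the constant mentioned in the thread


def aspect_angle(wh, eps=0.0):
    """Return arctan(w / (h + eps)) for each [w, h] row of an (N, 2) array.

    Hypothetical stand-in for an aspect-ratio term; not the repo's code.
    """
    wh = np.asarray(wh, dtype=np.float32)
    w, h = wh[:, 0], wh[:, 1]
    # Silence NumPy's divide warnings so the NaN shows up in the result.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.arctan(w / (h + np.float32(eps)))


pred_wh = np.array(
    [
        [0.0, 0.2800796],  # zero width, as in the debug output
        [0.0, 0.0],        # hypothetical fully degenerate box
        [1.5, 2.0],        # hypothetical ordinary box
    ],
    dtype=np.float32,
)

unguarded = aspect_angle(pred_wh)          # 0 / 0 yields NaN in row 1
guarded = aspect_angle(pred_wh, eps=EPS)   # denominator is never exactly 0

print("rows with a zero side:", int(np.any(pred_wh == 0, axis=1).sum()))
print("NaN without epsilon:", bool(np.isnan(unguarded).any()))
print("NaN with epsilon:", bool(np.isnan(guarded).any()))
```

Counting the zero entries in the predictions first, as in the debug output above, is a cheap way to confirm that degenerate boxes are the cause before picking an epsilon. The value only needs to be small relative to typical box sizes while keeping the denominator away from exactly zero.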

khg2478 commented 3 years ago

I see. It sounds right. :) Thanks a lot! I’ll close the case. Have a great weekend!