Open cydiachen opened 2 years ago
Hi, if the first 5 epochs are warm-up epochs, you may want to set a lower learning rate. The 'Inf' value problem is possibly caused by some very large negative predictions, say -100000000: sigmoid(p) then underflows to 0, so log(sigmoid(p)) -> -Inf and the loss becomes Inf.
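To illustrate the point about large negative predictions, here is a minimal pure-Python sketch (not the VFNet or mmdetection code). The helper names `naive_log_sigmoid` and `stable_log_sigmoid` are made up for this example. The naive version imitates what floating-point tensors do when sigmoid underflows; the stable version uses the identity log(sigmoid(p)) = min(p, 0) - log(1 + exp(-|p|)).

```python
import math


def naive_log_sigmoid(p: float) -> float:
    """Compute log(sigmoid(p)) literally, imitating float underflow behaviour."""
    try:
        s = 1.0 / (1.0 + math.exp(-p))
    except OverflowError:
        # exp(-p) overflowed, so sigmoid(p) underflows to 0
        s = 0.0
    # log(0) is -inf in IEEE arithmetic; math.log would raise instead
    return math.log(s) if s > 0.0 else -math.inf


def stable_log_sigmoid(p: float) -> float:
    """Compute log(sigmoid(p)) without ever exponentiating a large positive number."""
    return min(p, 0.0) - math.log1p(math.exp(-abs(p)))


if __name__ == "__main__":
    for p in (-5.0, -1000.0, -1e8):
        # A loss term of the form -log(sigmoid(p)) is the negation of these values
        print(p, -naive_log_sigmoid(p), -stable_log_sigmoid(p))
```

With the naive form, a prediction such as -1e8 gives an infinite loss term, while the stable form stays finite (though still very large). In actual PyTorch training code one would more likely rely on built-in routines such as `torch.nn.functional.logsigmoid` or `torch.nn.functional.binary_cross_entropy_with_logits` instead of hand-written helpers.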
Thank you for your excellent work. I am now experimenting with improving VFNet using recent backbones (e.g., Poolformer S36, ConvNeXt Small). The network works fine for the first 5 epochs and then suffers a significant performance drop caused by an unexpected Inf value of cls_loss (in my case, the varifocal loss). I would appreciate any advice on tracking down the issue. (I have tried grad_clip to clip the Inf gradients, but it does not solve the issue.)