Open liuchuanloong opened 6 years ago
I tried gradient clipping, but it did not work. How are the parameters outside the ResNet layers initialized? With the PyTorch defaults?
By default, my RefineNet implementation uses torchvision.models.resnet101 for the ResNet layers.
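Any layer that is not given an explicit initializer falls back to PyTorch's built-in defaults. If you want to rule out initialization as the cause, you can initialize the non-ResNet layers yourself. The sketch below is only an illustration: `init_head` and the small `head` module are hypothetical stand-ins, not code from this repository.

```python
import torch.nn as nn


def init_head(module):
    """Hypothetical helper: He init for conv layers, constants for batch norm."""
    if isinstance(module, nn.Conv2d):
        nn.init.kaiming_normal_(module.weight, mode="fan_out", nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)
    elif isinstance(module, nn.BatchNorm2d):
        nn.init.ones_(module.weight)
        nn.init.zeros_(module.bias)


# Stand-in for the layers that sit on top of the ResNet backbone (assumed shapes).
head = nn.Sequential(
    nn.Conv2d(256, 256, kernel_size=3, padding=1),
    nn.BatchNorm2d(256),
    nn.ReLU(),
    nn.Conv2d(256, 2, kernel_size=1),
)

# Module.apply visits every submodule, so nested blocks are covered too.
head.apply(init_head)
```

Applying the helper only to the non-backbone part leaves any pretrained ResNet weights untouched.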
Plot the gradients after each batch to see whether they are exploding. Plotting the loss after each batch would also help in debugging your issue.
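One way to collect those numbers is to record the loss and the total gradient norm at every step. The sketch below assumes a tiny stand-in model, random data, and a cross-entropy loss purely so that it runs on its own; swap in the real model, data loader, and loss. `torch.nn.utils.clip_grad_norm_` returns the total norm measured before clipping, so it can be used for logging as well as clipping.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins; replace with the real network, loss, and data loader.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 2, kernel_size=1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

loss_history = []
grad_norm_history = []

for step in range(5):
    images = torch.randn(2, 3, 16, 16)        # fake batch of images
    labels = torch.randint(0, 2, (2, 16, 16))  # fake per-pixel class labels

    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()

    # Returns the total L2 norm of all gradients as it was before clipping.
    total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()

    loss_history.append(loss.item())
    grad_norm_history.append(float(total_norm))
```

Both lists can then be passed to any plotting tool. A gradient norm that grows by orders of magnitude, or turns into inf/nan, within a few batches points to exploding gradients.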
@liuchuanloong Have you solved this problem?
@zhouyuangan Sorry, I just switched to a different approach.
FYI, I also ran into the loss not converging when I trained the model for 2 classes, and I dealt with it simply by decreasing the initial learning rate to 1e-4 or 5e-5.
In my experiments, a learning rate of 1e-5 seems too high for stable convergence: after several epochs the DiceLoss increases to 1.00 and then stops decreasing. A much smaller learning rate, e.g., 5e-6, may relieve this problem to some extent.
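For context on what a DiceLoss of 1.00 means, here is a small worked example. It assumes the common soft Dice form, 1 - 2|P∩T| / (|P| + |T|), with a small `eps` in the denominator; the DiceLoss used in your training code may smooth differently. Under this form the loss is exactly 1.0 when the prediction has no overlap with the target, for example when the network collapses to predicting background everywhere.

```python
def dice_loss(probs, targets, eps=1e-7):
    """Soft Dice loss for one class over flattened pixels (assumed formulation)."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection) / (denominator + eps)


targets = [0, 1, 1, 0]            # ground-truth mask for four pixels
good = [0.1, 0.9, 0.8, 0.2]       # prediction that mostly agrees with the mask
collapsed = [0.0, 0.0, 0.0, 0.0]  # prediction that is background everywhere

print(dice_loss(good, targets))       # roughly 0.15
print(dice_loss(collapsed, targets))  # 1.0
```

So a loss stuck at 1.00 is consistent with the predictions having lost all overlap with the foreground. Inspecting a few predicted masks at that point can confirm whether this is what happened.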