YooshinCho opened this issue 6 years ago
Hi @YooshinCho, if you want to use SGD as the optimizer, you need to use gradient clipping; otherwise the gradients can grow too large and cause the exploding gradient problem (the loss goes to inf, as you mentioned). Check out `nn.utils.clip_grad_norm_` for gradient clipping.
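A minimal sketch of what this might look like, assuming a tiny `nn.Linear` stand-in for the actual LapSRN model and an L1 loss as a stand-in for its Charbonnier loss (the learning rate and `max_norm` values here are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

# Hypothetical tiny model standing in for LapSRN
model = nn.Linear(8, 8)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
criterion = nn.L1Loss()  # stand-in for LapSRN's Charbonnier loss

x = torch.randn(4, 8)
y = torch.randn(4, 8)

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
# Clip the global gradient norm before the SGD step; without this,
# large gradients can blow the loss up to inf under SGD.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
optimizer.step()
```

`clip_grad_norm_` rescales all gradients together so their total norm is at most `max_norm`, which keeps the update direction but bounds its magnitude.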
@YooshinCho I met the same problem as yours.
I gave up reproducing with SGD optimizer...
I tried to reproduce LapSRN exactly as described in the CVPR paper, so I changed the optimizer to SGD. Then the loss goes to inf. I also divided the loss by the batch size as the paper mentions, but the loss still goes to inf. Do you have any idea?