I have two questions: 1) I am training on my own dataset with 2 classes, using a sigmoid activation on the last layer with binary cross-entropy loss, but the loss does not converge. 2) The output shape is (N, C, 32, 32) when the input shape is (N, C, 128, 128); can I upsample the output to the mask size (128, 128)?

thomasjpfan / pytorch_refinenet

Pytorch Implementation of Refinenet

MIT License

160 stars, 33 forks

I have two questions: 1) I am training on my own dataset with 2 classes, using a sigmoid activation on the last layer with binary cross-entropy loss, but the loss does not converge. 2) The output shape is (N, C, 32, 32) when the input shape is (N, C, 128, 128); can I upsample the output to the mask size (128, 128)? #6

Open liuchuanloong opened 6 years ago

thomasjpfan commented 6 years ago

A loss can fail to converge for several reasons. Check whether your gradients are misbehaving; if they are, gradient clipping may fix it.
torch.nn.functional.interpolate can resize the output tensor to the mask size for you.
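
Both suggestions can be sketched in one training step. This is only a minimal illustration: the small Conv2d model, the average pooling that mimics a 1/4-resolution output, the SGD optimizer, and max_norm=1.0 are all stand-in assumptions, not part of pytorch_refinenet. BCEWithLogitsLoss is used here in place of a separate sigmoid followed by binary cross-entropy.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for the segmentation network (NOT the RefineNet model).
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()  # applies the sigmoid internally

images = torch.randn(2, 3, 128, 128)
masks = torch.randint(0, 2, (2, 1, 128, 128)).float()

# Mimic a network whose output is 1/4 of the input resolution: (N, 1, 32, 32).
low_res_logits = F.avg_pool2d(model(images), kernel_size=4)

# Upsample the logits back to the mask size before computing the loss.
logits = F.interpolate(
    low_res_logits, size=masks.shape[-2:], mode="bilinear", align_corners=False
)

optimizer.zero_grad()
loss = criterion(logits, masks)
loss.backward()

# Rescale gradients so their global L2 norm is at most max_norm (assumed value).
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

Upsampling the logits rather than downsampling the masks keeps the loss at the full mask resolution; either choice is possible.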

liuchuanloong commented 6 years ago

I tried gradient clipping, but it did not work. How are the parameters initialized, apart from the ResNet layers? With the PyTorch defaults?

thomasjpfan commented 6 years ago

By default, my RefineNet implementation uses torchvision.models.resnet101 for the ResNet layer.

thomasjpfan commented 6 years ago

Make a plot of the gradients after each batch and see whether they are exploding. Plotting the loss after each batch would also help in debugging your issue.
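
One way to collect these per-batch values is sketched below. The helper global_grad_norm, the toy Conv2d model, and the random data are hypothetical and only illustrate the bookkeeping; the two lists can then be passed to any plotting library.

```python
import torch
from torch import nn


def global_grad_norm(model: nn.Module) -> float:
    """L2 norm over all gradients currently stored on the model's parameters."""
    norms = [p.grad.detach().norm(2) for p in model.parameters() if p.grad is not None]
    if not norms:
        return 0.0
    return torch.norm(torch.stack(norms), 2).item()


torch.manual_seed(0)

# Hypothetical stand-in model and random data, for illustration only.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
criterion = nn.BCEWithLogitsLoss()

loss_history, grad_history = [], []
for step in range(5):
    images = torch.randn(2, 3, 32, 32)
    masks = torch.randint(0, 2, (2, 1, 32, 32)).float()

    optimizer.zero_grad()
    loss = criterion(model(images), masks)
    loss.backward()

    # Record both values after every batch, before the optimizer step.
    loss_history.append(loss.item())
    grad_history.append(global_grad_norm(model))

    optimizer.step()
```

A gradient norm that grows by orders of magnitude between batches, or a loss that turns into inf or nan, would point to exploding gradients.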

ygean commented 5 years ago

@liuchuanloong Have you solved this problem?

liuchuanloong commented 5 years ago

@zhouyuangan Sorry, I just switched to a different approach.

kingyj7 commented 5 years ago

FYI, I also ran into the loss not converging when I trained the model for 2 classes, and I dealt with it simply by decreasing the initial learning rate to 1e-4 or 5e-5.
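
For reference, a minimal sketch of setting and later lowering the learning rate in PyTorch. The Adam optimizer and the toy model are assumptions; the comment does not say which optimizer was used.

```python
import torch
from torch import nn

# Hypothetical stand-in model.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1)

# Start with the smaller initial learning rate mentioned above.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# The rate can also be lowered during training through the parameter groups.
for group in optimizer.param_groups:
    group["lr"] = 5e-5

current_lrs = [group["lr"] for group in optimizer.param_groups]
```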

jingyzhang commented 1 year ago

In my experiments, a learning rate of 1e-5 seems too high for stable loss convergence: after several epochs the DiceLoss rises to 1.00 and then does not decrease any more. A much smaller learning rate, e.g., 5e-6, may relieve this problem to some extent.
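
To make the 1.00 value concrete, here is a sketch of one common soft Dice loss. It is an assumed formulation (the function soft_dice_loss and the eps value are hypothetical), and the DiceLoss used in the experiments above may differ. Under this formulation the loss approaches 1 when the predicted foreground no longer overlaps the mask at all.

```python
import torch


def soft_dice_loss(logits: torch.Tensor, targets: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """1 - Dice coefficient, averaged over the batch (one common formulation)."""
    probs = torch.sigmoid(logits)
    dims = (1, 2, 3)  # sum over channel and spatial dimensions
    intersection = (probs * targets).sum(dim=dims)
    denominator = probs.sum(dim=dims) + targets.sum(dim=dims)
    dice = (2 * intersection + eps) / (denominator + eps)
    return 1 - dice.mean()


# Toy mask: top half foreground, bottom half background.
targets = torch.zeros(1, 1, 8, 8)
targets[..., :4, :] = 1.0

# Confident and correct logits give a loss near 0.
good = soft_dice_loss(20 * (2 * targets - 1), targets)

# Confident but inverted logits give a loss near 1 (no overlap with the mask).
bad = soft_dice_loss(-20 * (2 * targets - 1), targets)
```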

kingyj7 commented 1 year ago

Your message has been received :)
