jiwoon-ahn / psa

Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

MIT License

380 stars 62 forks

Learning rate #9

Open DQDH opened 5 years ago

DQDH commented 5 years ago

I don't know how to obtain the parameters of the Ours-ResNet segmentation network. Could you explain the parameters? Thanks.

LeiyuanMa commented 5 years ago

I tried changing the learning rate to 0.01 and the batch size to 4. The loss decreased to 0.0403 within a single epoch (Iter: 37000/39675; the epoch was almost finished when it failed), but the program often crashes with an error like this:

validating ... terminate called after throwing an instance of 'std::system_error' what(): Operation not permitted

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

So, are there any parameters I need to change? Or do you have any advice on the error?

LeiyuanMa commented 5 years ago

I tried changing the learning rate to 0.01 with batchsize=4 and, because of limited GPU resources, I set model = torch.nn.DataParallel(model,device_ids=[0]). After that there is always an error like this:

Iter:36900/39675 Loss:0.0413 imps:3.5 Fin:Mon Oct 22 03:56:00 2018 lr: 0.0009
Iter:36950/39675 Loss:0.0363 imps:3.5 Fin:Mon Oct 22 03:55:56 2018 lr: 0.0009
Iter:37000/39675 Loss:0.0403 imps:3.5 Fin:Mon Oct 22 03:55:53 2018 lr: 0.0009

validating ... terminate called after throwing an instance of 'std::system_error' what(): Operation not permitted

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

The program can't finish an epoch, but the loss has decreased to 0.0403. Is that acceptable, or does it need more epochs? Do you have any advice on this error?
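For reference, the lr: 0.0009 values in the log above are consistent with a polynomial ("poly") decay from the configured 0.01. The sketch below is only an arithmetic check, not the repository's code: the function name poly_lr and the exponent 0.9 are assumptions, chosen because they reproduce the logged values. It also shows that 39675 equals 15 epochs at batch size 4 if the training list holds 10,582 images (an assumption: the size of the commonly used augmented VOC12 training list), which would mean the Iter counter spans the whole run rather than one epoch.

```python
def poly_lr(base_lr, step, max_step, power=0.9):
    """Polynomial decay: base_lr * (1 - step / max_step) ** power."""
    return base_lr * (1.0 - step / max_step) ** power


BASE_LR = 0.01    # learning rate reported in the comment
MAX_STEP = 39675  # denominator of the Iter counter in the log

# Learning rate at the three logged iterations, rounded like the log output
for step in (36900, 36950, 37000):
    print(step, round(poly_lr(BASE_LR, step, MAX_STEP), 4))

# Assumed values, not confirmed anywhere in this thread
NUM_IMAGES = 10582  # assumed size of the training list
BATCH_SIZE = 4      # batch size reported in the comment
EPOCHS = 15         # epoch count mentioned later by the author

total_steps = (NUM_IMAGES // BATCH_SIZE) * EPOCHS
print(total_steps)
```

Under these assumptions each of the three iterations rounds to 0.0009, matching the log, and total_steps comes out to 39675.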

jiwoon-ahn commented 5 years ago

@hardBird123, I trained with Adam, setting the initial learning rate to 0.001, but I didn't try to find the optimal learning rate. You can get better results than mine by adopting SGD or just following the method described in https://arxiv.org/pdf/1611.10080.pdf.

jiwoon-ahn commented 5 years ago

@LeiyuanMa, Sorry, I can't help you with that error. It is probably related to a memory leak. In my case, the number of training epochs does not change the performance much, and I haven't tested whether training for 15 epochs is best for the network.

DQDH commented 5 years ago

OK, thanks. I want to confirm: is the weights file [ilsvrc-cls_rna-a1_cls1000_ep-0001.params] the pretrained weights for training the ResNet38 segmentation network?

jiwoon-ahn commented 5 years ago

@hardBird123, Yes, that is the right file for the segmentation network.

LeiyuanMa commented 5 years ago

Thanks. So is loss=0.0403 acceptable?

DQDH commented 5 years ago

Which lr_type (fixed (default)/step/linear) should I choose when training the ResNet38 segmentation network?
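As background for the question, the three names usually denote the schedules below. This is a minimal sketch of their common meanings, not the training script's actual implementation: the function lr_at and the parameters step_size and gamma are hypothetical, and the script may define step and linear differently.

```python
def lr_at(lr_type, base_lr, it, max_it, step_size=10000, gamma=0.1):
    """Learning rate at iteration `it` under a common reading of each lr_type.

    step_size and gamma are hypothetical illustration values.
    """
    if lr_type == "fixed":
        # Constant learning rate for the whole run
        return base_lr
    if lr_type == "step":
        # Multiply by gamma once every step_size iterations
        return base_lr * gamma ** (it // step_size)
    if lr_type == "linear":
        # Decay linearly from base_lr at it=0 to zero at it=max_it
        return base_lr * (1.0 - it / max_it)
    raise ValueError(f"unknown lr_type: {lr_type}")


for lr_type in ("fixed", "step", "linear"):
    print(lr_type, [lr_at(lr_type, 0.001, it, 30000) for it in (0, 15000, 29999)])
```

Printing a few iterations side by side like this makes it easy to compare how quickly each schedule lowers the rate before committing to a long run.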

suoranxiu commented 5 years ago

Hello, I'm a student running this code, and I get a runtime error. Can you give me some tips about this issue?

RuntimeError: size mismatch, m1: [1 x 20], m2: [1 x 20]
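This message is what older PyTorch versions print when a 2-D matrix product receives operands whose inner dimensions disagree: m1 of shape [a x b] can only be multiplied by m2 of shape [b x c]. Here both operands are [1 x 20], so 20 columns meet 1 row. The sketch below only illustrates that shape rule in plain Python. The helper name matmul_shape is hypothetical, and which layer or tensor triggers the mismatch in this run cannot be told from the message alone.

```python
def matmul_shape(m1, m2):
    """Return the result shape of a 2-D matrix product, or raise on mismatch."""
    rows1, cols1 = m1
    rows2, cols2 = m2
    if cols1 != rows2:
        raise ValueError(
            f"size mismatch, m1: [{rows1} x {cols1}], m2: [{rows2} x {cols2}]"
        )
    return (rows1, cols2)


# Compatible: the 20 columns of m1 match the 20 rows of m2
print(matmul_shape((1, 20), (20, 1)))

# Incompatible: the shapes from the error message above
try:
    matmul_shape((1, 20), (1, 20))
except ValueError as err:
    print(err)
```

A common place to look is a final linear or classification layer whose input or output size no longer matches the number of classes (20 here), but that is a guess rather than a diagnosis.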