koichiro11 / residual-attention-network


have test on cifar-10? #1

Open tengshaofeng opened 6 years ago

tengshaofeng commented 6 years ago

Hi @koichiro11, I really appreciate your great work. Have you tested the model on CIFAR-10, and what is the result?

lipond commented 6 years ago

@tengshaofeng I also ran into the training issue. Have you fixed it yet?

tengshaofeng commented 6 years ago

@lipond, I tried modifying his code to evaluate on the CIFAR-10 test set of 10,000 samples.

lipond commented 6 years ago

@tengshaofeng What is the result? Does your training process look normal? I ran the training code but got a fixed cost, so something is apparently wrong.

koichiro11 commented 6 years ago

@tengshaofeng @lipond Thank you for the comments. I tested the model on CIFAR-10 for debugging and it worked, but I didn't save the result... Could you tell me the problem so that I can fix the code?

xiaoganghan commented 6 years ago

@koichiro11 The training log below does not look normal, right? Thanks.

EPOCH: 0, Training cost: 2.36071372032, Validation cost: 2.36695027351, Validation Accuracy: 0.0942
EPOCH: 5, Training cost: 2.36067867279, Validation cost: 2.36695027351, Validation Accuracy: 0.0942
EPOCH: 10, Training cost: 2.36069869995, Validation cost: 2.36695027351, Validation Accuracy: 0.0942

koichiro11 commented 6 years ago

@xiaoganghan Hmm... As you said, the training log is not good. I'm not sure, but maybe it is because the number of attention modules is too large (in my experience).

I had tested on CIFAR-10 and the training process was fine. I will check my code and train again when I have time.

xiaoganghan commented 6 years ago

@koichiro11 Thank you for the reply. It's weird: the above log is the output of your latest code without any changes, and it's on CIFAR-10 for sure.

I want to train it on CIFAR-10 and do some visualization of the masks to see how well the residual attention model works. In this case, do you think it's a training-parameter issue or an attention-module issue? Is the latest commit the version you used to train on CIFAR-10 successfully, or is there a previous commit I should try? I only want to train on CIFAR-10 anyway. Thank you again for your prompt reply.

tengshaofeng commented 6 years ago

@lipond, I have tested on the 10,000 test samples of CIFAR-10; it only reaches an accuracy of 87%.

xiaoganghan commented 6 years ago

@tengshaofeng what changes have you made to achieve 87% accuracy? Thank you.

josianerodrigues commented 6 years ago

@xiaoganghan, @tengshaofeng, can you provide your test script? I wrote one but got very low accuracy. Thank you.

tengshaofeng commented 6 years ago

Sorry, the result is not from this code; it is from another PyTorch project. The following is the result:
Accuracy of the model on the test images: 86 %
Accuracy of plane : 88 %
Accuracy of car : 93 %
Accuracy of bird : 79 %
Accuracy of cat : 74 %
Accuracy of deer : 85 %
Accuracy of dog : 79 %
Accuracy of frog : 89 %
Accuracy of horse : 90 %
Accuracy of ship : 92 %
Accuracy of truck : 91 %

tengshaofeng commented 6 years ago

@xiaoganghan, I also met the problem you described: the loss does not decrease. @lipond, @josianerodrigues, you can refer to my PyTorch project: https://github.com/tengshaofeng/ResidualAttentionNetwork-pytorch

josianerodrigues commented 6 years ago

@tengshaofeng Thank you for the reply and for sharing your project with us.

tengshaofeng commented 6 years ago

@josianerodrigues, my pleasure.

josianerodrigues commented 6 years ago

Hi @tengshaofeng, how long does this code take to run on average? Sorry for taking your time, and thank you for considering my question.

tengshaofeng commented 6 years ago

This code takes about 7 minutes per 5 epochs. The following is the training log:

log start

start to train ResidualAttentionModel
load CIFAR-10 data...
load data from pickle
build graph...
check shape of data...
train_X: (45000, 32, 32, 3)
train_y: (45000, 10)
start to train...
EPOCH: 0, Training cost: 2.3612446785, Validation cost: 2.36055040359, Validation Accuracy: 0.1006
save model...
EPOCH: 5, Training cost: 2.36119961739, Validation cost: 2.36055040359, Validation Accuracy: 0.1006
save model...
EPOCH: 10, Training cost: 2.36124396324, Validation cost: 2.36055040359, Validation Accuracy: 0.1006
save model...
EPOCH: 15, Training cost: 2.36119961739, Validation cost: 2.36055040359, Validation Accuracy: 0.1006
save model...
EPOCH: 20, Training cost: 2.36122179031, Validation cost: 2.36055040359, Validation Accuracy: 0.1006
save model...
EPOCH: 25, Training cost: 2.36119961739, Validation cost: 2.36055040359, Validation Accuracy: 0.1006
save model...
EPOCH: 30, Training cost: 2.36122179031, Validation cost: 2.36055040359, Validation Accuracy: 0.1006
save model...
save model...

log end

It seems it has not converged.

josianerodrigues commented 6 years ago

I got the same result; it really did not converge. I was referring to the run time of the PyTorch implementation that you shared with us.

tengshaofeng commented 6 years ago

I just reran my code; the following is the training log of my PyTorch implementation:

log

for res_att_92 net:
Epoch [1/100], Iter [100/1429] Loss: 2.1525
Epoch [1/100], Iter [200/1429] Loss: 1.7000
Epoch [1/100], Iter [300/1429] Loss: 1.7273
Epoch [1/100], Iter [400/1429] Loss: 1.4131
Epoch [1/100], Iter [500/1429] Loss: 1.5592
Epoch [1/100], Iter [600/1429] Loss: 1.6161
Epoch [1/100], Iter [700/1429] Loss: 1.3315
Epoch [1/100], Iter [800/1429] Loss: 1.0377
Epoch [1/100], Iter [900/1429] Loss: 1.3492
Epoch [1/100], Iter [1000/1429] Loss: 1.3490
Epoch [1/100], Iter [1100/1429] Loss: 1.3188
Epoch [1/100], Iter [1200/1429] Loss: 1.3300
Epoch [1/100], Iter [1300/1429] Loss: 1.1882
Epoch [1/100], Iter [1400/1429] Loss: 0.9603
the epoch takes time: 1051.79760003
Epoch [2/100], Iter [100/1429] Loss: 0.9891
Epoch [2/100], Iter [200/1429] Loss: 1.2262
Epoch [2/100], Iter [300/1429] Loss: 0.9173
Epoch [2/100], Iter [400/1429] Loss: 1.1978
Epoch [2/100], Iter [500/1429] Loss: 0.9160
Epoch [2/100], Iter [600/1429] Loss: 0.8897
Epoch [2/100], Iter [700/1429] Loss: 0.7859
Epoch [2/100], Iter [800/1429] Loss: 0.8977
Epoch [2/100], Iter [900/1429] Loss: 0.6515
Epoch [2/100], Iter [1000/1429] Loss: 0.9553
Epoch [2/100], Iter [1100/1429] Loss: 0.9544
Epoch [2/100], Iter [1200/1429] Loss: 1.2661
Epoch [2/100], Iter [1300/1429] Loss: 0.9071
Epoch [2/100], Iter [1400/1429] Loss: 0.7281
the epoch takes time: 1053.76822901

for the res_att_56 net:
Epoch [1/100], Iter [100/1429] Loss: 1.9677
Epoch [1/100], Iter [200/1429] Loss: 1.7845
Epoch [1/100], Iter [300/1429] Loss: 1.7899
Epoch [1/100], Iter [400/1429] Loss: 1.7015
Epoch [1/100], Iter [500/1429] Loss: 1.4097
Epoch [1/100], Iter [600/1429] Loss: 1.4999
Epoch [1/100], Iter [700/1429] Loss: 1.2078
Epoch [1/100], Iter [800/1429] Loss: 1.4107
Epoch [1/100], Iter [900/1429] Loss: 1.6492
Epoch [1/100], Iter [1000/1429] Loss: 1.8750
Epoch [1/100], Iter [1100/1429] Loss: 1.7730
Epoch [1/100], Iter [1200/1429] Loss: 1.3797
Epoch [1/100], Iter [1300/1429] Loss: 1.2181
Epoch [1/100], Iter [1400/1429] Loss: 1.3505
the epoch takes time: 654.214586973
Epoch [2/100], Iter [100/1429] Loss: 1.1204
Epoch [2/100], Iter [200/1429] Loss: 1.7548
Epoch [2/100], Iter [300/1429] Loss: 1.3137
Epoch [2/100], Iter [400/1429] Loss: 1.0649
Epoch [2/100], Iter [500/1429] Loss: 0.9719
Epoch [2/100], Iter [600/1429] Loss: 1.2086
Epoch [2/100], Iter [700/1429] Loss: 0.9056
Epoch [2/100], Iter [800/1429] Loss: 0.8379
Epoch [2/100], Iter [900/1429] Loss: 0.6485
Epoch [2/100], Iter [1000/1429] Loss: 0.8086
Epoch [2/100], Iter [1100/1429] Loss: 0.9019
Epoch [2/100], Iter [1200/1429] Loss: 0.9073
Epoch [2/100], Iter [1300/1429] Loss: 0.9322
Epoch [2/100], Iter [1400/1429] Loss: 1.1767
the epoch takes time: 658.405325174

log end

So one epoch takes about 1000 seconds for res_att_92 and about 650 seconds for res_att_56, with the training batch size set to 35.

josianerodrigues commented 6 years ago

Does the log come up empty? How long does one epoch take?

tengshaofeng commented 6 years ago

@josianerodrigues, I am running the code right now; the answer is above.

josianerodrigues commented 6 years ago

Thanks for your help, @tengshaofeng :)

koichiro11 commented 6 years ago

@tengshaofeng @lipond @xiaoganghan @josianerodrigues Thank you for the comments and the discussion. I have now fixed the code and created a pull request. The reason the loss does not decrease is the softmax function: the output of the final FC layer is relatively large, so I introduced layer normalization before the FC layer. Now the loss decreases well, but I don't know why the output of the final FC layer is so large (there is no mention of this, or of the need for layer normalization, in the paper). If you can find the reason or an error, please tell me. Thanks again.
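For reference, a minimal sketch of the idea (using the modern tf.keras API; on TensorFlow 1.4, which this repo targets, `tf.contrib.layers.layer_norm` would be the usual substitute). The helper name and head structure are illustrative, not the exact code in the pull request:

```python
from tensorflow.keras import layers

def classification_head(features, num_classes=10):
    """Hypothetical classification head: pool, normalize, then classify."""
    x = layers.GlobalAveragePooling2D()(features)
    # Layer normalization keeps the pre-softmax activations in a moderate
    # range so the softmax does not saturate and gradients keep flowing.
    x = layers.LayerNormalization()(x)
    logits = layers.Dense(num_classes)(x)
    return layers.Activation("softmax")(logits)
```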

josianerodrigues commented 6 years ago

@koichiro11 Thank you for the corrections and for sharing them with us.

josianerodrigues commented 6 years ago

@koichiro11, could you please share the TensorFlow, Keras, and Python versions that you used to run this implementation?

tengshaofeng commented 6 years ago

Hi everybody, @lipond @koichiro11 @xiaoganghan @josianerodrigues, I modified my code at https://github.com/tengshaofeng/ResidualAttentionNetwork-pytorch. I changed the input to 32x32 instead of 224x224 and built a new architecture for CIFAR-10 called ResidualAttentionModel_92_32input. The new result on the CIFAR-10 test set: Accuracy of the model on the test images: 92.66%. I am sure you can do better based on my code, because there are still some tricks to try.

josianerodrigues commented 6 years ago

Hi @tengshaofeng, I would like to use batch size 64, but I run out of GPU memory. Is there any optimization that can prevent this?

tengshaofeng commented 6 years ago

@josianerodrigues, reduce the batch size until the memory overflow no longer happens. Or you can use multiple GPUs with distributed training, or use the network with fewer parameters, like res_att_56.
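A rough PyTorch sketch of the multi-GPU option (the resnet18 model here is only a stand-in; swap in the residual attention model class from the repo):

```python
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Stand-in model: replace with the repo's ResidualAttentionModel class.
model = torchvision.models.resnet18(num_classes=10)

if torch.cuda.device_count() > 1:
    # DataParallel splits each batch across all visible GPUs,
    # which shrinks the per-GPU memory footprint accordingly.
    model = torch.nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()

# Standard CIFAR-10 loader; if batch size 64 still does not fit, drop it further.
train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=transforms.ToTensor())
train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=2)
```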

tengshaofeng commented 6 years ago

@lipond @koichiro11 @xiaoganghan @josianerodrigues Hi everyone, I modified my learning-rate decay code. In the last version I had switched the optimizer to SGD as the paper describes, but I had not updated the learning-rate decay to match. The newest result on the CIFAR-10 test set is an accuracy of 0.9354.
code: https://github.com/tengshaofeng/ResidualAttentionNetwork-pytorch
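As a rough illustration of that setup (SGD plus stepwise learning-rate decay), here is a minimal PyTorch sketch; the milestones, epoch count, and hyperparameters are illustrative, not necessarily the values used in the repo, and `model` / `train_loader` are whatever you already built (e.g. as in the sketch above):

```python
import torch
import torch.nn as nn
import torch.optim as optim

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# SGD with momentum and weight decay, in the spirit of the paper's setup.
optimizer = optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, nesterov=True, weight_decay=1e-4)
# Decay the learning rate by 10x at fixed epochs.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)

criterion = nn.CrossEntropyLoss()
for epoch in range(160):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # apply the decay once per epoch
```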

josianerodrigues commented 6 years ago

@tengshaofeng Thank you for sharing this with us.

koichiro11 commented 6 years ago

@josianerodrigues I use Python 3.6, TensorFlow 1.4, Keras 2.0.8.

koichiro11 commented 6 years ago

@tengshaofeng Thank you for sharing your code. I am now fixing the model. In your code (ResidualAttentionNetwork-pytorch/Residual-Attention-Network/model/attention_module.py), in the skip connection you add not only the output of the residual unit but also the output of the first residual unit, e.g.:

line 422: out_interp = self.interpolation1(out_middle_2r_blocks) + out_down_residual_blocks1

Is that intentional? And is it effective?
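For readers following along, here is a rough, self-contained sketch of what that line computes; the variable names come from the quoted line, but the wrapper class and shapes are only illustrative:

```python
import torch
import torch.nn as nn

class MaskDecoderStep(nn.Module):
    """One up-sampling step of the soft mask branch (illustrative wrapper)."""

    def __init__(self, size):
        super().__init__()
        self.interpolation1 = nn.UpsamplingBilinear2d(size=size)

    def forward(self, out_middle_2r_blocks, out_down_residual_blocks1):
        # The "+ out_down_residual_blocks1" term is the addition in question:
        # the up-sampled deep feature is summed with the output saved from the
        # first down-sampled residual block, an extra inner skip connection.
        return self.interpolation1(out_middle_2r_blocks) + out_down_residual_blocks1

# Example: up-sample a 4x4 feature map back to 8x8 and add the 8x8 skip tensor.
step = MaskDecoderStep(size=(8, 8))
out = step(torch.randn(1, 64, 4, 4), torch.randn(1, 64, 8, 8))
```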

tengshaofeng commented 6 years ago

@koichiro11, yes, it is intentional. I referenced the Caffe project. You can remove the addition and check whether it is effective.

NatalieZou commented 6 years ago

@koichiro11 Hi! Can you share your results on CIFAR-10 with us? Does this code reach the accuracy reported in the paper?

alyato commented 6 years ago

@tengshaofeng When I run this code, it shows me AttributeError: 'UpSampling2D' object has no attribute 'outbound_nodes'. I found issue #4. Could you give me some suggestions? Thanks.

tengshaofeng commented 6 years ago

@alyato, sorry, I have not run into that problem.

alyato commented 6 years ago

@tengshaofeng Thanks. I don't know whether my versions are wrong.

TensorFlow 1.4.0, Keras 2.1.4, Python 2.7

And some people have hit the same issue in #4.

sankin1770 commented 6 years ago

I ran the TensorFlow version of the code; it ended automatically at the 56th epoch, and the validation accuracy is only 80%. I don't know why.

PayneJoe commented 6 years ago

Hi, @alyato

Have you fixed this problem? I have the same issue with Python 3.6, TensorFlow 1.10.0. Let me know if you get it working, much appreciated!