tengshaofeng / ResidualAttentionNetwork-pytorch

A PyTorch implementation of the Residual Attention Network. This code is based on two projects from

Test Accuracy Stagnates #4

Closed jain-avi closed 6 years ago

jain-avi commented 6 years ago

Can you tell me if your training and testing accuracies always followed each other? I am implementing a smaller, modified version of the network you coded, and my test accuracy seems to have stagnated at 81%. Also, I think you have coded a different architecture: you are adding the output of the pool layer as well as the output of the pool+conv layer to the upsampled input, while the actual architecture only adds the pool+conv output to the upsampled layer. Is that making all the difference?
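For concreteness, here is a minimal sketch of the two wirings I mean; the class and variable names are illustrative, not the exact code from either repo:

```python
import torch
import torch.nn as nn

class SoftMaskUpStep(nn.Module):
    """One up-sampling step of the soft mask branch (illustrative names).

    extra_skip=False: the paper's wiring, where only the skip connection
                      (pool + conv path) is added to the up-sampled map.
    extra_skip=True:  this repo's wiring, where the raw pooled map is
                      added as well.
    """
    def __init__(self, channels, extra_skip=True):
        super().__init__()
        self.skip_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.upsample = nn.UpsamplingBilinear2d(scale_factor=2)
        self.extra_skip = extra_skip

    def forward(self, bottom, pooled):
        # bottom: feature map coming up from the deeper (more pooled) level
        # pooled: feature map saved at the matching down-sampling step
        out = self.upsample(bottom) + self.skip_conv(pooled)  # pool+conv skip
        if self.extra_skip:
            out = out + pooled  # the extra addition I am asking about
        return out
```

Setting `extra_skip=False` gives what I understand to be the paper's wiring.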

tengshaofeng commented 6 years ago

@Neo96Mav, which network did you use, or have you modified the network yourself based on my code?

jain-avi commented 6 years ago

I have used your network and the official Caffe network for reference, and implemented my own small network. I am not using attention modules at 4x4 because I feel the feature maps are too small, and I am only using one attention module at 8x8. My network is relatively small, and it's for CIFAR images only; I sketch it at the end of this comment. Can you let me know the intuition behind this:

[screenshot: soft mask branch code that adds both the residual block output and the skip connection output to the upsampled layer]

You have added the output of the residual block, as well as the output of the skip connection, to the upsampled layer!
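For context, the small CIFAR-only network I mentioned above looks roughly like the following. Everything here is a simplified stand-in (block and module names are placeholders, and the mask branch is heavily abbreviated), not the repo's actual classes:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Minimal residual block (placeholder for the repo's ResidualBlock)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_ch))
        self.shortcut = (nn.Identity() if stride == 1 and in_ch == out_ch else
                         nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False))

    def forward(self, x):
        return torch.relu(self.body(x) + self.shortcut(x))

class TinyAttention(nn.Module):
    """Heavily abbreviated attention module: out = (1 + mask) * trunk."""
    def __init__(self, ch):
        super().__init__()
        self.trunk = ResBlock(ch, ch)
        self.mask = nn.Sequential(
            nn.MaxPool2d(2),                          # 8x8 -> 4x4
            ResBlock(ch, ch),
            nn.UpsamplingBilinear2d(scale_factor=2),  # back to 8x8
            nn.Conv2d(ch, ch, 1), nn.Sigmoid())

    def forward(self, x):
        return (1 + self.mask(x)) * self.trunk(x)

class SmallCifarNet(nn.Module):
    """32x32 input; one attention module at 8x8, plain residual blocks at 4x4."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1, bias=False),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            ResBlock(32, 64, stride=2),    # 32x32 -> 16x16
            ResBlock(64, 128, stride=2),   # 16x16 -> 8x8
            TinyAttention(128),            # the single 8x8 attention module
            ResBlock(128, 256, stride=2),  # 8x8 -> 4x4, no attention here
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x))
```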

tengshaofeng commented 6 years ago

@Neo96Mav, this follows the Caffe network; I think the extra addition is there to carry more detailed information. You can remove it to test its effectiveness.

josianerodrigues commented 6 years ago

Hi @Neo96Mav, did you test the model using only one 8x8 attention module? Was the accuracy better?

jain-avi commented 6 years ago

Hi @josianerodrigues, I added the 4x4 attention module as well. I am stuck at 89.5% accuracy. Maybe my model is not big enough, or I am not using the exact same configuration, but I feel that should not have affected it so much. @tengshaofeng, do you have any idea why we can't match the authors' performance?

tengshaofeng commented 6 years ago

@Neo96Mav, the paper only gives the architecture details of attention_92 for ImageNet with 224 input, not for CIFAR-10. So I built the net ResidualAttentionModel_92_32input following my own understanding. I have tested it on the CIFAR-10 test set, and the result is as follows: Accuracy of the model on the test images: 0.9354
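That number comes from a standard evaluation loop along these lines (a sketch, not the exact test script in this repo; `model` is assumed to be the trained ResidualAttentionModel_92_32input):

```python
import torch
from torchvision import datasets, transforms

def evaluate(model, device="cuda"):
    # Plain test-set pass: no augmentation, just tensor conversion
    # (add the same normalization used at training time, if any).
    test_set = datasets.CIFAR10(root="./data", train=False, download=True,
                                transform=transforms.ToTensor())
    loader = torch.utils.data.DataLoader(test_set, batch_size=128, shuffle=False)
    model.to(device).eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    print("Accuracy of the model on the test images: %.4f" % (correct / total))
```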

Maybe some details are not right. You can refer to the data preprocessing in the paper and keep it the same as the authors', or tune the hyperparameters for better performance. You can also remove the add operation to test the network.
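On the preprocessing point, a common CIFAR-10 recipe looks like this (a sketch with the usual channel statistics; check the paper for the authors' exact choices):

```python
from torchvision import transforms

# Standard CIFAR-10 augmentation: pad 4 px, random 32x32 crop, random flip.
train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),   # per-channel means
                         (0.2470, 0.2435, 0.2616)),  # per-channel std-devs
])

# Test time: same normalization, no augmentation.
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),
                         (0.2470, 0.2435, 0.2616)),
])
```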

tengshaofeng commented 6 years ago

@Neo96Mav @josianerodrigues The result on the CIFAR-10 test set is now 0.954.