YihangLou opened this issue 6 years ago
Thanks for sharing your code. There may be many tricks in the original implementation, but the margin between your performance and the paper's reported results is too large. I hope you can fully reproduce the results in the future!
OK, I will try to pre-process the images and keep the training process the same as the paper. The current code does not do padding, cropping, flipping, and so on; I use Adam (the paper uses SGD); and I only trained for 100 epochs (about 204 epochs in the paper).
@YihangLou , Hi, today I modified some things and got a new result: accuracy on the CIFAR-10 test set is 92.66%.
@YihangLou , I modified the optimizer, so the newest result is now 0.9354.
Hi @tengshaofeng, Was this result you got (0.9354) obtained using only the ResidualAttentionModel_92_32input network in train.py? Or do you first pretrain the network using train_pre.py and then train using train.py?
Can you provide a trained model?
@josianerodrigues , use only train.py; train_pre.py is just my backup of the code.
@123moon , I have provided the model from the final epoch. Its accuracy is 0.9332.
The model you provide is 92-32. Do you have a model trained on ImageNet with 224×224 images? (I may not be able to say this clearly in English, sorry to bother you.) Your code is very helpful to me, but I have no way to download that dataset, so I am asking for your help.
@123moon , there is a 224×224 training model: the ResidualAttentionModel_92 class in residual_attention_network.py. To download ImageNet you can visit http://image-net.org/download; you need to register yourself.
I see. What I wanted to ask is whether there is an already-trained model; training would take me a long time, and my computer does not have enough memory.
No, I don't have one; my computer also doesn't have enough storage for such a large dataset.
Hi @tengshaofeng Could you tell me what the effect of resetting the learning rate at a particular epoch is?
```python
# Decaying learning rate: divide lr by 10 at 30%, 60%, and 90% of training.
# Comparing integer epochs avoids the fragile float-equality test.
if (epoch + 1) in (int(total_epoch * 0.3), int(total_epoch * 0.6), int(total_epoch * 0.9)):
    lr /= 10
    print('reset learning rate to:', lr)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
        print(param_group['lr'])
```
@josianerodrigues , it is a trick for training. When I decrease the learning rate, the loss decreases quickly. That is, when I use lr=0.1 to train for 90 epochs and find the loss tending to converge, I then decrease lr to 0.01 and the loss decreases again.
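For reference, the same step-decay trick can be written with PyTorch's built-in MultiStepLR scheduler instead of a manual epoch check. This is a minimal sketch with a placeholder model, not the repo's actual training loop.

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder model and schedule; in the repo these would be the attention
# network and its training settings.
model = nn.Linear(10, 2)
total_epoch = 300
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

# Divide lr by 10 at 30%, 60%, and 90% of training.
milestones = [int(total_epoch * f) for f in (0.3, 0.6, 0.9)]
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=0.1)

for epoch in range(total_epoch):
    # ... forward/backward pass and optimizer.step() would go here ...
    scheduler.step()
```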
thanks for the explanation :)
Hi @tengshaofeng I also work on medical images. You mentioned this code worked well in your own medical image recognition project. I am having trouble classifying a medical image dataset. Could you tell me more details at your convenience? Or could you add my QQ if possible? My QQ number is 1922525328.
Thanks.
@estelle1722 , I use the 448-input model; it works well.
Hi, I also could not reproduce the results of the paper (with my implementation in Tensorflow) on CIFAR-10 even after exchanging a few emails with the author.
@ondrejba , what is your best result now?
My best accuracy was 94.32%, which is close to the 95.01% reported in the paper, but it does not beat ResNet-164, which has fewer parameters.
@ondrejba , OK, your result is really better. Have you read the ResidualAttentionModel_92_32input architecture in my code? Are there any differences from yours? Or could you share your code with me?
I'm sorry for the delay. I'll look at your code over the weekend.
@ondrejba thanks
I noticed many differences just from looking at residual_attention_network.py:
I bet there are more differences but I don't have time to go through the whole attention module. I hope this helps.
I'm actually surprised that you achieved such a good CIFAR accuracy with max pooling at the start of the network.
Hi @ondrejba, If possible, could you make your code available? On which dataset and with which network did you get 94% accuracy? ResNet-164? What do you use after the first convolution? Sorry for taking your time.
Hello, I got 94.32% accuracy with Attention92 on CIFAR-10. The 95.01% accuracy I mentioned is also for Attention92 evaluated on CIFAR-10; it was reported in the Residual Attention Networks paper but I didn't manage to replicate the results. I will look into open-sourcing my code.
After the first convolution ... there are all the other convolutions in the network followed by average pooling and a single fully-connected layer. This architecture is described in the Residual Attention Networks paper as well as the Identity Mappings paper that is a follow up to the Deep Residual Learning paper.
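The head described above (global average pooling followed by a single fully-connected layer) can be sketched in PyTorch as follows; the channel count (512) and class count (10) are placeholders, not the paper's exact values.

```python
import torch
import torch.nn as nn

# Classifier head from the description above: after the convolutional
# stages, global average pooling and one fully-connected layer.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # global average pooling to 1x1
    nn.Flatten(),
    nn.Linear(512, 10),        # single fully-connected classifier
)

features = torch.randn(2, 512, 8, 8)   # stand-in for backbone feature maps
logits = head(features)                # shape: (2, 10)
```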
Cheers, Ondrej
Thank you :)
You're welcome! Let me know if you manage to reproduce Fei Wang's results.
@ondrejba , Sorry for the late reply. You really gave some useful ideas. Maybe I will train again according to your advice. Thanks so much.
@ondrejba , hi, could you tell me when to upsample? When the feature map size is 8×8 or 4×4?
@YihangLou @josianerodrigues @123moon @estelle1722 @ondrejba Hi everybody. My model now gets a best accuracy of 0.954; the newest code is uploaded. @ondrejba thanks for your advice.
Thank you, @tengshaofeng :)
That's awesome! Can you try to run it more than once and average the accuracies?
@ondrejba , I will not try again due to time limits. I found the accuracy stays above 0.951 for many epochs.
Ok, that's fine :+1:
Hi @tengshaofeng, Which network do you think is better for multilabel dataset like NUS-WIDE? I ran the ResidualAttentionModel_92, but the result was not very good.
Hi~ I'd like to implement this network with Gluon, referring to your code.
@PistonY , OK, good luck!
I got CIFAR-10 working. Have you tried ImageNet?
@PistonY , no, the ImageNet dataset is too large. You can try it.
Try mixup and MSRAPrelu init on CIFAR-10; I got a better result.
@PistonY , congratulations, can you provide the detailed code? What are mixup and MSRAPrelu init?
@PistonY , I have checked your project. Do you mean the best result is based on MSRAPrelu, and mixup is still training?
Yes, I've just finished training with mixup; it only got 96.57 accuracy, not good enough. You can check out this for more details.
@PistonY , an accuracy of 96.57: does that mean a new best result? Your result with MSRAPrelu is 95.41? That is an improvement of more than one percent.
Brother, let's just talk in Chinese. I reached 95.68 with MSRAPrelu. Generally mixup gives at least a one-percent improvement; for details see the link I just gave you, which is the official GluonCV training recipe. The improvement from MSRAPrelu is not big, but mixup brings a pleasant surprise.
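For readers wondering what "MSRAPrelu" is: GluonCV's MSRAPrelu initializer is, as I understand it, Kaiming (He / MSRA) normal initialization with the gain computed for a PReLU slope. A rough PyTorch equivalent is sketched below; msra_prelu_init is a hypothetical helper name, and the closest built-in is kaiming_normal_ with nonlinearity='leaky_relu'.

```python
import torch.nn as nn

# Kaiming/MSRA normal init with the gain adjusted for a PReLU slope,
# applied to every conv and linear layer; biases are zeroed.
def msra_prelu_init(model, slope=0.25):
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.kaiming_normal_(m.weight, a=slope, nonlinearity='leaky_relu')
            if m.bias is not None:
                nn.init.zeros_(m.bias)

# Tiny placeholder network, not the attention model from the repo.
net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.PReLU())
msra_prelu_init(net)
```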
I understand that you reached 95.68 with MSRAPrelu. But didn't mixup reach 96.57 after training? That is a new high, so why do you say it's not good enough?
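For reference, mixup (Zhang et al., 2018) trains on convex combinations of pairs of examples and of their labels. A minimal sketch for a classification batch follows; mixup_batch is a hypothetical helper, and the batch below is random placeholder data, not CIFAR-10.

```python
import torch

# Mix pairs of examples within a batch and return both label sets plus
# the mixing coefficient, which is sampled from a Beta distribution.
def mixup_batch(x, y, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    index = torch.randperm(x.size(0))          # random pairing within the batch
    mixed_x = lam * x + (1 - lam) * x[index]
    return mixed_x, y, y[index], lam

x = torch.randn(8, 3, 32, 32)                  # fake image batch
y = torch.randint(0, 10, (8,))                 # fake labels
mixed_x, y_a, y_b, lam = mixup_batch(x, y)
# The loss is the matching combination of the two targets:
# loss = lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
```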
OK, maybe I will try some image pre-processing and tune the hyperparameters to achieve that. But this code performs well in my own medical image recognition project.