YihangLou opened this issue 6 years ago
Thanks for sharing your code. There may be many tricks in the original implementation, but the margin between your performance and the paper's reported results is too large. I hope you can fully reproduce the results in the future!
OK, I will try to pre-process the images and keep the training process the same as the paper. The current code does not do padding, cropping, flipping, and so on; I use Adam (the paper uses SGD); and I only trained for 100 epochs (about 204 epochs in the paper).
@YihangLou , Hi, today I modified some things and got a new result: accuracy on the CIFAR-10 test set is 92.66%.
@YihangLou , I modified the optimizer, so the newest result is now 0.9354.
Hi @tengshaofeng, Was this result you got (0.9354) obtained using only the ResidualAttentionModel_92_32input network in train.py? Or do you first pretrain the network using train_pre.py and then train using train.py?
Can you provide a trained model?
@josianerodrigues , use only train.py; train_pre.py is just my backup of the code.
@123moon , I have provided the model from the final epoch. Its accuracy is 0.9332.
The model you provide is 92-32. Do you have a model trained on ImageNet with 224×224 images? (I may not be able to say this clearly in English, sorry to bother you.) Your code is very helpful to me, but I have no way to download that dataset, so I am asking for your help.
@123moon , there is a 224×224 training model: the ResidualAttentionModel_92 class in residual_attention_network.py. To download ImageNet you can visit http://image-net.org/download; you need to register yourself.
I see. What I wanted to ask is whether there is an already-trained model; training would take me a long time, and my computer does not have enough memory.
No, I don't have one; my computer also doesn't have enough storage for such a large dataset.
Hi @tengshaofeng Could you tell me what the effect of resetting the learning rate at a particular epoch is?
```python
# Decaying learning rate: divide lr by 10 at 30%, 60%, and 90% of training.
# Comparing integer epochs avoids the fragile float-equality test.
if (epoch + 1) in (int(total_epoch * 0.3), int(total_epoch * 0.6), int(total_epoch * 0.9)):
    lr /= 10
    print('reset learning rate to:', lr)
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
        print(param_group['lr'])
```
@josianerodrigues , it is a trick for training. When I decrease the learning rate, the loss decreases quickly. That is, when I use lr=0.1 to train for 90 epochs and find the loss tending to converge, I then decrease lr to 0.01 and the loss decreases again.
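For reference, the same step-decay trick can be written with PyTorch's built-in MultiStepLR scheduler instead of a manual epoch check. This is a minimal sketch with a placeholder model, not the repo's actual training loop.

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder model and schedule; in the repo these would be the attention
# network and its training settings.
model = nn.Linear(10, 2)
total_epoch = 300
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

# Divide lr by 10 at 30%, 60%, and 90% of training.
milestones = [int(total_epoch * f) for f in (0.3, 0.6, 0.9)]
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=milestones, gamma=0.1)

for epoch in range(total_epoch):
    # ... forward/backward pass and optimizer.step() would go here ...
    scheduler.step()
```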
thanks for the explanation :)
Hi @tengshaofeng I also work on medical images. You mentioned this code worked well in your own medical image recognition project. I am having trouble classifying a medical image dataset. Could you tell me more details at your convenience? Or could you add my QQ if possible? My QQ number is 1922525328.
Thanks.
@estelle1722 , I use the 448-input model; it works well.
Hi, I also could not reproduce the results of the paper (with my implementation in Tensorflow) on CIFAR-10 even after exchanging a few emails with the author.
@ondrejba , what is your best result now?
My best accuracy was 94.32%, which is close to the 95.01% reported in the paper, but it does not beat ResNet-164, which has fewer parameters.
@ondrejba , OK, your result is really better. Have you read the ResidualAttentionModel_92_32input architecture in my code? Are there any differences from yours? Or could you share your code with me?
I'm sorry for the delay. I'll look at your code over the weekend.
@ondrejba thanks
I noticed many differences just from looking at residual_attention_network.py:
I bet there are more differences but I don't have time to go through the whole attention module. I hope this helps.
I'm actually surprised that you achieved such a good CIFAR accuracy with max pooling at the start of the network.
Hi @ondrejba, If possible, could you make your code available? On which dataset and with which network did you get 94% accuracy? ResNet-164? What do you use after the first convolution? Sorry for taking your time.
Hello, I got 94.32% accuracy with Attention92 on CIFAR-10. The 95.01% accuracy I mentioned is also for Attention92 evaluated on CIFAR-10; it was reported in the Residual Attention Networks paper but I didn't manage to replicate the results. I will look into open-sourcing my code.
After the first convolution ... there are all the other convolutions in the network followed by average pooling and a single fully-connected layer. This architecture is described in the Residual Attention Networks paper as well as the Identity Mappings paper that is a follow up to the Deep Residual Learning paper.
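The head described above (global average pooling followed by a single fully-connected layer) can be sketched in PyTorch as follows; the channel count (512) and class count (10) are placeholders, not the paper's exact values.

```python
import torch
import torch.nn as nn

# Classifier head from the description above: after the convolutional
# stages, global average pooling and one fully-connected layer.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # global average pooling to 1x1
    nn.Flatten(),
    nn.Linear(512, 10),        # single fully-connected classifier
)

features = torch.randn(2, 512, 8, 8)   # stand-in for backbone feature maps
logits = head(features)                # shape: (2, 10)
```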
Cheers, Ondrej
Thank you :)
You're welcome! Let me know if you manage to reproduce Fei Wang's results.
@ondrejba , Sorry for the late reply. You really gave some useful ideas. Maybe I will train again according to your advice. Thanks so much.
@ondrejba , hi, could you tell me when to upsample? When the feature map size is 8×8 or 4×4?
@YihangLou @josianerodrigues @123moon @estelle1722 @ondrejba Hi everybody. My model now gets a best accuracy of 0.954; the newest code is uploaded. @ondrejba thanks for your advice.
Thank you, @tengshaofeng :)
That's awesome! Can you try to run it more than once and average the accuracies?
@ondrejba , I will not try again due to time limits. I found the accuracy stays above 0.951 for many epochs.
Ok, that's fine :+1:
Hi @tengshaofeng, Which network do you think is better for multilabel dataset like NUS-WIDE? I ran the ResidualAttentionModel_92, but the result was not very good.
Hi~ I'd like to implement this network with Gluon, referring to your code.
@PistonY , OK, good luck!
I got CIFAR-10 working. Have you tried ImageNet?
@PistonY , no, the ImageNet dataset is too large. You can try it.
Try mixup and MSRAPrelu init on CIFAR-10; I got a better result.
@PistonY , congratulations, can you provide the detailed code? What are mixup and MSRAPrelu init?
@PistonY , I have checked your project. Do you mean the best result is based on MSRAPrelu, and mixup is still training?
Yes, I've just finished training with mixup; it only got 96.57 accuracy, not good enough. You can check out this for more details.
@PistonY , an accuracy of 96.57: does that mean a new best result? Your result with MSRAPrelu is 95.41? That is an improvement of more than one percent.
Brother, let's just talk in Chinese. I reached 95.68 with MSRAPrelu. Generally mixup gives at least a one-percent improvement; for details see the link I just gave you, which is the official GluonCV training recipe. The improvement from MSRAPrelu is not big, but mixup brings a pleasant surprise.
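For readers wondering what "MSRAPrelu" is: GluonCV's MSRAPrelu initializer is, as I understand it, Kaiming (He / MSRA) normal initialization with the gain computed for a PReLU slope. A rough PyTorch equivalent is sketched below; msra_prelu_init is a hypothetical helper name, and the closest built-in is kaiming_normal_ with nonlinearity='leaky_relu'.

```python
import torch.nn as nn

# Kaiming/MSRA normal init with the gain adjusted for a PReLU slope,
# applied to every conv and linear layer; biases are zeroed.
def msra_prelu_init(model, slope=0.25):
    for m in model.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            nn.init.kaiming_normal_(m.weight, a=slope, nonlinearity='leaky_relu')
            if m.bias is not None:
                nn.init.zeros_(m.bias)

# Tiny placeholder network, not the attention model from the repo.
net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.PReLU())
msra_prelu_init(net)
```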
I understand that you reached 95.68 with MSRAPrelu. But didn't mixup reach 96.57 after training? That is a new high, so why do you say it's not good enough?
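For reference, mixup (Zhang et al., 2018) trains on convex combinations of pairs of examples and of their labels. A minimal sketch for a classification batch follows; mixup_batch is a hypothetical helper, and the batch below is random placeholder data, not CIFAR-10.

```python
import torch

# Mix pairs of examples within a batch and return both label sets plus
# the mixing coefficient, which is sampled from a Beta distribution.
def mixup_batch(x, y, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    index = torch.randperm(x.size(0))          # random pairing within the batch
    mixed_x = lam * x + (1 - lam) * x[index]
    return mixed_x, y, y[index], lam

x = torch.randn(8, 3, 32, 32)                  # fake image batch
y = torch.randint(0, 10, (8,))                 # fake labels
mixed_x, y_a, y_b, lam = mixup_batch(x, y)
# The loss is the matching combination of the two targets:
# loss = lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
```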
OK, maybe I will try some image pre-processing and tune the hyperparameters to achieve that. But this code performs well in my own medical image recognition project.