tengshaofeng / ResidualAttentionNetwork-pytorch

A PyTorch implementation of Residual Attention Network. This code is based on two projects from

It seems that the results reproduced by this code cannot achieve the results in the original paper? #1

Open YihangLou opened 6 years ago

PistonY commented 5 years ago

Because mixup improves accuracy by at least one percentage point; you can refer to the webpage I gave you. But compared with 95.68%, the 96.57% result doesn't feel that impressive. I've updated the project, you can take a look.

tengshaofeng commented 5 years ago

OK, it seems mixup is really useful. I'll go study it. Thanks.

PistonY commented 5 years ago

I trained again, and it feels like mixup tops out at around 96.5x. If you reach a higher result, please let me know.

tengshaofeng commented 5 years ago

@PistonY I added mixup and ran it. After 40 epochs so far, it's not as good as without mixup; accuracy is 2 to 3 percentage points lower. Not sure what it will look like when training finishes. I'd like to ask: you sample from a Beta(alpha, alpha) distribution, right? What value of alpha do you use? And for a given batch, do you run one iteration on the mixed data and then another iteration on the original batch?

PistonY commented 5 years ago

alpha = 1. No, not like that: train on the mixup data the whole time. With mixup you need to train about 20 more epochs than normal, and use un-mixed data for the last 20 epochs. If you follow my mixup implementation exactly, you will definitely reach 96.5. My most stable results are no-mixup: 95.5, mixup: 96.5, and I get these on every training run.

tengshaofeng commented 5 years ago

@PistonY OK, thanks for the correction. The mixup loss is also defined like this, right: lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)?

PistonY commented 5 years ago

It's sum((lam * y_a + (1 - lam) * y_b) * pred), where y_a and y_b are both in one-hot form and pred is the output of log_softmax.
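
For reference, a minimal PyTorch sketch of the soft-label formulation described above: lambda drawn from Beta(alpha, alpha) with alpha = 1, one-hot targets mixed, and the loss computed against log_softmax outputs. The helper names are mine, and the explicit negative sign (making it a standard soft-label cross entropy) is an assumption not spelled out in the comment:

    import numpy as np
    import torch
    import torch.nn.functional as F

    def mixup_batch(x, y, num_classes, alpha=1.0):
        """Mix a batch of images and one-hot labels with lambda ~ Beta(alpha, alpha)."""
        lam = float(np.random.beta(alpha, alpha))
        index = torch.randperm(x.size(0), device=x.device)
        y_onehot = F.one_hot(y, num_classes).float()
        mixed_x = lam * x + (1.0 - lam) * x[index]
        mixed_y = lam * y_onehot + (1.0 - lam) * y_onehot[index]
        return mixed_x, mixed_y

    def soft_label_loss(logits, mixed_y):
        """Cross entropy with soft targets: -sum(target * log_softmax(logits))."""
        return -(mixed_y * F.log_softmax(logits, dim=1)).sum(dim=1).mean()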

tengshaofeng commented 5 years ago

One-hot doesn't seem to matter: https://github.com/facebookresearch/mixup-cifar10/issues/6. As long as the criterion can handle non-one-hot labels internally, it works fine.
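
For comparison, the mixup-cifar10 repo linked above keeps integer class labels and mixes the loss terms instead of the targets; a sketch of that variant, assuming an ordinary nn.CrossEntropyLoss as the criterion:

    import torch.nn as nn

    def mixup_criterion(criterion, pred, y_a, y_b, lam):
        # Weighted sum of the losses against the two original integer-label tensors,
        # i.e. lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b).
        return lam * criterion(pred, y_a) + (1.0 - lam) * criterion(pred, y_b)

    criterion = nn.CrossEntropyLoss()  # handles non-one-hot (integer) labels internally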

PistonY commented 5 years ago

Yeah, I just wrote out the Gluon implementation.

PistonY commented 5 years ago

You could try Kaggle's free K80. I don't know how well it works; MXNet has no CUDA build there, but PyTorch should work.

tengshaofeng commented 5 years ago

I didn't understand what you meant.

tengshaofeng commented 5 years ago

@PistonY , my mixup run finished with an accuracy of 96.65%.

tengshaofeng commented 5 years ago

@PistonY , what do you mean by "standard"? There was no rounding; that's just the result.

PistonY commented 5 years ago

Sent the previous comment by mistake. Anyway, good for you.

tengshaofeng commented 5 years ago

@PistonY , anyway, thanks very much. The credit is yours.

PistonY commented 5 years ago

No, no, no. I built entirely on your project. I just tried AttentionResNeXt on CIFAR-10. The model is a little bigger, but the result still can't reach 97% (96.92% highest with mixup). I'm wondering how I can get there.

tengshaofeng commented 5 years ago

OK, maybe I could try AttentionResNeXt. You really do provide valuable information.

tengshaofeng commented 5 years ago

@PistonY , hi, I cannot find a paper called AttentionResNeXt. Can you give me the paper name? Or did you combine Attention with ResNeXt yourself as your own experiment, so 97% is a target you set for yourself rather than the best result reported in some paper?

PistonY commented 5 years ago

It's mentioned in this same paper, in the ImageNet section. I guess they just replaced the ResBlock with a ResNeXt block, though from the paper the improvement looks very small. You could call 97% a threshold: many very large models only get slightly above 97%, so I want to see whether a very small model can reach it.

tengshaofeng commented 5 years ago

@PistonY , OK, forget it. Spending that many extra parameters for such a small gain isn't worth it, and it doesn't help much in practice either. On real-world data, ResNet50 is even a tiny bit better than SE-ResNeXt50.

PistonY commented 5 years ago

I noticed your project modifies the structure from the original paper? Is the AttentionModule_pre module the paper's default structure? And which reference did you follow for the later implementation?

tengshaofeng commented 5 years ago

@PistonY , I wouldn't really call it a modification. The paper doesn't provide a network structure for CIFAR's 32x32 input, only the 224 input, so I designed it roughly following the paper. I've forgotten exactly how the pre version differs; it's an earlier version of mine. I removed the first max-pooling and changed the first convolution kernel to 3x3.
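
To illustrate that change, a minimal sketch of the two stems side by side; the channel counts here are assumed examples, not necessarily the values used in the repo:

    import torch.nn as nn

    # 224-input stem as in the paper: 7x7 stride-2 convolution followed by 3x3 max-pooling.
    stem_224 = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    )

    # 32x32 (CIFAR) stem: the max-pooling is dropped and the first convolution
    # becomes 3x3 with stride 1, so spatial resolution is preserved.
    stem_32 = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1, bias=False),  # 32 channels assumed
        nn.BatchNorm2d(32),
        nn.ReLU(inplace=True),
    )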

PistonY commented 5 years ago

No, no, that's not what I meant. Here's the question: I see you use two additions during downsampling and upsampling, something like this:

        out_mpool1 = self.mpool1(x)
        out_softmax1 = self.softmax1_blocks(out_mpool1)
        out_skip1_connection = self.skip1_connection_residual_block(out_softmax1)

        out_mpool2 = self.mpool2(out_softmax1)
        out_softmax2 = self.softmax2_blocks(out_mpool2)
        out_skip2_connection = self.skip2_connection_residual_block(out_softmax2)

        out_mpool3 = self.mpool3(out_softmax2)
        out_softmax3 = self.softmax3_blocks(out_mpool3)

        out_interp3 = F.elemwise_add(self.interpolation3(out_softmax3), out_softmax2)
        out = F.elemwise_add(out_interp3, out_skip2_connection)

Does the original paper include the out_skip2_connection term in out = F.elemwise_add(out_interp3, out_skip2_connection)?

tengshaofeng commented 5 years ago

Yes, that's in the original paper.

PistonY commented 5 years ago

OK, got it. Thanks!

tengshaofeng commented 5 years ago

I'm not an expert at all. Well, I'll add you then.

zhongleilz commented 5 years ago

Have you ever run into this problem: TypeError: new() received an invalid combination of arguments - got (float, int, int, int), but expected one of:

tengshaofeng commented 5 years ago

@zhongleilz Please refer to https://github.com/tengshaofeng/ResidualAttentionNetwork-pytorch/issues/3

simi2525 commented 5 years ago

Can anyone provide or refer me to trained models for CIFAR-10, CIFAR-100 or ImageNet-2017?

@tengshaofeng I saw the Attention-92 without mixup trained model, could you also upload it for the two results with mixup?

tengshaofeng commented 5 years ago

@simi2525 , you can train it yourself, because trained models are a bit large for GitHub. Also, when I trained the model I did not save the best checkpoint. Sorry.

PistonY commented 5 years ago

If you have enough time, please try ImageNet without weight decay. With a weight decay of 1e-4 I can't reach the paper's result.

ondrejbiza commented 5 years ago

@PistonY Did you use this implementation?

PistonY commented 5 years ago

@PistonY Did you use this implementation?

Yes, and I did some simplification, but in Gluon, not PyTorch.

simi2525 commented 5 years ago

@tengshaofeng for my current project, all I need are trained models, the one already uploaded is good enough. If I get the time to tinker with it in order to get the initial paper results, I'll be sure to let you know.

tengshaofeng commented 5 years ago

@simi2525 , um, the uploaded trained model is actually better than the one in the original paper. The paper reports an accuracy of 95.01%, while the uploaded model reaches 95.4%. Both are based on Attention-92.

sankin1770 commented 5 years ago

@simi2525 , um, the uploaded trained model is actually better than the one in the original paper. The paper reports an accuracy of 95.01%, while the uploaded model reaches 95.4%. Both are based on Attention-92. May I ask, among the papers you know of, what is currently the highest reported test accuracy on CIFAR-10?

sankin1770 commented 5 years ago

@PistonY , hi, I cannot find a paper called AttentionResNeXt. Can you give me the paper name? Or did you combine Attention with ResNeXt yourself as your own experiment, so 97% is a target you set for yourself rather than the best result reported in some paper?

It feels like with something like mixup, even getting above 97% doesn't mean much, since everyone can get there by using it.

tengshaofeng commented 5 years ago

@sankin1770 , even without mixup the accuracy is 95.4%, which is higher than in the original paper. This project is just meant to reproduce the paper's results.

PistonY commented 5 years ago

@sankin1770 Naive. @tengshaofeng Finally hit 97%; that really wasn't easy.

sankin1770 commented 5 years ago

@sankin1770 Naive. @tengshaofeng Finally hit 97%; that really wasn't easy.

OK, I accept the criticism, but I still can't see what's innovative about using mixup; everyone who uses it gets an improvement.

PistonY commented 5 years ago

@sankin1770 Naive. @tengshaofeng Finally hit 97%; that really wasn't easy.

OK, I accept the criticism, but I still can't see what's innovative about using mixup; everyone who uses it gets an improvement.

Run more experiments and you'll see how hard it is to gain even 0.1 points of accuracy. What matters about a method is not novelty but usefulness. Besides, mixup counts as a big innovation in its own right, though it is also quite limited in where it can be applied.

sankin1770 commented 5 years ago

@sankin1770 Naive. @tengshaofeng Finally hit 97%; that really wasn't easy.

OK, I accept the criticism, but I still can't see what's innovative about using mixup; everyone who uses it gets an improvement.

Run more experiments and you'll see how hard it is to gain even 0.1 points of accuracy. What matters about a method is not novelty but usefulness. Besides, mixup counts as a big innovation in its own right, though it is also quite limited in where it can be applied.

You're right. I'm a beginner; please bear with me.

tengshaofeng commented 5 years ago

@PistonY , what method did you use to reach 97%? Please share.

PistonY commented 5 years ago

@tengshaofeng https://arxiv.org/pdf/1812.01187.pdf
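
For context, the linked paper ("Bag of Tricks for Image Classification with Convolutional Neural Networks") combines training refinements such as cosine learning-rate decay, label smoothing, and mixup. A minimal PyTorch sketch of two of those pieces; the model, learning rate, and epoch count here are placeholders, not values taken from either project:

    import torch
    import torch.nn as nn
    from torch.optim.lr_scheduler import CosineAnnealingLR

    model = nn.Linear(512, 10)  # stand-in for the attention network
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

    # Cosine learning-rate decay over the whole run (T_max = total epochs).
    scheduler = CosineAnnealingLR(optimizer, T_max=200)

    # Label smoothing; the built-in argument requires PyTorch >= 1.10.
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)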

sankin1770 commented 5 years ago

@tengshaofeng https://arxiv.org/pdf/1812.01187.pdf

Thanks for your help. After making my own improvements I also reached 97%.

tengshaofeng commented 5 years ago

@PistonY , you can always surprise me. Thanks.

sankin1770 commented 5 years ago

@PistonY , you can always surprise me. Thanks.

You two experts are just complimenting each other, haha.

tengshaofeng commented 5 years ago

@sankin1770 Thanks for your critical feedback.

PistonY commented 5 years ago

@sankin1770 Did you reproduce the methods from that paper in PyTorch? What did you use to reach 97%?

PistonY commented 5 years ago

@tengshaofeng @sankin1770 And you're welcome to have a look at our new face recognition project, Gluon-Face.