A PyTorch implementation of Residual Attention Network.
It seems the results reproduced by this code cannot match the results in the original paper? #1

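Because mixup gives an improvement of at least one percentage point (see the web page I sent you), 96.57% doesn't feel like a great result compared with 95.68%. I've updated the project, so you can take another look.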

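@PistonY I added mixup and ran it. It has run for 40 epochs so far, and at this point it is not as good as training without mixup: accuracy is 2 to 3 percentage points lower. I don't know how it will look once training finishes. You use the Beta(alpha, alpha) distribution, right? What value do you use for alpha? And given a batch of data, do you run one iteration on the mixed data and then one more iteration on the original batch?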

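alpha = 1. No, it doesn't work like that: I train on mixup data the whole time. Mixup needs 20 more epochs than normal training, and the last 20 epochs use normal data. If you follow my implementation exactly, mixup will definitely reach 96.5. My most stable results are no-mixup: 95.5, mixup: 96.5. I get these on every training run.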
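For anyone following along, here is a minimal PyTorch sketch of the schedule described above: sample lam from Beta(alpha, alpha) with alpha = 1, mix each batch with a shuffled copy of itself, and switch mixup off for the last 20 epochs. The helper names `mixup_batch` and `use_mixup` and the `plain_tail` parameter are made up for illustration, and this is not taken from the Gluon implementation discussed here.

```python
import torch


def mixup_batch(x, y, alpha=1.0):
    """Mix a batch with a shuffled copy of itself.

    Returns the mixed inputs, the original labels, the shuffled labels,
    and the mixing coefficient lam drawn from Beta(alpha, alpha).
    """
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    mixed_x = lam * x + (1.0 - lam) * x[perm]
    return mixed_x, y, y[perm], lam


def use_mixup(epoch, total_epochs, plain_tail=20):
    """True while mixup is active; the final `plain_tail` epochs use plain data.

    `epoch` is zero-based. `plain_tail=20` follows the comment above;
    `total_epochs` is whatever your own schedule uses.
    """
    return epoch < total_epochs - plain_tail
```

In a training loop you would call `use_mixup(epoch, total_epochs)` once per epoch and only pass batches through `mixup_batch` while it returns `True`. The original batch is not trained on a second time.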

@PistonY OK, thanks for the correction. The mixup loss is also defined like this, right: lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)?

It's sum((lam * y_a + (1 - lam) * y_b) * pred), where y_a and y_b are both in one-hot form and pred is the output after log_softmax.

One-hot doesn't seem to make a difference in practice, see https://github.com/facebookresearch/mixup-cifar10/issues/6. It works as long as the criterion loss function can handle non-one-hot data internally.
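To make the comparison concrete, here is a small PyTorch sketch that computes the loss both ways and checks that they agree. It assumes `criterion` is the standard cross-entropy with mean reduction, and it adds the minus sign and the batch mean that the sum formula above leaves implicit. The function names are made up for illustration.

```python
import torch
import torch.nn.functional as F


def mixup_loss_two_terms(logits, y_a, y_b, lam):
    """lam * CE(pred, y_a) + (1 - lam) * CE(pred, y_b) with integer class labels."""
    return lam * F.cross_entropy(logits, y_a) + (1.0 - lam) * F.cross_entropy(logits, y_b)


def mixup_loss_soft_label(logits, y_a, y_b, lam):
    """-sum((lam * y_a + (1 - lam) * y_b) * log_softmax(logits)), averaged over the batch."""
    num_classes = logits.size(1)
    soft = (lam * F.one_hot(y_a, num_classes).float()
            + (1.0 - lam) * F.one_hot(y_b, num_classes).float())
    log_prob = F.log_softmax(logits, dim=1)
    return -(soft * log_prob).sum(dim=1).mean()
```

Cross-entropy is linear in the target distribution, so mixing the two losses and mixing the two one-hot labels give the same value. Which form to use is mostly a question of what your loss function accepts.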

@PistonY, the result I got with mixup is an accuracy of 96.65%.

@PistonY, what do you mean by "standard"? I didn't round anything; that is simply the result.

I sent the previous message by mistake. Anyway, good for you.

@PistonY, anyway, thanks very much. The credit goes to you.

No, no, no. I based everything on your project. I just tried AttentionResNeXt on CIFAR-10. The model is a little bigger, but the result still can't reach 97% (96.92% at best, with mixup). I'm wondering how I can get there.

OK, maybe I could try AttentionResNeXt. You really do share valuable information.

@PistonY, hi, I can't find the AttentionResNeXt paper. Can you give me the paper's title? Or did you combine Attention and ResNeXt yourself as your own experiment? So is 97% a target you set for yourself rather than the best result from some paper?

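It's mentioned in this same paper, in the ImageNet part. My guess is that they simply replaced the ResBlock with a ResNeXt block, but judging from the paper the improvement is very small. You could call 97% a kind of threshold: many very large models only get slightly above 97%, so I wanted to see whether a much smaller model could reach it.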

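@PistonY, well, never mind. Spending that many more parameters for such a small gain isn't worth it, and it isn't much use in practice. On real-world data, resNet50 is even slightly better than se_resNext50.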

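@PistonY, I wouldn't really call it a modification. The paper doesn't give a network structure for CIFAR's 32-pixel input, only for 224-pixel input, so I built mine from the paper's overall description. As for pre, I've forgotten exactly what the differences are; it's my own earlier version. I removed the first max pooling and changed the convolution kernel size to 3.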
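As a rough illustration of the change described above, here is a PyTorch sketch of a CIFAR-style stem: a 3x3 convolution with no max pooling after it, so a 32x32 input keeps its resolution. The class name `CifarStem`, the stride, the padding, and the channel width are assumptions made for this example and are not read from the repository.

```python
import torch
import torch.nn as nn


class CifarStem(nn.Module):
    """Hypothetical first stage for 32x32 inputs: 3x3 conv, no max pooling."""

    def __init__(self, out_channels=32):
        super().__init__()
        self.conv = nn.Sequential(
            # kernel_size=3 with padding=1 and stride=1 preserves the 32x32 size
            nn.Conv2d(3, out_channels, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        # No pooling here, unlike a typical 224-input stem
        return self.conv(x)
```

The idea is only that a 32x32 image has too little spatial resolution to give up to an early large-stride convolution and pooling layer, so the downsampling is left to the attention stages.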

No, no, what I mean is this: I see that you use two additions in the downsampling and upsampling path, something like this:

        out_mpool1 = self.mpool1(x)
        out_softmax1 = self.softmax1_blocks(out_mpool1)
        out_skip1_connection = self.skip1_connection_residual_block(out_softmax1)

        out_mpool2 = self.mpool2(out_softmax1)
        out_softmax2 = self.softmax2_blocks(out_mpool2)
        out_skip2_connection = self.skip2_connection_residual_block(out_softmax2)

        out_mpool3 = self.mpool3(out_softmax2)
        out_softmax3 = self.softmax3_blocks(out_mpool3)

        out_interp3 = F.elemwise_add(self.interpolation3(out_softmax3), out_softmax2)
        out = F.elemwise_add(out_interp3, out_skip2_connection)

Does the original paper have the out_skip2_connection term that appears in out = F.elemwise_add(out_interp3, out_skip2_connection)?

Have you run into this problem? TypeError: new() received an invalid combination of arguments - got (float, int, int, int), but expected one of:

@zhongleilz Please see https://github.com/tengshaofeng/ResidualAttentionNetwork-pytorch/issues/3
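For context, the `(float, int, int, int)` in that message means a float reached a place where PyTorch expects an integer size, typically a layer's channel count. A common cause under Python 3 is computing a channel count with `/`, which always returns a float. The sketch below shows the integer-division form. The helper name `bottleneck_conv` is hypothetical, and I am assuming, not confirming, that this is the cause described in the linked issue.

```python
import torch.nn as nn


def bottleneck_conv(in_channels, out_channels):
    """1x1 conv that reduces to a quarter of `out_channels` (hypothetical helper)."""
    # `out_channels / 4` is a float under Python 3 (for example 64.0), which layer
    # constructors can reject; `//` keeps the channel count an int.
    mid_channels = out_channels // 4
    return nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False)
```

If the code was written for Python 2, searching it for `/` inside layer constructors and replacing it with `//` is usually the quickest check.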

Can anyone provide or refer me to trained models for CIFAR-10, CIFAR-100 or ImageNet-2017?

@tengshaofeng I saw the trained Attention-92 model without mixup. Could you also upload the models for the two results with mixup?

@simi2525, you can train it yourself, because the trained models are a bit too big for GitHub. Also, I didn't save the best model when I trained it. Sorry.

If you have enough time, please try ImageNet without weight decay (wd). With a wd of 1e-4 I can't reach the paper's result.

@PistonY Did you use this implementation?

Yes, and I made some simplifications. But it's in Gluon, not PyTorch.

@tengshaofeng for my current project, all I need are trained models, the one already uploaded is good enough. If I get the time to tinker with it in order to get the initial paper results, I'll be sure to let you know.

@simi2525, um, the uploaded trained model is actually better than the one in the original paper. The paper reports an accuracy of 95.01%, and the uploaded model reaches 95.4%. Both are based on Attention-92. May I ask what the highest CIFAR-10 test accuracy is among the papers you currently know of?

It feels like once you use something like mixup, even getting above 97% doesn't mean much. After all, anyone can get there by using it.

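@sankin1770, without mixup the accuracy is still 95.4%, which is higher than in the original paper. This project is only meant to reproduce the paper's results.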

@sankin1770 Naive. @tengshaofeng I've finally reached 97%. It really wasn't easy.

OK, I accept your criticism. But I still can't see what is innovative about using mixup, since everyone who uses it gets an improvement.

Yes, I'm a beginner. Please bear with me.

@PistonY, what method did you use to get up to 97%? Please share.

@tengshaofeng https://arxiv.org/pdf/1812.01187.pdf

Thank you both for your help. After making my own improvements, I also reached 97%.

@PistonY, you always manage to surprise me. Thanks.

You two experts are officially flattering each other, haha.

@sankin1770 Thanks for your critical suggestions.

@sankin1770 Did you reproduce the methods from that paper in PyTorch? What did you use to reach 97%?

And you're welcome to take a look at our new face recognition project, Gluon-Face.