YihangLou opened this issue 6 years ago
OK, mixup sounds really useful. I'll go study it. Thanks!
I trained again, and with mixup the best I can get is around 96.5x. If you reach a higher result, let me know.
@PistonY I added mixup and have run 40 epochs so far; at this point it's worse than without mixup, accuracy is 2-3 points lower, and I don't know how it will end up after full training. A question: you sample from Beta(alpha, alpha), right? What alpha do you use? And given a batch, do you run one iteration on the mixed data and then another iteration on the original batch?
alpha = 1. And no, not like that: train on mixup data the whole time. Mixup needs about 20 more epochs than normal training, and the last 20 epochs use normal data. If you follow my implementation exactly, mixup will definitely reach 96.5. My most stable results are no-mixup: 95.5, mixup: 96.5, reproducible on every run.
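A minimal sketch of that schedule (the function name and epoch counts are illustrative, not from the actual project):

```python
def use_mixup(epoch, total_epochs, plain_tail=20):
    """Return True while we should train on mixed batches.

    Per the schedule above: train with mixup for all epochs except the
    final `plain_tail` epochs, which use the original (un-mixed) data.
    """
    return epoch < total_epochs - plain_tail
```

So with, say, 220 total epochs, epochs 0-199 use mixed batches and epochs 200-219 use plain batches.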
@PistonY OK, thanks for the correction. The mixup loss is defined like this, right: lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)?
It's -sum((lam * y_a + (1 - lam) * y_b) * pred), where y_a and y_b are in one-hot form and pred is the output after log_softmax.
One-hot form doesn't seem to matter, see https://github.com/facebookresearch/mixup-cifar10/issues/6 — as long as the criterion can handle non-one-hot targets internally, it works.
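For reference, here is a self-contained NumPy sketch of the batch mixing and the soft-label loss described above (function names are mine; the actual projects implement this in Gluon/PyTorch):

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=1.0, rng=np.random):
    """Mix a batch with a shuffled copy of itself, lam ~ Beta(alpha, alpha)."""
    lam = rng.beta(alpha, alpha)
    idx = rng.permutation(len(x))
    x_mixed = lam * x + (1 - lam) * x[idx]
    return x_mixed, y_onehot, y_onehot[idx], lam

def log_softmax(z):
    # numerically stable log-softmax over the class axis
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def mixup_loss(pred_logits, y_a, y_b, lam):
    """-sum((lam*y_a + (1-lam)*y_b) * log_softmax(pred)), averaged over the batch."""
    soft_target = lam * y_a + (1 - lam) * y_b
    return -(soft_target * log_softmax(pred_logits)).sum(axis=1).mean()
```

With uniform (all-zero) logits over 3 classes, the loss is log(3) regardless of lam, which is a quick sanity check that the soft targets sum to 1 per sample.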
Yeah, I just wrote a Gluon implementation of it.
You could try Kaggle's free K80, though I don't know how well it works; there's no CUDA build of MXNet there, but PyTorch should work.
I don't follow what you mean.
@PistonY My mixup run came out at 96.65% accuracy.
@PistonY What do you mean, "standard"? There was no rounding, that's the actual result.
Sent the previous message by mistake. Anyway, good for you.
@PistonY Anyway, thanks very much. It's your contribution.
No, no, no. I referred entirely to your project. I just tried AttentionResNeXt on CIFAR-10. The model is a little bigger, but the result still can't reach 97% (96.92 at best, with mixup). I'm wondering how I can get there.
OK, maybe I could try AttentionResNeXt. You really are the one giving valuable information.
@PistonY Hi, I can't find a paper called AttentionResNeXt. Can you give the paper name? Or did you combine Attention and ResNeXt yourself as your own experiment? So 97% is a target you set for yourself, not the best result from some paper?
It's mentioned in that same paper, in the ImageNet section. My guess is they just replaced the ResBlock with a ResNeXt block, though from the paper the improvement is very small. 97% is more of a threshold: many very large models only get slightly above 97%, so I want to see whether a very small model can reach it.
@PistonY OK, never mind. Spending that many extra parameters for such a small gain isn't worth it, and it's of little use in practice anyway. On real-world data, ResNet50 is even a tiny bit better than SE-ResNeXt50.
I noticed your project modifies the structure from the original paper? Is the AttentionModule_pre module the paper's default structure? And which reference did the later implementation follow?
@PistonY It's not really a modification. The paper only provides the network for 224-sized input, not a 32-input version for CIFAR, so I designed mine roughly following the paper. I forget exactly how the pre version differs; it's an older version of mine where I removed the first max pooling and changed the convolution kernel to 3.
No wait, here's the thing: I see you use two additions during downsampling and upsampling, like this:
out_mpool1 = self.mpool1(x)
out_softmax1 = self.softmax1_blocks(out_mpool1)
out_skip1_connection = self.skip1_connection_residual_block(out_softmax1)
out_mpool2 = self.mpool2(out_softmax1)
out_softmax2 = self.softmax2_blocks(out_mpool2)
out_skip2_connection = self.skip2_connection_residual_block(out_softmax2)
out_mpool3 = self.mpool3(out_softmax2)
out_softmax3 = self.softmax3_blocks(out_mpool3)
out_interp3 = F.elemwise_add(self.interpolation3(out_softmax3), out_softmax2)
out = F.elemwise_add(out_interp3, out_skip2_connection)
Does the original paper include the out_skip2_connection term in out = F.elemwise_add(out_interp3, out_skip2_connection)?
Yes, that's in the original paper.
Got it, got it, thanks!
I'm no expert. Well, I'll just add you then.
Have you ever run into this problem? TypeError: new() received an invalid combination of arguments - got (float, int, int, int), but expected one of:
@zhongleilz See https://github.com/tengshaofeng/ResidualAttentionNetwork-pytorch/issues/3
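For anyone hitting this later: without the full traceback this is only a guess, but this TypeError typically means a Python float was passed where tensor constructors expect integer sizes, often because `/` is true division in Python 3. A framework-free illustration of the pattern and the fix:

```python
# In Python 3, "/" always returns a float, even for exact divisions.
# Passing such a float as a size argument (e.g. to a tensor constructor)
# is what produces "new() received an invalid combination of arguments
# - got (float, int, int, int)".
def pooled_size(h, stride):
    # use integer floor division so downstream size arguments stay ints
    return h // stride
```

So `56 / 2` is the float `28.0` (the trigger), while `pooled_size(56, 2)` is the int `28` that size arguments require.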
Can anyone provide or refer me to trained models for CIFAR-10, CIFAR-100 or ImageNet-2017?
@tengshaofeng I saw the trained Attention-92 model without mixup; could you also upload the models for the two mixup results?
@simi2525 You can train it yourself. The trained models are a bit too big for GitHub, and I didn't save the best model when training. Sorry.
If you have enough time, please try ImageNet without weight decay. With wd = 1e-4 I can't reach the paper's result.
@PistonY Did you use this implementation?
Yes, and I made some simplifications. But in Gluon, not PyTorch.
@tengshaofeng For my current project, all I need is trained models, and the one already uploaded is good enough. If I get the time to tinker with it in order to reproduce the initial paper results, I'll be sure to let you know.
@simi2525 Um, the uploaded trained model is actually better than the one in the original paper: the paper reports 95.01% accuracy, and the uploaded one reaches 95.4%. Both are based on Attention-92. May I ask, what is the highest CIFAR-10 test accuracy among the papers you currently know of?
With mixup, even getting above 97% doesn't mean much, since everyone can get there using it.
@sankin1770 Even without mixup it gets 95.4% acc, higher than in the original paper. This project just reproduces the paper's results.
@sankin1770 Naive. @tengshaofeng Finally reached 97%, that was hard-won.
Fine, I accept the criticism. But I still don't see what's innovative about mixup, since everyone gets a boost from it.
Run more experiments and you'll see how hard it is to gain even 0.1% accuracy. What matters in a method isn't novelty but usefulness. Besides, mixup counts as a major innovation, though it also has significant limitations in where it applies.
Fair enough. I'm a beginner, my apologies.
@PistonY What methods did you use to get to 97%? Please share.
@tengshaofeng https://arxiv.org/pdf/1812.01187.pdf
Thanks for your help. After making my own improvements, I also reached 97%.
@PistonY You can always surprise me. Thanks.
You two big shots, officially flattering each other, haha.
@sankin1770 Thanks for your critical suggestion.
@sankin1770 Did you reproduce that paper's methods in PyTorch? What did you use to reach 97%?
@tengshaofeng @sankin1770 And welcome to take a look at our new face recognition project, Gluon-Face.
Mixup alone improves accuracy by at least one point, see the page I linked you, so going from 95.68% to only 96.57% doesn't feel that great. I've updated the project; take another look.