tengshaofeng / ResidualAttentionNetwork-pytorch

A PyTorch implementation of Residual Attention Network. This code is based on two projects from

It seems that the results reproduced by this code cannot achieve the results in the original paper? #1

Open YihangLou opened 6 years ago

PistonY commented 5 years ago

Because mixup improves accuracy by at least one percentage point; you can refer to the webpage I gave you. But compared with 95.68%, the 96.57% result doesn't feel that impressive. I've updated the project, you can take a look.

tengshaofeng commented 5 years ago

OK, it seems mixup is really useful. I'll go study it. Thanks.

PistonY commented 5 years ago

I trained again, and it feels like mixup tops out at around 96.5x. If you reach a higher result, please let me know.

tengshaofeng commented 5 years ago

@PistonY I added mixup and ran it. After 40 epochs so far, it's not as good as without mixup; accuracy is 2 to 3 percentage points lower. Not sure what it will look like when training finishes. I'd like to ask: you sample from a Beta(alpha, alpha) distribution, right? What value of alpha do you use? And for a given batch, do you run one iteration on the mixed data and then another iteration on the original batch?

PistonY commented 5 years ago

alpha = 1. No, not like that: train on the mixup data the whole time. With mixup you need to train about 20 more epochs than normal, and use un-mixed data for the last 20 epochs. If you follow my mixup implementation exactly, you will definitely reach 96.5. My most stable results are no-mixup: 95.5, mixup: 96.5, and I get these on every training run.

tengshaofeng commented 5 years ago

@PistonY OK, thanks for the correction. The mixup loss is also defined like this, right: lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)?

PistonY commented 5 years ago

It's sum((lam * y_a + (1 - lam) * y_b) * pred), where y_a and y_b are both in one-hot form and pred is the output of log_softmax.
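
For reference, a minimal PyTorch sketch of the soft-label formulation described above: lambda drawn from Beta(alpha, alpha) with alpha = 1, one-hot targets mixed, and the loss computed against log_softmax outputs. The helper names are mine, and the explicit negative sign (making it a standard soft-label cross entropy) is an assumption not spelled out in the comment:

    import numpy as np
    import torch
    import torch.nn.functional as F

    def mixup_batch(x, y, num_classes, alpha=1.0):
        """Mix a batch of images and one-hot labels with lambda ~ Beta(alpha, alpha)."""
        lam = float(np.random.beta(alpha, alpha))
        index = torch.randperm(x.size(0), device=x.device)
        y_onehot = F.one_hot(y, num_classes).float()
        mixed_x = lam * x + (1.0 - lam) * x[index]
        mixed_y = lam * y_onehot + (1.0 - lam) * y_onehot[index]
        return mixed_x, mixed_y

    def soft_label_loss(logits, mixed_y):
        """Cross entropy with soft targets: -sum(target * log_softmax(logits))."""
        return -(mixed_y * F.log_softmax(logits, dim=1)).sum(dim=1).mean()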

tengshaofeng commented 5 years ago

One-hot doesn't seem to matter: https://github.com/facebookresearch/mixup-cifar10/issues/6. As long as the criterion can handle non-one-hot labels internally, it works fine.
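
For comparison, the mixup-cifar10 repo linked above keeps integer class labels and mixes the loss terms instead of the targets; a sketch of that variant, assuming an ordinary nn.CrossEntropyLoss as the criterion:

    import torch.nn as nn

    def mixup_criterion(criterion, pred, y_a, y_b, lam):
        # Weighted sum of the losses against the two original integer-label tensors,
        # i.e. lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b).
        return lam * criterion(pred, y_a) + (1.0 - lam) * criterion(pred, y_b)

    criterion = nn.CrossEntropyLoss()  # handles non-one-hot (integer) labels internally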

PistonY commented 5 years ago

Yeah, I just wrote out the Gluon implementation.

PistonY commented 5 years ago

You could try Kaggle's free K80. I don't know how well it works; MXNet has no CUDA build there, but PyTorch should work.

tengshaofeng commented 5 years ago

I didn't understand what you meant.

tengshaofeng commented 5 years ago

@PistonY , my mixup run finished with an accuracy of 96.65%.

tengshaofeng commented 5 years ago

@PistonY , what do you mean by "standard"? There was no rounding; that's just the result.

PistonY commented 5 years ago

Sent the previous comment by mistake. Anyway, good for you.

tengshaofeng commented 5 years ago

@PistonY , anyway, thanks very much. The credit is yours.

PistonY commented 5 years ago

No, no, no. I built entirely on your project. I just tried AttentionResNeXt on CIFAR-10. The model is a little bigger, but the result still can't reach 97% (96.92% highest with mixup). I'm wondering how I can get there.

tengshaofeng commented 5 years ago

OK, maybe I could try AttentionResNeXt. You really do provide valuable information.

tengshaofeng commented 5 years ago

@PistonY , hi, I cannot find a paper called AttentionResNeXt. Can you give me the paper name? Or did you combine Attention with ResNeXt yourself as your own experiment, so 97% is a target you set for yourself rather than the best result reported in some paper?

PistonY commented 5 years ago

It's mentioned in this same paper, in the ImageNet section. I guess they just replaced the ResBlock with a ResNeXt block, though from the paper the improvement looks very small. You could call 97% a threshold: many very large models only get slightly above 97%, so I want to see whether a very small model can reach it.

tengshaofeng commented 5 years ago

@PistonY , OK, forget it. Spending that many extra parameters for such a small gain isn't worth it, and it doesn't help much in practice either. On real-world data, ResNet50 is even a tiny bit better than SE-ResNeXt50.

PistonY commented 5 years ago

I noticed your project modifies the structure from the original paper? Is the AttentionModule_pre module the paper's default structure? And which reference did you follow for the later implementation?

tengshaofeng commented 5 years ago

@PistonY , I wouldn't really call it a modification. The paper doesn't provide a network structure for CIFAR's 32x32 input, only the 224 input, so I designed it roughly following the paper. I've forgotten exactly how the pre version differs; it's an earlier version of mine. I removed the first max-pooling and changed the first convolution kernel to 3x3.
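
To illustrate that change, a minimal sketch of the two stems side by side; the channel counts here are assumed examples, not necessarily the values used in the repo:

    import torch.nn as nn

    # 224-input stem as in the paper: 7x7 stride-2 convolution followed by 3x3 max-pooling.
    stem_224 = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    )

    # 32x32 (CIFAR) stem: the max-pooling is dropped and the first convolution
    # becomes 3x3 with stride 1, so spatial resolution is preserved.
    stem_32 = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1, bias=False),  # 32 channels assumed
        nn.BatchNorm2d(32),
        nn.ReLU(inplace=True),
    )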

PistonY commented 5 years ago

No, no, that's not what I meant. Here's the question: I see you use two additions during downsampling and upsampling, something like this:

        out_mpool1 = self.mpool1(x)
        out_softmax1 = self.softmax1_blocks(out_mpool1)
        out_skip1_connection = self.skip1_connection_residual_block(out_softmax1)

        out_mpool2 = self.mpool2(out_softmax1)
        out_softmax2 = self.softmax2_blocks(out_mpool2)
        out_skip2_connection = self.skip2_connection_residual_block(out_softmax2)

        out_mpool3 = self.mpool3(out_softmax2)
        out_softmax3 = self.softmax3_blocks(out_mpool3)

        out_interp3 = F.elemwise_add(self.interpolation3(out_softmax3), out_softmax2)
        out = F.elemwise_add(out_interp3, out_skip2_connection)

Does the original paper include the out_skip2_connection term in out = F.elemwise_add(out_interp3, out_skip2_connection)?

tengshaofeng commented 5 years ago

Yes, that's in the original paper.

PistonY commented 5 years ago

OK, got it. Thanks!

tengshaofeng commented 5 years ago

I'm not an expert at all. Well, I'll add you then.

zhongleilz commented 5 years ago

Have you ever run into this problem: TypeError: new() received an invalid combination of arguments - got (float, int, int, int), but expected one of:

tengshaofeng commented 5 years ago

@zhongleilz Please refer to https://github.com/tengshaofeng/ResidualAttentionNetwork-pytorch/issues/3

simi2525 commented 5 years ago

Can anyone provide or refer me to trained models for CIFAR-10, CIFAR-100 or ImageNet-2017?

@tengshaofeng I saw the Attention-92 without mixup trained model, could you also upload it for the two results with mixup?

tengshaofeng commented 5 years ago

@simi2525 , you can train it yourself, because trained models are a bit large for GitHub. Also, when I trained the model I did not save the best checkpoint. Sorry.

PistonY commented 5 years ago

If you have enough time, please try ImageNet without weight decay. With a weight decay of 1e-4 I can't reach the paper's result.

ondrejbiza commented 5 years ago

@PistonY Did you use this implementation?

PistonY commented 5 years ago

@PistonY Did you use this implementation?

Yes, and I did some simplification, but in Gluon, not PyTorch.

simi2525 commented 5 years ago

@tengshaofeng for my current project, all I need are trained models, the one already uploaded is good enough. If I get the time to tinker with it in order to get the initial paper results, I'll be sure to let you know.

tengshaofeng commented 5 years ago

@simi2525 , um, the uploaded trained model is actually better than the one in the original paper. The paper reports an accuracy of 95.01%, while the uploaded model reaches 95.4%. Both are based on Attention-92.

sankin1770 commented 5 years ago

@simi2525 , um, the uploaded trained model is actually better than the one in the original paper. The paper reports an accuracy of 95.01%, while the uploaded model reaches 95.4%. Both are based on Attention-92. May I ask, among the papers you know of, what is currently the highest reported test accuracy on CIFAR-10?

sankin1770 commented 5 years ago

@PistonY , hi, I cannot find a paper called AttentionResNeXt. Can you give me the paper name? Or did you combine Attention with ResNeXt yourself as your own experiment, so 97% is a target you set for yourself rather than the best result reported in some paper?

It feels like with something like mixup, even getting above 97% doesn't mean much, since everyone can get there by using it.

tengshaofeng commented 5 years ago

@sankin1770 , even without mixup the accuracy is 95.4%, which is higher than in the original paper. This project is just meant to reproduce the paper's results.

PistonY commented 5 years ago

@sankin1770 Naive. @tengshaofeng Finally hit 97%; that really wasn't easy.

sankin1770 commented 5 years ago

@sankin1770 Naive. @tengshaofeng Finally hit 97%; that really wasn't easy.

OK, I accept the criticism, but I still can't see what's innovative about using mixup; everyone who uses it gets an improvement.

PistonY commented 5 years ago

@sankin1770 Naive. @tengshaofeng Finally hit 97%; that really wasn't easy.

OK, I accept the criticism, but I still can't see what's innovative about using mixup; everyone who uses it gets an improvement.

Run more experiments and you'll see how hard it is to gain even 0.1 points of accuracy. What matters about a method is not novelty but usefulness. Besides, mixup counts as a big innovation in its own right, though it is also quite limited in where it can be applied.

sankin1770 commented 5 years ago

@sankin1770 Naive. @tengshaofeng Finally hit 97%; that really wasn't easy.

OK, I accept the criticism, but I still can't see what's innovative about using mixup; everyone who uses it gets an improvement.

Run more experiments and you'll see how hard it is to gain even 0.1 points of accuracy. What matters about a method is not novelty but usefulness. Besides, mixup counts as a big innovation in its own right, though it is also quite limited in where it can be applied.

You're right. I'm a beginner; please bear with me.

tengshaofeng commented 5 years ago

@PistonY , what method did you use to reach 97%? Please share.

PistonY commented 5 years ago

@tengshaofeng https://arxiv.org/pdf/1812.01187.pdf
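
For context, the linked paper ("Bag of Tricks for Image Classification with Convolutional Neural Networks") combines training refinements such as cosine learning-rate decay, label smoothing, and mixup. A minimal PyTorch sketch of two of those pieces; the model, learning rate, and epoch count here are placeholders, not values taken from either project:

    import torch
    import torch.nn as nn
    from torch.optim.lr_scheduler import CosineAnnealingLR

    model = nn.Linear(512, 10)  # stand-in for the attention network
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

    # Cosine learning-rate decay over the whole run (T_max = total epochs).
    scheduler = CosineAnnealingLR(optimizer, T_max=200)

    # Label smoothing; the built-in argument requires PyTorch >= 1.10.
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)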

sankin1770 commented 5 years ago

@tengshaofeng https://arxiv.org/pdf/1812.01187.pdf

Thanks for your help. After making my own improvements I also reached 97%.

tengshaofeng commented 5 years ago

@PistonY , you can always surprise me. Thanks.

sankin1770 commented 5 years ago

@PistonY , you can always surprise me. Thanks.

You two experts are just complimenting each other, haha.

tengshaofeng commented 5 years ago

@sankin1770 Thanks for your critical feedback.

PistonY commented 5 years ago

@sankin1770 Did you reproduce the methods from that paper in PyTorch? What did you use to reach 97%?

PistonY commented 5 years ago

@tengshaofeng @sankin1770 And you're welcome to have a look at our new face recognition project, Gluon-Face.