kakaobrain / fast-autoaugment

Official Implementation of 'Fast AutoAugment' in PyTorch.
MIT License

Using your code I couldn't achieve the accuracy you reported. #5

Open ehion opened 5 years ago

ehion commented 5 years ago

I trained ImageNet on 32 GPUs via Horovod (8 V100 x 4) but got 77.1% accuracy, which is much lower than the 78.6% reported in your paper, by running: `python train.py -c confs/resnet50_imagenet_b4096.yaml --aug fa_reduced_imagenet --horovod`. Moreover, according to your yaml config, the lr type should be multistep (adjust_learning_rate_resnet), as can be seen in train.py, but I saw cosine lr decay being used during my test of your code. Waiting for your reply, thanks.
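
For reference, the "multistep" schedule being discussed is the usual step-wise decay for ResNet-50 on ImageNet, as opposed to cosine annealing. A minimal sketch in PyTorch, assuming common defaults (the milestones, base lr, and epoch count below are illustrative, not values taken from this repository's yaml):

```python
# Illustrative only: step-wise lr decay (divide by 10 at epochs 30/60/90),
# the schedule typically used for ResNet-50 on ImageNet, vs. cosine annealing.
import torch
import torchvision

model = torchvision.models.resnet50()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60, 90], gamma=0.1)

for epoch in range(120):
    # train_one_epoch(model, optimizer)  # training loop omitted
    scheduler.step()  # advance once per epoch so lr drops at the milestones
```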

ildoonet commented 5 years ago

Thanks for the report. This repository was made by copying parts of the original code, and some parts may have been broken in the process, although I verified the CIFAR-10/100 results with some models. As you reported, ResNet should follow the multistep (adjust_learning_rate_resnet) schedule; this needs to be fixed.

I will fix this and also reproduce the result with it. After that, I will update accordingly. Thanks.

ildoonet commented 5 years ago

If you have enough GPU resources to try, please train with the code on the 'bug/lr-scheduler' branch, where I committed a fix: https://github.com/KakaoBrain/fast-autoaugment/commit/834e65154a81b7d37a8b4a9ca95135a6d8922598

Due to a current lack of computational resources, I will try to train it after this weekend.

Thanks.

ehion commented 5 years ago

I have changed the cosine lr decay to multistep lr decay and hope to get a good result tomorrow. I'll upload my test results here to help fix the bug with you. Thanks for your quick reply ^^

ehion commented 5 years ago

I got multistep lr decay acc: 76.646%, still much less than 78.6%.

ildoonet commented 5 years ago

@ehion Let me verify the code and get back to you (hopefully, next week).

ehion commented 5 years ago

Waiting for it, thanks.

ildoonet commented 5 years ago

@ehion Thanks for your contribution. I experimented with the original code while our team was preparing a paper for NeurIPS 2019. I found some bugs and things that need to be fixed, so the performance will differ from the current README. The top-1 / top-5 error rates are 22.4 / 6.4 for now, but I will get back to you after double-checking the code and experiments.

ildoonet commented 4 years ago

@JoinWei-PKU Our reported value was 21.4%, not 22.4% (in the final, NeurIPS version of the paper). We will release the final code for search and retraining. Before we do that, I'm checking that all of the final retrained models perform as well as the paper claims. So within one or two weeks, I will update the code for search and retraining, as well as the checkpoints of the retrained models.

Thanks for your interest.

JoinWei-PKU commented 4 years ago

Thanks for your response. Looking forward to the search and retraining code, which is important for verifying the proposed method and would have a profound impact.

JoinWei-PKU commented 4 years ago

@ildoonet Hi, I quickly reviewed the retraining code you provided. The structure of your project is clear; however, why are there nearly 500 chosen sub-policies for CIFAR-10? AutoAugment uses only 25 sub-policies.

Looking forward to your reply.

JoinWei-PKU commented 4 years ago

@ildoonet Thanks for releasing the search code. However, there are still obvious bugs in the code. For example, if you run the search code directly, the searched policy is a set, which implies there is an augmentation policy for each class; however, in your data.py you check "isinstance(C.get['aug'],list)". So are the results reported in your paper really correct, or does the released code simply have bugs? Moreover, a randomly searched policy seems to achieve performance equal to your method on CIFAR-10.
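
To make the mismatch concrete, here is a hypothetical sketch of a loader that accepts either a flat list of sub-policies or a per-class mapping; the function name, argument names, and container types are assumptions for illustration, not the repository's actual data.py API:

```python
# Hypothetical illustration of the type-dispatch question raised above:
# a flat list means one shared policy, a dict means class-conditional policies.
# None of these names come from the fast-autoaugment code base.
def policy_for(aug, class_idx=None):
    if isinstance(aug, list):
        # one shared list of sub-policies applied to every class
        return aug
    if isinstance(aug, dict):
        # class-conditional policies, e.g. {class_idx: [sub-policies, ...]}
        if class_idx is None:
            raise ValueError("class index required for per-class policies")
        return aug[class_idx]
    raise TypeError(f"unsupported policy container: {type(aug).__name__}")
```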

What's more, why do you use 15 policies for each class? In your paper, there are only 10 policies in total.

@ehion Have you successfully reproduced the results?

ehion commented 4 years ago

@JoinWei-PKU I haven't, but I believe the results in the paper are true, because I have implemented a sample-based search method different from Fast AutoAugment (with network inference only once, similar to FA). My results on ImageNet are a bit better than Fast AutoAugment's, and I find that the search space from AutoAugment is of great importance.

JoinWei-PKU commented 4 years ago

@ehion Thanks for your response. I agree that the search space is important. When I use random search to find a policy in the AutoAugment search space, the results are a little better than the performance reported in the AutoAugment paper. Do you think these methods consistently perform better than random search?

ehion commented 4 years ago

I think it is consistently better than the average random-search result, but not better than every individual random result. I think augmentation search methods only really make sense when they achieve better results on smaller models, rather than on big models like ResNet-50.

JoinWei-PKU commented 4 years ago

@ehion Thanks. However, compared with AutoAugment, I achieve nearly equal performance with average random search on CIFAR-10, and the models are WResNet-40-2 and WResNet-20-8. Do you achieve better performance with this search policy?

ildoonet commented 4 years ago

We confirmed that all models train successfully with our latest code. We have also uploaded the trained models.

We have also conducted experiments with randomly selected augmentations; we found that they sometimes perform well, but slightly worse than our approach.

I suspect that naively random selection of augmentations can be improved in a simple way, which is called RandAugment: https://github.com/ildoonet/pytorch-randaugment
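
For context, the idea behind RandAugment is to skip the search step entirely: for each image, sample N transformations uniformly at random from a fixed set and apply them at a shared magnitude M. A minimal sketch of that idea (the op list and magnitude mapping below are simplified assumptions, not the pytorch-randaugment implementation):

```python
# Simplified RandAugment-style sampler: pick n ops at random (with replacement)
# and apply them at a single magnitude m. Ops and scaling are illustrative only.
import random
from PIL import Image, ImageEnhance, ImageOps

def rand_augment(img: Image.Image, n: int = 2, m: int = 9) -> Image.Image:
    level = m / 30.0  # map the integer magnitude to [0, 1]
    ops = [
        lambda im: ImageOps.autocontrast(im),
        lambda im: ImageOps.equalize(im),
        lambda im: im.rotate(30 * level),
        lambda im: ImageEnhance.Color(im).enhance(1.0 + level),
        lambda im: ImageEnhance.Contrast(im).enhance(1.0 + level),
        lambda im: ImageEnhance.Sharpness(im).enhance(1.0 + level),
        lambda im: ImageOps.posterize(im, 8 - int(4 * level)),
        lambda im: ImageOps.solarize(im, int(256 * (1.0 - level))),
    ]
    for op in random.choices(ops, k=n):
        img = op(img)
    return img
```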

monkeyDemon commented 4 years ago

Thank you for your continuous follow-up on this issue. I see that you have discussed above which method is better, a random search strategy or AutoAugment, and it seems that AutoAugment may be slightly better. I want to find out whether these methods can help improve the accuracy of the model in my project.

So what I'm more concerned about is whether AutoAugment works better than a baseline data-augmentation strategy (such as a combination of crop, flip, rotate, and so on, with probability and intensity designed by experience). As mentioned in the paper: "Our search method is significantly faster than AutoAugment, and its performances overwhelm the human-crafted augmentation methods." Has this conclusion been widely verified by experiments? Looking forward to your reply and guidance; these experiences may save me a lot of time.
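
For reference, the "baseline data-augmentation strategy" mentioned above usually looks something like the following hand-crafted CIFAR-10 pipeline, with a searched policy (AutoAugment / Fast AutoAugment) applied on top of it. The parameter values are common defaults given as assumptions, not taken from this repository's configs:

```python
# A typical hand-crafted baseline: crop, flip, normalize, with values chosen
# by experience. Illustrative defaults, not this repository's configuration.
import torchvision.transforms as T

baseline_cifar10 = T.Compose([
    T.RandomCrop(32, padding=4),       # random 32x32 crop after zero padding
    T.RandomHorizontalFlip(p=0.5),     # flip half of the images
    T.ToTensor(),
    T.Normalize(mean=(0.4914, 0.4822, 0.4465),
                std=(0.2470, 0.2435, 0.2616)),
])
# A searched policy would be inserted as extra PIL transforms before ToTensor(),
# which is the comparison the question above is asking about.
```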