kakaobrain / fast-autoaugment

Official Implementation of 'Fast AutoAugment' in PyTorch.
MIT License

Using your code I couldn't achieve the accuracy you reported. #5

Open ehion opened 5 years ago

ehion commented 5 years ago

I trained ImageNet on 32 GPUs via Horovod (8 V100 x 4) but got 77.1% accuracy, which is much lower than the 78.6% reported in your paper, by running: `python train.py -c confs/resnet50_imagenet_b4096.yaml --aug fa_reduced_imagenet --horovod`. Moreover, according to your yaml config, the lr type should be multistep (adjust_learning_rate_resnet), as can be seen in train.py, but I saw cosine lr decay being used during my test of your code. Waiting for your reply, thanks.
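
For reference, the "multistep" schedule being discussed is the usual step-wise decay for ResNet-50 on ImageNet, as opposed to cosine annealing. A minimal sketch in PyTorch, assuming common defaults (the milestones, base lr, and epoch count below are illustrative, not values taken from this repository's yaml):

```python
# Illustrative only: step-wise lr decay (divide by 10 at epochs 30/60/90),
# the schedule typically used for ResNet-50 on ImageNet, vs. cosine annealing.
import torch
import torchvision

model = torchvision.models.resnet50()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[30, 60, 90], gamma=0.1)

for epoch in range(120):
    # train_one_epoch(model, optimizer)  # training loop omitted
    scheduler.step()  # advance once per epoch so lr drops at the milestones
```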

ildoonet commented 5 years ago

Thanks for the report. This repository was made by copying parts of the original code, and some parts may have been broken in the process, although I verified the CIFAR-10/100 results with some models. As you reported, ResNet should follow the multistep (adjust_learning_rate_resnet) schedule; this needs to be fixed.

I will fix this and also reproduce the result with it. After that, I will update accordingly. Thanks.

ildoonet commented 5 years ago

If you have enough GPU resources to try, please train with the code on the 'bug/lr-scheduler' branch, where I committed a fix: https://github.com/KakaoBrain/fast-autoaugment/commit/834e65154a81b7d37a8b4a9ca95135a6d8922598

Due to a current lack of computational resources, I will try to train it after this weekend.

Thanks.

ehion commented 5 years ago

I have changed the cosine lr decay to multistep lr decay and hope to get a good result tomorrow. I'll upload my test results here to help fix the bug with you. Thanks for your quick reply ^^

ehion commented 5 years ago

I got multistep lr decay acc: 76.646%, still much less than 78.6%.

ildoonet commented 5 years ago

@ehion Let me verify the code and get back to you (hopefully, next week).

ehion commented 5 years ago

Waiting for it, thanks.

ildoonet commented 5 years ago

@ehion Thanks for your contribution. I experimented with the original code while our team was preparing a paper for NeurIPS 2019. I found some bugs and things that need to be fixed, so the performance will differ from the current README. The top-1 / top-5 error rates are 22.4 / 6.4 for now, but I will get back to you after double-checking the code and experiments.

ildoonet commented 4 years ago

@JoinWei-PKU Our reported value was 21.4%, not 22.4% (in the final, NeurIPS version of the paper). We will release the final code for search and retraining. Before we do that, I'm checking that all of the final retrained models perform as well as the paper claims. So within one or two weeks, I will update the code for search and retraining, as well as the checkpoints of the retrained models.

Thanks for your interest.

JoinWei-PKU commented 4 years ago

Thanks for your response. Looking forward to the search and retraining code, which is important for verifying the proposed method and would have a profound impact.

JoinWei-PKU commented 4 years ago

@ildoonet Hi, I quickly reviewed the retraining code you provided. The structure of your project is clear; however, why are there nearly 500 chosen sub-policies for CIFAR-10? AutoAugment uses only 25 sub-policies.

Looking forward to your reply.

JoinWei-PKU commented 4 years ago

@ildoonet Thanks for releasing the search code. However, there are still obvious bugs in the code. For example, if you run the search code directly, the searched policy is a set, which implies there is an augmentation policy for each class; however, in your data.py you check "isinstance(C.get['aug'],list)". So are the results reported in your paper really correct, or does the released code simply have bugs? Moreover, a randomly searched policy seems to achieve performance equal to your method on CIFAR-10.
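
To make the mismatch concrete, here is a hypothetical sketch of a loader that accepts either a flat list of sub-policies or a per-class mapping; the function name, argument names, and container types are assumptions for illustration, not the repository's actual data.py API:

```python
# Hypothetical illustration of the type-dispatch question raised above:
# a flat list means one shared policy, a dict means class-conditional policies.
# None of these names come from the fast-autoaugment code base.
def policy_for(aug, class_idx=None):
    if isinstance(aug, list):
        # one shared list of sub-policies applied to every class
        return aug
    if isinstance(aug, dict):
        # class-conditional policies, e.g. {class_idx: [sub-policies, ...]}
        if class_idx is None:
            raise ValueError("class index required for per-class policies")
        return aug[class_idx]
    raise TypeError(f"unsupported policy container: {type(aug).__name__}")
```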

What's more, why do you use 15 policies for each class? In your paper, there are only 10 policies in total.

@ehion Have you successfully reproduced the results?

ehion commented 4 years ago

@JoinWei-PKU I haven't, but I believe the results in the paper are true, because I have implemented a sample-based search method different from Fast AutoAugment (with network inference only once, similar to FA). My results on ImageNet are a bit better than Fast AutoAugment's, and I find that the search space from AutoAugment is of great importance.

JoinWei-PKU commented 4 years ago

@ehion Thanks for your response. I agree that the search space is important. When I use random search to find a policy in the AutoAugment search space, the results are a little better than the performance reported in the AutoAugment paper. Do you think these methods consistently perform better than random search?

ehion commented 4 years ago

I think it is consistently better than the average random-search result, but not better than every individual random result. I think augmentation search methods only really make sense when they achieve better results on smaller models, rather than on big models like ResNet-50.

JoinWei-PKU commented 4 years ago

@ehion Thanks. However, compared with AutoAugment, I achieve nearly equal performance with average random search on CIFAR-10, and the models are WResNet-40-2 and WResNet-20-8. Do you achieve better performance with this search policy?

ildoonet commented 4 years ago

We confirmed that all models train successfully with our latest code. We have also uploaded the trained models.

We have also conducted experiments with randomly selected augmentations; we found that they sometimes perform well, but slightly worse than our approach.

I suspect that naively random selection of augmentations can be improved in a simple way, which is called RandAugment: https://github.com/ildoonet/pytorch-randaugment
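
For context, the idea behind RandAugment is to skip the search step entirely: for each image, sample N transformations uniformly at random from a fixed set and apply them at a shared magnitude M. A minimal sketch of that idea (the op list and magnitude mapping below are simplified assumptions, not the pytorch-randaugment implementation):

```python
# Simplified RandAugment-style sampler: pick n ops at random (with replacement)
# and apply them at a single magnitude m. Ops and scaling are illustrative only.
import random
from PIL import Image, ImageEnhance, ImageOps

def rand_augment(img: Image.Image, n: int = 2, m: int = 9) -> Image.Image:
    level = m / 30.0  # map the integer magnitude to [0, 1]
    ops = [
        lambda im: ImageOps.autocontrast(im),
        lambda im: ImageOps.equalize(im),
        lambda im: im.rotate(30 * level),
        lambda im: ImageEnhance.Color(im).enhance(1.0 + level),
        lambda im: ImageEnhance.Contrast(im).enhance(1.0 + level),
        lambda im: ImageEnhance.Sharpness(im).enhance(1.0 + level),
        lambda im: ImageOps.posterize(im, 8 - int(4 * level)),
        lambda im: ImageOps.solarize(im, int(256 * (1.0 - level))),
    ]
    for op in random.choices(ops, k=n):
        img = op(img)
    return img
```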

monkeyDemon commented 4 years ago

Thank you for your continuous follow-up on this issue. I see that you have discussed above which method is better, a random search strategy or AutoAugment, and it seems that AutoAugment may be slightly better. I want to find out whether these methods can help improve the accuracy of the model in my project.

So what I'm more concerned about is whether AutoAugment works better than a baseline data-augmentation strategy (such as a combination of crop, flip, rotate, and so on, with probability and intensity designed by experience). As mentioned in the paper: "Our search method is significantly faster than AutoAugment, and its performances overwhelm the human-crafted augmentation methods." Has this conclusion been widely verified by experiments? Looking forward to your reply and guidance; these experiences may save me a lot of time.
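
For reference, the "baseline data-augmentation strategy" mentioned above usually looks something like the following hand-crafted CIFAR-10 pipeline, with a searched policy (AutoAugment / Fast AutoAugment) applied on top of it. The parameter values are common defaults given as assumptions, not taken from this repository's configs:

```python
# A typical hand-crafted baseline: crop, flip, normalize, with values chosen
# by experience. Illustrative defaults, not this repository's configuration.
import torchvision.transforms as T

baseline_cifar10 = T.Compose([
    T.RandomCrop(32, padding=4),       # random 32x32 crop after zero padding
    T.RandomHorizontalFlip(p=0.5),     # flip half of the images
    T.ToTensor(),
    T.Normalize(mean=(0.4914, 0.4822, 0.4465),
                std=(0.2470, 0.2435, 0.2616)),
])
# A searched policy would be inserted as extra PIL transforms before ToTensor(),
# which is the comparison the question above is asking about.
```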