zhangyongshun / BagofTricks-LT

A scientific and useful toolbox containing practical and effective long-tail related tricks, with extensive experimental results
MIT License
575 stars 76 forks

About the effect of input mixup to long-tailed learning #1

Closed mingliangzhang2018 closed 3 years ago

mingliangzhang2018 commented 3 years ago

Hi, thank you for providing so many tricks for solving the problem of long-tailed recognition. I am wondering about the reported baseline with only input mixup on CIFAR-100 with imbalance ratio 100, which achieves an error rate of 59.66 (58.21). In my experiments, averaged over multiple runs, the error rate is only around 61.0 (60.2).

zhangyongshun commented 3 years ago

Hi, I think the reason could be the training settings, such as the mixup hyper-parameter alpha, which is set to 1.0 in my experiments, or other details. Some training details on long-tailed CIFAR are stated in the ``Training details'' section of http://www.lamda.nju.edu.cn/zhangys/papers/AAAI_tricks.pdf. Hope the above helps. Sorry for the unfinished code; these days I need to prepare for my final examinations. After finishing them, I will do my best to release the reorganized code as soon as possible, within no more than a week!

Best, Yongshun

zhangyongshun commented 3 years ago

In particular, when you apply mixup training, as stated in the mixup paper (https://arxiv.org/abs/1710.09412), the number of training epochs should be doubled. So the total number of epochs on long-tailed CIFAR is set to 400, not 200, and the learning rate is divided by 100 at the 320th and 360th epochs, respectively.
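For reference, input mixup as described in the cited paper can be sketched as follows (a minimal NumPy sketch with illustrative names, not code from this repo; it assumes batched inputs and one-hot labels, with alpha=1.0 as used above):

```python
import numpy as np

def mixup_batch(x, y_onehot, alpha=1.0, rng=None):
    """Input mixup (arXiv:1710.09412): convexly combine a batch with a
    shuffled copy of itself. alpha=1.0 draws the coefficient uniformly."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    perm = rng.permutation(len(x))        # pair each sample with a random partner
    mixed_x = lam * x + (1 - lam) * x[perm]
    mixed_y = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return mixed_x, mixed_y, lam
```

In a training loop, the loss is then computed on `mixed_x` against `mixed_y` (equivalently, as the lam-weighted sum of the losses on the two original label sets).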

mingliangzhang2018 commented 3 years ago

@zhangyongshun Thanks very much. I used the same hyper-parameters as your paper except for the number of training epochs, which I set to the general setting (200 training epochs, batch size 128, learning rate initialized to 0.1 and divided by 100 at the 160th and 180th epochs, respectively). The comparison is somewhat unfair, since, as we know, a larger number of training epochs improves performance.
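The two step schedules discussed above (divide by 100 at epochs 160/180 for the 200-epoch setting, or at 320/360 for the doubled 400-epoch mixup setting) can be made concrete with a small helper (an illustrative sketch, not code from this repo):

```python
def learning_rate(epoch, base_lr=0.1, milestones=(160, 180), gamma=0.01):
    """Step schedule: multiply base_lr by gamma once per milestone passed."""
    passed = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** passed
```

For the doubled mixup schedule, call it as `learning_rate(epoch, milestones=(320, 360))`; in PyTorch the same effect is usually obtained with `torch.optim.lr_scheduler.MultiStepLR`.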

zhangyongshun commented 3 years ago

Yes, it really does. Mixup is a strong regularization trick which, on the other hand, needs more epochs to learn well. We provide the implementation details of these tricks in the AAAI-21 supplement, which will also be attached to the released code.

jrcai commented 3 years ago

Hi, where can we find the supplemental materials? Thanks.

zhangyongshun commented 3 years ago

Sorry for the late reply. You can find the supplement at http://www.lamda.nju.edu.cn/zhangys/papers/AAAI_tricks_supp.pdf, and I will add it to the README soon.

jrcai commented 3 years ago

Thank you!

I am also curious about a statement in the paper: "We can also find that combining CS_CE and CAM-based balance-sampling together cannot further improve the accuracy, since both of them try to enlarge the influence of tail classes and the joint use of the two could cause an accuracy drop due to the overfitting problem." Did you actually observe the overfitting, or is this just a hypothesis? Thanks.

zhangyongshun commented 3 years ago

Hi, we have examined the training and validation accuracies, as well as the confusion matrices, when using re-weighting (cost-sensitive CE), re-sampling (class-balanced sampling), and the two together. We found that applying re-weighting and re-sampling together increases the validation accuracy on tail classes while slightly decreasing it on head classes, compared with applying only one of them. In terms of training accuracy, applying both also causes slight underfitting on head classes. You can visualise the training and validation accuracies and the confusion matrices to observe these phenomena.
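The per-class (head vs. tail) accuracies mentioned above can be read directly off a confusion matrix; a minimal sketch (illustrative names, assuming rows index the true class):

```python
import numpy as np

def per_class_accuracy(confusion):
    """Per-class recall: diagonal count over the row total for each true class.
    Comparing these values across head and tail classes, with and without
    re-weighting/re-sampling, reveals the trade-off described above."""
    confusion = np.asarray(confusion, dtype=float)
    return np.diag(confusion) / confusion.sum(axis=1)
```

Plotting these per-class values sorted by class frequency makes the head-class drop and tail-class gain visible at a glance.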

jrcai commented 3 years ago

Make sense, thank you for the clarifications!