
# Self-Tuning for Data-Efficient Deep Learning

This repository contains the implementation code for the paper:
Self-Tuning for Data-Efficient Deep Learning
Ximei Wang, Jinghan Gao, Mingsheng Long, Jianmin Wang
38th International Conference on Machine Learning (ICML 2021)
[Project Page] [Paper] [Video] [Slide] [Poster] [Blog] [Zhihu] [SlidesLive]


## Brief Introduction for Data-Efficient Deep Learning

Mitigating the requirement for labeled data is a vital issue in the deep learning community. However, common practices of transfer learning (TL) and semi-supervised learning (SSL) focus on only one side: the pre-trained model or the unlabeled data, respectively. This paper unleashes the power of both worlds by proposing a new setup named data-efficient deep learning, which aims to mitigate the requirement for labeled data by unifying the exploration of labeled and unlabeled data with the transfer of a pre-trained model.

To address the challenge of confirmation bias in self-training, we devise a general Pseudo Group Contrast (PGC) mechanism that mitigates the reliance on pseudo-labels and boosts tolerance to false labels. To tackle the model shift problem, we unify the exploration of labeled and unlabeled data with the transfer of a pre-trained model via a shared key queue, going beyond plain 'parallel training'. Comprehensive experiments demonstrate that Self-Tuning outperforms its SSL and TL counterparts on five tasks by sharp margins; for example, it doubles the accuracy of fine-tuning on Stanford Cars when only 15% of the labels are provided.
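The group-contrast idea can be illustrated with a minimal PyTorch-style sketch. This is our illustration, not the repository's implementation: the function name `pgc_loss`, the tensor shapes, and the temperature value are assumptions; see `src/` and the paper for the real loss.

```python
import torch
import torch.nn.functional as F

def pgc_loss(query, key, queue, queue_labels, label, temperature=0.07):
    """Illustrative Pseudo Group Contrast-style loss for a single query.

    query:        (D,) L2-normalized query feature
    key:          (D,) L2-normalized key of the same image from the momentum encoder
    queue:        (K, D) shared queue of past keys, from labeled and unlabeled data
    queue_labels: (K,) labels / pseudo-labels stored alongside the queued keys
    label:        the query's label (ground truth if labeled, pseudo-label otherwise)
    """
    # similarity of the query to its own key and to every key in the queue
    l_key = (query @ key).unsqueeze(0) / temperature   # (1,)
    l_queue = (queue @ query) / temperature            # (K,)
    log_prob = F.log_softmax(torch.cat([l_key, l_queue]), dim=0)

    # the positive group: the query's own key plus all queued keys sharing
    # its (pseudo-)label; all other queued keys serve as negatives
    pos_mask = queue_labels.eq(label)                  # (K,) boolean
    pos_log_prob = torch.cat([log_prob[:1], log_prob[1:][pos_mask]])

    # averaging over the group dilutes the impact of any single false pseudo-label
    return -pos_log_prob.mean()

# toy usage with random normalized features
D, K = 128, 32
q = F.normalize(torch.randn(D), dim=0)
k = F.normalize(torch.randn(D), dim=0)
queue = F.normalize(torch.randn(K, D), dim=1)
loss = pgc_loss(q, k, queue, torch.randint(0, 100, (K,)), label=3)
```

Contrasting against a whole group of same-class keys, rather than a single positive pair, is what makes training more tolerant to individual false pseudo-labels.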

## Dependencies

## Datasets

| Dataset | Download Link |
| -- | -- |
| CUB-200-2011 | http://www.vision.caltech.edu/visipedia/CUB-200-2011.html |
| Stanford Cars | http://ai.stanford.edu/~jkrause/cars/car_dataset.html |
| FGVC Aircraft | http://www.robots.ox.ac.uk/~vgg/data/fgvc-aircraft/ |
| CIFAR-100 | https://www.cs.toronto.edu/~kriz/cifar.html |
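For reference, below is a minimal sketch of how a per-class label-ratio split could be constructed with torchvision. It is illustrative only, not this repository's data pipeline: `split_by_label_ratio` is a hypothetical helper, and note that the CIFAR-100 experiments fix the total number of labels (e.g. 400/2500/10000) rather than a per-class ratio.

```python
import numpy as np
from torchvision.datasets import CIFAR100

def split_by_label_ratio(targets, label_ratio=0.15, num_classes=100, seed=0):
    """Split sample indices into labeled / unlabeled subsets, per class."""
    rng = np.random.RandomState(seed)
    targets = np.asarray(targets)
    labeled = []
    for c in range(num_classes):
        idx = np.where(targets == c)[0]
        rng.shuffle(idx)
        n_labeled = max(1, int(round(label_ratio * len(idx))))
        labeled.extend(idx[:n_labeled].tolist())
    labeled = np.sort(np.array(labeled))
    unlabeled = np.setdiff1d(np.arange(len(targets)), labeled)
    return labeled, unlabeled

# Example: keep 15% of CIFAR-100 training labels, treat the rest as unlabeled.
train_set = CIFAR100(root="./data", train=True, download=True)
labeled_idx, unlabeled_idx = split_by_label_ratio(train_set.targets, label_ratio=0.15)
```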

## Disclaimer on Datasets

This open-source code will download and prepare public datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have a license to use them. It is your responsibility to determine whether you have permission to use a dataset under its license.

If you're a dataset owner and wish to update any part of it (description, citation, etc.), or do not want your dataset to be included in this code, please get in touch with us through a GitHub issue. Thanks for your contribution to the ML community!

## Quick Start

## Tensorboard Log
| Dataset | Label Ratio 1 | Label Ratio 2 | Label Ratio 3 |
| -- | -- | -- | -- |
| CUB-200-2011 | [15%](http://github.com/thuml/Self-Tuning/blob/master/vis/CUB200_15.png) | [30%](http://github.com/thuml/Self-Tuning/blob/master/vis/CUB200_30.png) | [50%](http://github.com/thuml/Self-Tuning/blob/master/vis/CUB200_50.png) |
| Stanford Cars  | [15%](http://github.com/thuml/Self-Tuning/blob/master/vis/StanfordCars_15.png) | [30%](http://github.com/thuml/Self-Tuning/blob/master/vis/StanfordCars_30.png) | [50%](http://github.com/thuml/Self-Tuning/blob/master/vis/StanfordCars_50.png) |
| FGVC Aircraft  | [15%](http://github.com/thuml/Self-Tuning/blob/master/vis/Aircraft_15.png) | [30%](http://github.com/thuml/Self-Tuning/blob/master/vis/Aircraft_30.png) | [50%](http://github.com/thuml/Self-Tuning/blob/master/vis/Aircraft_50.png) |
| CIFAR-100 (# labels) | [400](http://github.com/thuml/Self-Tuning/blob/master/vis/Cifar100_400.png) | [2500](http://github.com/thuml/Self-Tuning/blob/master/vis/Cifar100_2500.png) | [10000](http://github.com/thuml/Self-Tuning/blob/master/vis/Cifar100_10000.png) |

- We achieved better results than those reported in the paper after fixing some small bugs in the code.

## Updates
- [07/2021] We have created a [Blog post](https://mp.weixin.qq.com/s/H4xlndTZtWuXHni-vOC_vQ) in Chinese for this work. Check it out for more details!
- [07/2021] We have released the code and models. You can find all reproduced checkpoints via [this link](https://cloud.tsinghua.edu.cn/d/4e8fb444c4634e76ab0a/).
- [06/2021] A five-minute [video](https://icml.cc/virtual/2021/spotlight/8616) was released to briefly introduce the main idea of Self-Tuning.
- [05/2021] Paper accepted to [ICML 2021](https://icml.cc/Conferences/2021/Schedule?type=Poster) as a __Short Talk__. 
- [02/2021] [arXiv version](https://arxiv.org/abs/2102.12903) posted. Please stay tuned for updates.

## Citation
If you find this code or idea useful, please cite our work:
```bib
@inproceedings{wang2021selftuning,
  title={Self-Tuning for Data-Efficient Deep Learning},
  author={Wang, Ximei and Gao, Jinghan and Long, Mingsheng and Wang, Jianmin},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2021}
}
```

## Contact

If you have any questions, feel free to contact us via email (wxm17@mails.tsinghua.edu.cn) or GitHub issues. Enjoy!