A collection of parameter-efficient transfer learning papers focusing on computer vision and multimodal domains.
Pre-training followed by full fine-tuning has long been the standard paradigm in deep learning. However, as pre-trained models scale up, e.g., GPT-3 (175B parameters), fully fine-tuning them on each downstream task carries a high risk of overfitting, and training and storing a separate copy of such a large model for every task is costly in practice. To overcome these issues, researchers have explored Parameter-Efficient Transfer Learning (PETL), which adapts large-scale pre-trained models to downstream tasks while modifying as few parameters as possible. Inspired by the great advances in the NLP domain and the continuing trend of scaling up models, researchers in the computer vision and multimodal domains have joined this line of research.
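To make the shared recipe concrete, here is a minimal PyTorch sketch: freeze every pre-trained weight and train only small bottleneck adapters. The torchvision backbone, the adapter placement, and the bottleneck size are illustrative assumptions, not any particular paper's method.

```python
# A minimal sketch (not any specific paper's method): freeze a pre-trained
# backbone and train only tiny bottleneck adapters inserted into each block.
import torch
import torch.nn as nn
from torchvision.models import vit_b_16  # stands in for any pre-trained model


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual."""

    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)  # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))


backbone = vit_b_16(weights=None)  # load real pre-trained weights in practice
for p in backbone.parameters():
    p.requires_grad = False  # freeze the entire pre-trained model

# Illustrative placement: one adapter after each encoder block's MLP.
for block in backbone.encoder.layers:
    block.mlp = nn.Sequential(block.mlp, Adapter(backbone.hidden_dim))

trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
total = sum(p.numel() for p in backbone.parameters())
print(f"trainable: {trainable / 1e6:.2f}M of {total / 1e6:.2f}M parameters")
```

Only the adapters (roughly 1M parameters here) receive gradients, so one frozen backbone can be shared across tasks while each task stores just its own lightweight adapter weights.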
We follow the general idea of PromptPapers to label the papers:
- The abbreviation of the work.
- The main explored task of the work.
- Other important information about the work.
Learning to Prompt for Vision-Language Models, IJCV 2022 (arXiv:2109.01134).
Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu. [Paper][Code]
Prompting Visual-Language Models for Efficient Video Understanding, ECCV 2022 (arXiv:2112.04478).
Chen Ju, Tengda Han, Kunhao Zheng, Ya Zhang, Weidi Xie. [Paper][Code]
Domain Adaptation via Prompt Learning, arXiv:2202.06687.
Chunjiang Ge, Rui Huang, Mixue Xie, Zihang Lai, Shiji Song, Shuang Li, Gao Huang. [Paper][Code]
Conditional Prompt Learning for Vision-Language Models, CVPR 2022 (arXiv:2203.05557).
Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu. [Paper][Code]
Visual Prompt Tuning, ECCV 2022 (arXiv:2203.12119).
Menglin Jia, Luming Tang, Bor-Chun Chen, Claire Cardie, Serge Belongie, Bharath Hariharan, Ser-Nam Lim. [Paper][Code]
Exploring Visual Prompts for Adapting Large-Scale Models, arXiv:2203.17274.
Hyojin Bahng, Ali Jahanian, Swami Sankaranarayanan, Phillip Isola. [Paper][Code]
Pro-tuning: Unified Prompt Tuning for Vision Tasks, arXiv:2207.14381.
Xing Nie, Bolin Ni, Jianlong Chang, Gaofeng Meng, Chunlei Huo, Zhaoxiang Zhang, Shiming Xiang, Qi Tian, Chunhong Pan. [Paper][Code]
P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting, arXiv:2208.02812.
Ziyi Wang, Xumin Yu, Yongming Rao, Jie Zhou, Jiwen Lu. [Paper][Code]
Class-Aware Visual Prompt Tuning for Vision-Language Pre-Trained Model, arXiv:2208.08340.
Yinghui Xing, Qirui Wu, De Cheng, Shizhou Zhang, Guoqiang Liang, Yanning Zhang. [Paper][Code]
Prompt Tuning with Soft Context Sharing for Vision-Language Models, arXiv:2208.13474.
Kun Ding, Ying Wang, Pengzhang Liu, Qiang Yu, Haojian Zhang, Shiming Xiang, Chunhong Pan. [Paper][Code]
Language-Aware Soft Prompting for Vision & Language Foundation Models, arXiv:2210.01115.
Adrian Bulat, Georgios Tzimiropoulos. [Paper][Code]
Prompt Learning with Optimal Transport for Vision-Language Models, arXiv:2210.01253.
Guangyi Chen, Weiran Yao, Xiangchen Song, Xinyue Li, Yongming Rao, Kun Zhang. [Paper][Code]
MaPLe: Multi-modal Prompt Learning, arXiv:2210.03117.
Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan. [Paper][Code]
Unified Vision and Language Prompt Learning, arXiv:2210.07225.
Yuhang Zang, Wei Li, Kaiyang Zhou, Chen Huang, Chen Change Loy. [Paper][Code]
CPL: Counterfactual Prompt Learning for Vision and Language Models, arXiv:2210.10362.
Xuehai He, Diji Yang, Weixi Feng, Tsu-Jui Fu, Arjun Akula, Varun Jampani, Pradyumna Narayana, Sugato Basu, William Yang Wang, Xin Eric Wang. [Paper][Code]
Understanding and Improving Visual Prompting: A Label-Mapping Perspective, arXiv:2211.11635.
Aochuan Chen, Yuguang Yao, Pin-Yu Chen, Yihua Zhang, Sijia Liu. [Paper][Code]
Texts as Images in Prompt Tuning for Multi-Label Image Recognition, arXiv:2211.12739.
Zixian Guo, Bowen Dong, Zhilong Ji, Jinfeng Bai, Yiwen Guo, Wangmeng Zuo. [Paper][Code]
VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval, arXiv:2211.12764.
Siteng Huang, Biao Gong, Yulin Pan, Jianwen Jiang, Yiliang Lv, Yuyuan Li, Donglin Wang. [Paper][Code]
Unleashing the Power of Visual Prompting At the Pixel Level, arXiv:2212.10556.
Junyang Wu, Xianhang Li, Chen Wei, Huiyu Wang, Alan Yuille, Yuyin Zhou, Cihang Xie. [Paper][Code]
Self-Supervised Convolutional Visual Prompts, arXiv:2303.00198.
Yun-Yun Tsai, Chengzhi Mao, Yow-Kuan Lin, Junfeng Yang. [Paper][Code]
Yi-Lun Lee, Yi-Hsuan Tsai, Wei-Chen Chiu, Chen-Yu Lee. [Paper][Code]
Ziqing Yang, Zeyang Sha, Michael Backes, Yang Zhang. [Paper][Code]
Qidong Huang, Xiaoyi Dong, Dongdong Chen, Weiming Zhang, Feifei Wang, Gang Hua, Nenghai Yu. [Paper][Code]
Xinyang Liu, Dongsheng Wang, Miaoge Li, Zhibin Duan, Yishi Xu, Bo Chen, Mingyuan Zhou. [Paper][Code]
Haixin Wang, Jianlong Chang, Xiao Luo, Jinan Sun, Zhouchen Lin, Qi Tian. [Paper][Code]
Hao Zhang, Basura Fernando. [Paper][Code]
Jiawen Zhu, Simiao Lai, Xin Chen, Dong Wang, Huchuan Lu. [Paper][Code]
Weihuang Liu, Xi Shen, Chi-Man Pun, Xiaodong Cun. [Paper][Code]
Deepti Hegde, Jeya Maria Jose Valanarasu, Vishal M. Patel. [Paper][Code]
Comments: This work's idea is similar to our Text4Point.
Chen Ju, Zeqian Li, Peisen Zhao, Ya Zhang, Xiaopeng Zhang, Qi Tian, Yanfeng Wang, Weidi Xie. [Paper][Code]
Highlight: Enriches the meaning of an action class by querying a large language model for a detailed action description (see the sketch below).
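A hedged sketch of that idea, where `query_llm` is a hypothetical stand-in for the language-model call; the canned reply only keeps the example self-contained.

```python
# Hedged illustration only: `query_llm` is a hypothetical stand-in for a
# real LLM API call; the canned reply keeps the sketch runnable offline.
def query_llm(question: str) -> str:
    return "a person runs up and leaps over a horizontal bar"

classes = ["high jump", "long jump"]
prompts = []
for name in classes:
    description = query_llm(f"Describe the action: {name}")
    prompts.append(f"a video of {name}, i.e. {description}")

print(prompts[0])
# -> "a video of high jump, i.e. a person runs up and leaps over a horizontal bar"
```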
Siteng Huang, Biao Gong, Yutong Feng, Yiliang Lv, Donglin Wang. [Paper][Code]
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention, arXiv:2303.16199.
Renrui Zhang, Jiaming Han, Aojun Zhou, Xiangfei Hu, Shilin Yan, Pan Lu, Hongsheng Li, Peng Gao, Yu Qiao. [Paper][Code]
Highlight: Tunes LLaMA (7B parameters) into a capable chatbot with only 1.2M trainable parameters and one hour of fine-tuning.
Hyeongjun Kwon, Taeyong Song, Somi Jeong, Jin Kim, Jinhyun Jang, Kwanghoon Sohn. [Paper]
Jiayi Guo, Chaofei Wang, You Wu, Eric Zhang, Kai Wang, Xingqian Xu, Shiji Song, Humphrey Shi, Gao Huang. [Paper][Code]
Yaohua Zha, Jinpeng Wang, Tao Dai, Bin Chen, Zhi Wang, Shu-Tao Xia. [Paper][Code]
Zhao Song, Ke Yang, Naiyang Guan, Junjie Zhu, Peng Qiao, Qingyong Hu. [Paper][Code]
Qiong Wu, Shubin Huang, Yiyi Zhou, Pingyang Dai, Annan Shu, Guannan Jiang, Rongrong Ji. [Paper][Code]
Haixin Wang, Xinlong Yang, Jianlong Chang, Dian Jin, Jinan Sun, Shikun Zhang, Xiao Luo, Qi Tian. [Paper][Code]
Yajing Liu, Yuning Lu, Hao Liu, Yaozu An, Zhuoran Xu, Zhuokun Yao. [Paper][Code]
Haoqing Wang, Shibo Jie, Zhi-Hong Deng. [Paper][Code]
Xiangpeng Yang, Linchao Zhu, Xiaohan Wang, Yi Yang. [Paper][Code]
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks, CVPR 2022 (arXiv:2112.06825).
Yi-Lin Sung, Jaemin Cho, Mohit Bansal. [Paper][Code]
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition, NeurIPS 2022 (arXiv:2205.13535).
Shoufa Chen, Chongjian Ge, Zhan Tong, Jiangliu Wang, Yibing Song, Jue Wang, Ping Luo. [Paper][Code]
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models, NeurIPS 2022 (arXiv:2206.08155).
Antoine Yang, Antoine Miech, Josef Sivic, Ivan Laptev, Cordelia Schmid. [Paper][Code]
ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning, NeurIPS 2022 (arXiv:2206.13559).
Junting Pan, Ziyi Lin, Xiatian Zhu, Jing Shao, Hongsheng Li. [Paper][Code]
Convolutional Bypasses Are Better Vision Transformer Adapters, arXiv:2207.07039.
Shibo Jie, Zhi-Hong Deng. [Paper][Code]
Conv-Adapter: Exploring Parameter Efficient Transfer Learning for ConvNets, arXiv:2208.07463.
Hao Chen, Ran Tao, Han Zhang, Yidong Wang, Wei Ye, Jindong Wang, Guosheng Hu, Marios Savvides. [Paper][Code]
Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving, NeurIPS 2022 (arXiv:2209.08953).
Xiwen Liang, Yangxin Wu, Jianhua Han, Hang Xu, Chunjing Xu, Xiaodan Liang. [Paper][Code]
Polyhistor: Parameter-Efficient Multi-Task Adaptation for Dense Vision Tasks, NeurIPS 2022 (arXiv:2210.03265).
Yen-Cheng Liu, Chih-Yao Ma, Junjiao Tian, Zijian He, Zsolt Kira. [Paper][Code]
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models, arXiv:2210.03794.
Omiros Pantazis, Gabriel Brostow, Kate Jones, Oisin Mac Aodha. [Paper][Code]
Cross-Modal Adapter for Text-Video Retrieval, arXiv:2211.09623.
Haojun Jiang, Jianke Zhang, Rui Huang, Chunjiang Ge, Zanlin Ni, Jiwen Lu, Jie Zhou, Shiji Song, Gao Huang. [Paper][Code]
Vision Transformers are Parameter-Efficient Audio-Visual Learners, arXiv:2212.07983.
Yan-Bo Lin, Yi-Lin Sung, Jie Lei, Mohit Bansal, Gedas Bertasius. [Paper][Code]
Takeaway: A pre-trained vision transformer can handle audio data by representing the 1D raw audio signal as a 2D audio image (a minimal sketch follows).
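A minimal sketch of that takeaway, assuming torchaudio for the spectrogram; the exact settings are illustrative, not the paper's pipeline.

```python
# Turn a 1-D waveform into a 2-D "audio image" an image ViT can consume.
# Illustrative settings only, not the paper's exact pipeline.
import torch
import torch.nn.functional as F
import torchaudio.transforms as T

waveform = torch.randn(1, 16000)  # one second of (fake) 16 kHz mono audio

to_mel = T.MelSpectrogram(sample_rate=16000, n_fft=1024, hop_length=160, n_mels=128)
spec = torch.log(to_mel(waveform) + 1e-6)     # (1, 128, time) log-mel spectrogram
img = spec.unsqueeze(0)                       # add a batch dim: (1, 1, 128, time)
img = F.interpolate(img, size=(224, 224), mode="bilinear", align_corners=False)
img = img.repeat(1, 3, 1, 1)                  # tile to 3 channels for an image model
print(img.shape)                              # torch.Size([1, 3, 224, 224]), ViT-ready
```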
Multimodal Video Adapter for Parameter Efficient Video Text Retrieval, arXiv:2301.07868.
Bowen Zhang, Xiaojie Jin, Weibo Gong, Kai Xu, Zhao Zhang, Peng Wang, Xiaohui Shen, Jiashi Feng. [Paper][Code]
AIM: Adapting Image Models for Efficient Video Action Recognition, ICLR 2023 (arXiv:2302.03024).
Taojiannan Yang, Yi Zhu, Yusheng Xie, Aston Zhang, Chen Chen, Mu Li. [Paper][Code]
Offsite-Tuning: Transfer Learning without Full Model, arXiv:2302.04870.
Guangxuan Xiao, Ji Lin, Song Han. [Paper][Code]
UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling, arXiv:2302.06605.
Haoyu Lu, Mingyu Ding, Yuqi Huo, Guoxing Yang, Zhiwu Lu, Masayoshi Tomizuka, Wei Zhan. [Paper][Code]
Towards Efficient Visual Adaption via Structural Re-parameterization, arXiv:2302.08106.
Gen Luo, Minglang Huang, Yiyi Zhou, Xiaoshuai Sun, Guannan Jiang, Zhiyu Wang, Rongrong Ji. [Paper][Code]
T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models, arXiv:2302.08453.
Chong Mou, Xintao Wang, Liangbin Xie, Jian Zhang, Zhongang Qi, Ying Shan, Xiaohu Qie. [Paper][Code]
kNN-Adapter: Efficient Domain Adaptation for Black-Box Language Models, arXiv:2302.10879.
Yangsibo Huang, Daogao Liu, Zexuan Zhong, Weijia Shi, Yin Tat Lee. [Paper][Code]
Side Adapter Network for Open-Vocabulary Semantic Segmentation, arXiv:2302.12242.
Mengde Xu, Zheng Zhang, Fangyun Wei, Han Hu, Xiang Bai. [Paper][Code]
Jungin Park, Jiyoung Lee, Kwanghoon Sohn. [Paper][Code]
Highlight: Models temporal information in a separate path.
Zaid Khan, Yun Fu. [Paper][Code]
Highlight: Aligns an already-trained vision model and language model with adapters.
Chendong Xiang, Fan Bao, Chongxuan Li, Hang Su, Jun Zhu. [Paper]
Shwai He, Liang Ding, Daize Dong, Miao Zhang, Dacheng Tao. [Paper][Code]
Xuehai He, Chunyuan Li, Pengchuan Zhang, Jianwei Yang, Xin Eric Wang. [Paper][Code]
Binjie Zhang, Yixiao Ge, Xuyuan Xu, Ying Shan, Mike Zheng Shou. [Paper][Code]
GraphAdapter: Tuning Vision-Language Models with Dual Knowledge Graph, NeurIPS 2023.
Xin Li, Dongze Lian, Zhihe Lu, Jiawang Bai, Zhibo Chen, Xinchao Wang. [Paper][Code]
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis, CVPR 2024.
Xin Zhou, Dingkang Liang, Wei Xu, Xingkui Zhu, Yihan Xu, Zhikang Zou, Xiang Bai. [Paper][Code]
Towards a Unified View of Parameter-Efficient Transfer Learning, ICLR 2022 (arXiv:2110.04366).
Junxian He, Chunting Zhou, Xuezhe Ma, Taylor Berg-Kirkpatrick, Graham Neubig. [Paper][Code]
Neural Prompt Search, arXiv:2206.04673.
Yuanhan Zhang, Kaiyang Zhou, Ziwei Liu. [Paper][Code]
AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning, arXiv:2301.12132.
Han Zhou, Xingchen Wan, Ivan Vulić, Anna Korhonen. [Paper][Code]
Rethinking Efficient Tuning Methods from a Unified Perspective, arXiv:2303.00690.
Zeyinzi Jiang, Chaojie Mao, Ziyuan Huang, Yiliang Lv, Deli Zhao, Jingren Zhou. [Paper][Code]
VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control, arXiv:2308.09804.
Zi-Yuan Hu, Yanyang Li, Michael R. Lyu, Liwei Wang. [Paper][Code]
Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks, ECCV 2024.
Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens. [Paper] [Code]
Check out thunlp/DeltaPapers if you are interested in progress in the NLP domain.
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning, NeurIPS 2022 (arXiv:2206.06522).
Yi-Lin Sung, Jaemin Cho, Mohit Bansal. [Paper][Code]
Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning, NeurIPS 2022 (arXiv:2210.08823).
Dongze Lian, Daquan Zhou, Jiashi Feng, Xinchao Wang. [Paper][Code]
FacT: Factor-Tuning for Lightweight Adaptation on Vision Transformer, AAAI 2023 (arXiv:2212.03145).
Shibo Jie, Zhi-Hong Deng. [Paper][Code]
Important Channel Tuning, OpenReview.
Hengyuan Zhao, Pichao WANG, Yuyang Zhao, Fan Wang, Mike Zheng Shou. [Paper][Code]
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering, arXiv:2303.01239.
Jingjing Jiang, Nanning Zheng. [Paper][Code]
Revisit Parameter-Efficient Transfer Learning: A Two-Stage Paradigm, arXiv:2303.07910.
Hengyuan Zhao, Hao Luo, Yuyang Zhao, Pichao Wang, Fan Wang, Mike Zheng Shou. [Paper][Code]
Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning, CVPR 2023 (arXiv:2212.03220).
Cheng-Hao Tu, Zheda Mai, Wei-Lun Chao. [Paper][Code]
Task Residual for Tuning Vision-Language Models, CVPR 2023.
Tao Yu, Zhihe Lu, Xin Jin, Zhibo Chen, Xinchao Wang. [Paper][Code]
DTL: Disentangled Transfer Learning for Visual Recognition, AAAI 2024 (arXiv:2312.07856).
Minghao Fu, Ke Zhu, Jianxin Wu. [Paper][Code]
Beyond Sole Strength: Customized Ensembles for Generalized Vision-Language Models, ICML 2024.
Zhihe Lu, Jiawang Bai, Xin Li, Zeyu Xiao, Xinchao Wang. [Paper][Code]
The structure of this repository follows thunlp/DeltaPapers, which collects awesome parameter-efficient transfer learning papers in the natural language processing domain. Check out their repository if you are interested in progress in the NLP domain.