This repository contains the source code for our paper:
Factorized Contrastive Learning: Going Beyond Multi-view Redundancy
Paul Pu Liang, Zihao Deng, Martin Ma*, James Zou, Louis-Philippe Morency, Ruslan Salakhutdinov
NeurIPS 2023
If you find this repository useful, please cite our paper:
@inproceedings{liang2023factorized,
title={Factorized Contrastive Learning: Going Beyond Multi-view Redundancy},
author={Liang, Paul Pu and Deng, Zihao and Ma, Martin and Zou, James and Morency, Louis-Philippe and Salakhutdinov, Ruslan},
booktitle={Advances in Neural Information Processing Systems},
year={2023}
}
Factorized Contrastive Learning (FactorCL) is a new multimodal representation learning method that goes beyond multi-view redundancy. It factorizes task-relevant information into shared and unique representations, capturing task-relevant information by maximizing mutual information (MI) lower bounds and removing task-irrelevant information by minimizing MI upper bounds.
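For intuition, below is a minimal, self-contained sketch of the two standard MI bounds the objective builds on: an InfoNCE-style lower bound (maximized for task-relevant terms) and a CLUB-style upper bound (minimized for task-irrelevant terms). This is not the repository's implementation; all module and parameter names are placeholders, and the actual objectives and critics live in critic_objectives.py.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class InfoNCE(nn.Module):
    """InfoNCE lower bound on I(X; Y) with a separable critic f(x, y) = <g(x), h(y)>."""
    def __init__(self, x_dim, y_dim, hidden=256, embed=128):
        super().__init__()
        self.g = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, embed))
        self.h = nn.Sequential(nn.Linear(y_dim, hidden), nn.ReLU(), nn.Linear(hidden, embed))

    def forward(self, x, y):
        scores = self.g(x) @ self.h(y).t()                  # [N, N] pairwise critic scores
        labels = torch.arange(x.size(0), device=x.device)
        # log N minus the cross-entropy of picking the matched pair on the diagonal
        return math.log(x.size(0)) - F.cross_entropy(scores, labels)

class CLUB(nn.Module):
    """CLUB upper bound on I(X; Y) with a Gaussian variational approximation q(y | x)."""
    def __init__(self, x_dim, y_dim, hidden=256):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim))
        self.logvar = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(), nn.Linear(hidden, y_dim))

    def loglik(self, x, y):
        mu, logvar = self.mu(x), self.logvar(x)
        # Gaussian log-density of y under q(y | x), up to an additive constant
        return (-(y - mu) ** 2 / (2 * logvar.exp()) - 0.5 * logvar).sum(dim=-1)

    def forward(self, x, y):
        pos = self.loglik(x, y)                             # matched pairs
        neg = self.loglik(x, y[torch.randperm(y.size(0))])  # shuffled (marginal) pairs
        return (pos - neg).mean()                           # upper-bound estimate of I(X; Y)

    def learning_loss(self, x, y):
        # Fit q(y | x) by maximum likelihood on matched pairs (a separate optimization step)
        return -self.loglik(x, y).mean()
```

The paper's NCE-CLUB estimator couples the two bounds by plugging the critic learned for the lower bound into the upper bound; see the paper and critic_objectives.py for the exact formulation.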
We first compare our proposed NCE-CLUB estimator to the InfoNCE and CLUB estimators on a toy Gaussian dataset for MI estimation.
All objectives (InfoNCE, CLUB, NCE-CLUB) and the critic models are implemented in critic_objectives.py.
Please follow the steps in the notebook Gaussian_MI_Est/NCE_CLUB_Gaussian.ipynb for a demonstration of each estimator's estimation quality.
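As a rough illustration of the toy setup (the notebook's exact configuration may differ), pairs of correlated Gaussians have a closed-form mutual information, which gives a ground truth to compare each estimator against. The function names, dimensionality, and correlation below are illustrative only.

```python
import math
import torch

def sample_correlated_gaussian(rho, dim=20, n=512):
    """Draw (x, y) where each coordinate pair (x_i, y_i) is bivariate Gaussian with correlation rho."""
    x = torch.randn(n, dim)
    y = rho * x + (1 - rho ** 2) ** 0.5 * torch.randn(n, dim)
    return x, y

def true_mi(rho, dim=20):
    """Closed-form I(X; Y) in nats for the sampler above: -dim/2 * log(1 - rho^2)."""
    return -0.5 * dim * math.log(1 - rho ** 2)

x, y = sample_correlated_gaussian(rho=0.9)
print(true_mi(0.9))   # the value the InfoNCE / CLUB / NCE-CLUB estimates should approach
```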
We perform experiments on synthetic data with controllable ratios of task-relevant shared and unique information, which allows us to investigate how each objective performs as the amount of shared information varies.
The synthetic dataset and generation process are implemented in Synthetic/dataset.py.
Follow the steps in the notebook Synthetic/synthetic_example.ipynb to generate synthetic data with a customized amount of shared information and to run FactorCL/SimCLR/SupCon on the generated data. The SimCLR and SupCon implementations are adapted from here.
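As a toy illustration of the idea (this is not the generation process in Synthetic/dataset.py; all names and dimensions are made up), one can draw a shared latent and two modality-unique latents, mix them into two observed modalities, and let a ratio parameter control how much of the label depends on shared versus unique information.

```python
import torch

def gen_synthetic(n=1000, d_shared=8, d_unique=8, shared_ratio=0.5, seed=0):
    """Toy two-modality generator where `shared_ratio` controls how much of the
    label information is shared between modalities versus unique to each."""
    g = torch.Generator().manual_seed(seed)
    zs = torch.randn(n, d_shared, generator=g)    # shared latent (enters both modalities)
    z1 = torch.randn(n, d_unique, generator=g)    # unique to modality 1
    z2 = torch.randn(n, d_unique, generator=g)    # unique to modality 2

    # Observed modalities: random linear mixtures of (shared, unique) latents
    x1 = torch.cat([zs, z1], dim=1) @ torch.randn(d_shared + d_unique, 32, generator=g)
    x2 = torch.cat([zs, z2], dim=1) @ torch.randn(d_shared + d_unique, 32, generator=g)

    # Label score mixes shared and unique latents according to shared_ratio
    score = shared_ratio * zs.sum(dim=1) + (1 - shared_ratio) * (z1.sum(dim=1) + z2.sum(dim=1))
    y = (score > 0).long()
    return x1, x2, y

x1, x2, y = gen_synthetic(shared_ratio=0.25)   # a task dominated by unique information
```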
We evaluate our proposed FactorCL objective on a suite of Multibench datasets with varying amounts of shared and unique information.
You can find examples of running the model in the notebook Multibench/multibench_example.ipynb.
We use encoders and preprocessed features provided by the Multibench repository. You can also train the model on raw data with other encoder designs.
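If you work from preprocessed features, one simple pattern (the feature dimensions and names below are hypothetical, not tied to any particular Multibench dataset) is to wrap the per-modality feature tensors in a standard PyTorch DataLoader and feed each batch through your encoders and the chosen objective.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder tensors standing in for preprocessed per-modality features and labels
feats_a = torch.randn(2000, 35)
feats_b = torch.randn(2000, 74)
labels = torch.randint(0, 2, (2000,))

loader = DataLoader(TensorDataset(feats_a, feats_b, labels), batch_size=128, shuffle=True)

for xa, xb, y in loader:
    # encode each modality and apply the chosen objective (FactorCL, SimCLR, SupCon, ...)
    pass
```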
IRFL (Image Recognition of Figurative Language) is a dataset for examining vision-and-language models' understanding of figurative language. In our experiments, the model is evaluated on the task of predicting the type of figurative language: idiom, metaphor, or simile.
For this dataset, we compare the performance of different objectives on top of pretrained CLIP image and text encoders; the encoder weights are continually pretrained with each objective.
Follow the steps in the notebook IRFL/IRFL_example.ipynb to process the IRFL data and train using the CLIP models.
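The notebook has the full pipeline; as one common way to obtain CLIP image and text embeddings (the checkpoint name and the HuggingFace interface below are assumptions, not necessarily what the notebook uses), you can load a pretrained CLIP model and encode image-text pairs before applying a contrastive objective and updating the encoder weights.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Dummy batch standing in for IRFL image-text pairs
images = [Image.new("RGB", (224, 224)) for _ in range(4)]
texts = ["an idiom", "a metaphor", "a simile", "a metaphor"]
inputs = processor(text=texts, images=images, return_tensors="pt", padding=True)

with torch.no_grad():
    img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
    txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])

# img_emb / txt_emb would then be fed to the chosen objective to continually pretrain
# the encoders, after which a classifier predicts idiom / metaphor / simile.
```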