This repo contains the official PyTorch implementation of the paper: MV–MR: Multi-Views and Multi-Representations for Self-Supervised Learning and Knowledge Distillation
We present a new method of self-supervised learning and knowledge distillation based on multi-views and multi-representations (MV–MR). MV–MR is based on the maximization of dependence between learnable embeddings from augmented and non-augmented views, jointly with the maximization of dependence between learnable embeddings from the augmented view and multiple non-learnable representations from the non-augmented view. We show that the proposed method can be used for efficient self-supervised classification and model-agnostic knowledge distillation. Unlike other self-supervised techniques, our approach does not use any contrastive learning, clustering, or stop gradients. MV–MR is a generic framework allowing the incorporation of constraints on the learnable embeddings via the usage of image multi-representations as regularizers. The proposed method is used for knowledge distillation. MV–MR provides state-of-the-art self-supervised performance on the STL10 and CIFAR20 datasets in a linear evaluation setup. We show that a low-complexity ResNet50 model pretrained using the proposed knowledge distillation based on the CLIP ViT model achieves state-of-the-art performance on the STL10 and CIFAR100 datasets.
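The abstract above does not name the dependence measure being maximized. As a minimal sketch, assuming distance correlation as the measure (an assumption made purely for illustration, not taken from the text above), the dependence between two batches of embeddings could be estimated as follows. The function names are hypothetical and are not part of this repo.

```python
import math


def pairwise_distances(rows):
    """Euclidean distance matrix between all rows of a batch."""
    n = len(rows)
    return [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]


def double_center(d):
    """Subtract row and column means and add back the grand mean."""
    n = len(d)
    row_mean = [sum(row) / n for row in d]
    col_mean = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_mean) / n
    return [
        [d[i][j] - row_mean[i] - col_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between two batches of equal length.

    x and y are lists of vectors; their dimensions may differ.
    The result lies in [0, 1]; larger means stronger dependence.
    """
    n = len(x)
    a = double_center(pairwise_distances(x))
    b = double_center(pairwise_distances(y))
    dcov2_xy = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar2_x = sum(v * v for row in a for v in row) / n**2
    dvar2_y = sum(v * v for row in b for v in row) / n**2
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0  # one of the batches is constant
    return math.sqrt(max(dcov2_xy, 0.0) / denom)
```

Under this assumption, a training objective could minimize `1 - distance_correlation(z_augmented, z_reference)`, where both arguments are hypothetical batches of embeddings or fixed representations of the same images.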
conda env create -f environment.yml
To train the self-supervised model, first fill in the config file. Example config files: configs/stl10_self_supervised.yaml, configs/imagenet_self_supervised.yaml.
Then run
python main_self_supervised.py --config <path to config file>
To select the batch size automatically, add the --auto_bs flag. To select the learning rate automatically, add the --auto_lr flag.
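When launching several runs, it can be convenient to assemble the command in Python rather than typing it by hand. The sketch below only builds the argument list from the script name, the `--config` path, and the two optional flags described above. The helper name `build_train_command` is hypothetical and is not part of this repo.

```python
import sys


def build_train_command(script, config_path, auto_bs=False, auto_lr=False):
    """Return the argument list for one of the training entry points."""
    command = [sys.executable, script, "--config", config_path]
    if auto_bs:
        command.append("--auto_bs")  # automatic batch size selection
    if auto_lr:
        command.append("--auto_lr")  # automatic learning rate selection
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)`, for example with `build_train_command("main_self_supervised.py", "configs/stl10_self_supervised.yaml", auto_bs=True)`.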
To train the semi-supervised model (fine-tuning the pretrained self-supervised model), first fill in the config file. Example config files: configs/stl10_semi_supervised.yaml, configs/imagenet_semi_supervised.yaml.
Then run
python main_semi_supervised.py \
--config <path to semi-supervised config> \
--path_ckpt <path to pretrained self-supervised model>
To select the batch size automatically, add the --auto_bs flag. To select the learning rate automatically, add the --auto_lr flag.
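Semi-supervised fine-tuning uses only a fraction of the labels (the results in this README report 1% and 10%). How this repo selects the labelled subset is not stated here, so the following is only a sketch of one common choice: a class-balanced random sample. The function name `sample_labelled_subset` is hypothetical.

```python
import random
from collections import defaultdict


def sample_labelled_subset(labels, fraction, seed=0):
    """Pick roughly `fraction` of the indices from every class.

    At least one index per class is kept so no class disappears.
    """
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed for a reproducible subset
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        keep = max(1, round(len(indices) * fraction))
        chosen.extend(rng.sample(indices, keep))
    return sorted(chosen)
```

For a dataset with 100 samples in each of two classes, `sample_labelled_subset(labels, 0.10)` returns 20 indices, 10 from each class.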
To run the self-supervised distillation of CLIP into ResNet50, first fill in the config file. Example config file: configs/imagenet_clip_self_supervised.yaml.
Then run
python main_clip_self_supervised.py --config <path to the config>
To select the batch size automatically, add the --auto_bs flag. To select the learning rate automatically, add the --auto_lr flag.
Multiclass classification on VOC07 is one of the ways to evaluate pretrained self-supervised models. The idea is to train a linear model on top of the frozen embeddings from the pretrained encoder.
To train the multiclass classification model on the VOC07 dataset, first fill in the config file. Example config file: configs/imagenet_voc.yaml.
Then run
python main_voc.py --config_voc <path to VOC config> \
--config_self <path to self-supervised config> \
--path_self <path to pretrained self-supervised model>
To select the batch size automatically, add the --auto_bs flag. To select the learning rate automatically, add the --auto_lr flag.
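To make the idea of a linear model on frozen embeddings concrete, here is a small dependency-free sketch. It assumes the embeddings have already been computed by the encoder and are never updated, and it fits one independent sigmoid classifier per class, since a VOC07 image can contain several classes. This is an illustration only, not the code in main_voc.py; all names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_linear_probe(embeddings, label_sets, num_classes, epochs=200, lr=0.5):
    """Fit one sigmoid classifier per class on fixed (frozen) embeddings.

    embeddings: list of feature vectors produced by the encoder.
    label_sets: for each sample, the set of class indices present.
    """
    n = len(embeddings)
    dim = len(embeddings[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for c in range(num_classes):
            grad_w = [0.0] * dim
            grad_b = 0.0
            for x, present in zip(embeddings, label_sets):
                logit = biases[c] + sum(w * v for w, v in zip(weights[c], x))
                error = sigmoid(logit) - (1.0 if c in present else 0.0)
                grad_b += error
                for d in range(dim):
                    grad_w[d] += error * x[d]
            # Full-batch gradient step on the binary cross-entropy loss.
            biases[c] -= lr * grad_b / n
            for d in range(dim):
                weights[c][d] -= lr * grad_w[d] / n
    return weights, biases


def predict(weights, biases, x):
    """Per-class probabilities for a single embedding."""
    return [
        sigmoid(b + sum(w * v for w, v in zip(ws, x)))
        for ws, b in zip(weights, biases)
    ]
```

Only the classifier weights and biases are trained; the encoder that produced `embeddings` stays untouched, which is what makes the score a measure of embedding quality.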
Self-supervised model evaluation follows the linear evaluation protocol: a linear classifier is trained on top of frozen embeddings from the pretrained encoder. The script processes the validation set and displays Top-1 and Top-5 accuracies.
To run self-supervised model evaluation:
python evaluate_self_supervised.py --config <path to self-supervised config> \
--ckpt <path to model to evaluate> \
--epochs <number of epochs to fine-tune>
By default, the script evaluates the linear classifier (called the online finetuner) that is trained alongside the self-supervised encoder. To retrain the linear classifier from scratch, add the --retrain flag. Retraining might take some time, but it generally provides higher accuracy.
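For reference, Top-1 and Top-5 accuracy are both instances of top-k accuracy: a prediction counts as correct when the true class is among the k highest-scoring classes. The sketch below shows the standard definition on plain Python lists. It is not the repo's evaluation code, and the function name is hypothetical.

```python
def top_k_accuracy(scores, targets, k):
    """Fraction of samples whose true class is among the k top-scoring classes.

    scores: one list of per-class scores for each sample.
    targets: the true class index for each sample.
    """
    hits = 0
    for row, target in zip(scores, targets):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(targets)
```

Top-5 accuracy is never lower than Top-1 accuracy on the same predictions, because every Top-1 hit is also a Top-5 hit.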
Semi-supervised model evaluation simply loads the model and processes the validation set.
To run semi-supervised model evaluation:
python evaluate_semi_supervised.py --config <path to semi-supervised config> --ckpt <path to trained semi-supervised model>
To evaluate the ResNet50 distilled from CLIP, simply run the self-supervised model evaluation:
python evaluate_self_supervised.py --config <path to the config> \
--ckpt <path to model to evaluate> \
--epochs <number of epochs to fine-tune>
By default, the script evaluates the linear classifier (called the online finetuner) that is trained alongside the encoder. To retrain the linear classifier from scratch, add the --retrain flag. Retraining might take some time, but it generally provides higher accuracy.
To run multiclass classification evaluation on VOC07:
python evaluate_voc.py --config <path to VOC config> --ckpt <path to model trained on VOC>
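VOC07 results in this README are reported as mAP (mean average precision). As a rough sketch of what that number measures, the code below computes the non-interpolated average precision per class and averages it over classes. The VOC07 benchmark itself defines an 11-point interpolated variant, and this README does not state which variant evaluate_voc.py uses, so treat this as an illustration with hypothetical function names.

```python
def average_precision(scores, relevant):
    """Non-interpolated AP for one class.

    scores: predicted score for each image.
    relevant: truthy where the class is present in the image.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, index in enumerate(order, start=1):
        if relevant[index]:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / hits if hits else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """Average the per-class AP over all classes (columns)."""
    num_classes = len(score_matrix[0])
    per_class = [
        average_precision(
            [row[c] for row in score_matrix],
            [row[c] for row in label_matrix],
        )
        for c in range(num_classes)
    ]
    return sum(per_class) / num_classes
```

Each row of `score_matrix` and `label_matrix` corresponds to one image and each column to one class, which matches the multi-label nature of VOC07.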
See the scripts/ folder.
Dataset | Top-1 accuracy | Top-5 accuracy | Download link |
---|---|---|---|
STL10 | 89.67% | 99.46% | Download |
ImageNet-1K | 74.5% | 92.1% | Download |
CIFAR20 | 73.2% | 95.6% | Download |
Dataset | Top-1 accuracy | Top-5 accuracy | Percentage of labels | Download link |
---|---|---|---|---|
ImageNet-1K | 56.1% | 79.4% | 1% | Download |
ImageNet-1K | 69.9% | 89.5% | 10% | Download |
Pretrain dataset | Finetune dataset | mAP | Download link |
---|---|---|---|
ImageNet-1K | VOC2007 | 87.1 | Download |
Dataset | Top-1 accuracy | Download link |
---|---|---|
ImageNet-1K | 75.3% | Download |
STL10 | 95.6% | Download |
CIFAR100 | 78.6% | Download |
@article{kinakh2024mv,
title={MV--MR: Multi-Views and Multi-Representations for Self-Supervised Learning and Knowledge Distillation},
author={Kinakh, Vitaliy and Drozdova, Mariia and Voloshynovskiy, Slava},
journal={Entropy},
volume={26},
number={6},
pages={466},
year={2024},
publisher={MDPI}
}