Code for the CVPR 2024 paper: Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation.
This project heavily relies on [AFA] and [CLIP-ES]. Many thanks for their great work!
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
tar -xvf VOCtrainval_11-May-2012.tar
The augmented annotations are from the SBD dataset. Here is a download link to the augmented annotations at DropBox. After downloading SegmentationClassAug.zip, unzip it and move it to VOCdevkit/VOC2012 (a sketch of these commands is given after the tree below). The directory structure should then be:
VOCdevkit/
└── VOC2012
    ├── Annotations
    ├── ImageSets
    ├── JPEGImages
    ├── SegmentationClass
    ├── SegmentationClassAug
    └── SegmentationObject
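A minimal sketch of the unzip-and-move step, assuming SegmentationClassAug.zip was downloaded into the directory that also contains VOCdevkit/:
# Unzip the SBD augmented annotations and place them next to the official VOC folders.
# (Check the name of the extracted folder; some mirrors nest it one level deeper.)
unzip SegmentationClassAug.zip
mv SegmentationClassAug VOCdevkit/VOC2012/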
wget http://images.cocodataset.org/zips/train2014.zip
wget http://images.cocodataset.org/zips/val2014.zip
After unzipping the downloaded files, I recommend organizing them in VOC style for convenience (a sketch of this step is given after the tree below).
MSCOCO/
├── JPEGImages
│   ├── train
│   └── val
└── SegmentationClass
    ├── train
    └── val
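A minimal sketch of this reorganization, assuming train2014.zip and val2014.zip were extracted in the current directory (adjust paths to your setup):
# Collect the COCO images into a VOC-style layout.
mkdir -p MSCOCO/JPEGImages MSCOCO/SegmentationClass
mv train2014 MSCOCO/JPEGImages/train
mv val2014 MSCOCO/JPEGImages/val
# Put the generated or downloaded masks (see below) into
# MSCOCO/SegmentationClass/train and MSCOCO/SegmentationClass/val.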
To generate VOC-style segmentation labels for the COCO dataset, you can use the scripts provided at this repo, or simply download the generated masks from Google Drive.
conda create --name py38 python=3.8
conda activate py38
pip install -r requirments.txt
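As an optional sanity check, the environment can be verified before training (this assumes PyTorch is listed in the requirements file):
# Print the PyTorch version and whether a GPU is visible.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"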
Download the pre-trained CLIP ViT-B/16 weights from the official link, then move the model to pretrained/.
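A minimal sketch of this step, assuming the checkpoint was saved as ViT-B-16.pt in your download folder:
# Create the expected folder and move the CLIP checkpoint into it.
mkdir -p pretrained
mv /your/download/path/ViT-B-16.pt pretrained/ViT-B-16.pt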
Three parameters need to be modified based on your paths:
(1) root_dir: your/path/VOCdevkit/VOC2012 or your/path/MSCOCO
(2) name_list_dir: your/path/WeCLIP/datasets/voc or your/path/WeCLIP/datasets/coco
(3) clip_pretrain_path: your/path/WeCLIP/pretrained/ViT-B-16.pt
For VOC, modify them in configs/voc_attn_reg.yaml.
For COCO, modify them in configs/coco_attn_reg.yaml.
A sketch of these edits is given below.
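The following is only a sketch, assuming the three keys appear under exactly these names in the YAML files; the actual nesting may differ, so verify the result by hand:
# Point the VOC config at local paths; repeat analogously for configs/coco_attn_reg.yaml.
sed -i 's|root_dir:.*|root_dir: /your/path/VOCdevkit/VOC2012|' configs/voc_attn_reg.yaml
sed -i 's|name_list_dir:.*|name_list_dir: /your/path/WeCLIP/datasets/voc|' configs/voc_attn_reg.yaml
sed -i 's|clip_pretrain_path:.*|clip_pretrain_path: /your/path/WeCLIP/pretrained/ViT-B-16.pt|' configs/voc_attn_reg.yaml
# Double-check the edited values.
grep -nE 'root_dir|name_list_dir|clip_pretrain_path' configs/voc_attn_reg.yaml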
To start training, run the following commands.
# train on voc
python scripts/dist_clip_voc.py --config your/path/WeCLIP/configs/voc_attn_reg.yaml
# train on coco
python scripts/dist_clip_coco.py --config your/path/WeCLIP/configs/coco_attn_reg.yaml
To run inference, first modify the inference model path --model_path in test_msc_flip_voc.py or test_msc_flip_coco.py. Then, run the following commands:
# inference on voc
python test_msc_flip_voc.py --model_path your/inference/model/path/WeCLIP_model_iter_30000.pth
# inference on coco
python test_msc_flip_coco.py --model_path your/inference/model/path/WeCLIP_model_iter_80000.pth
Please kindly cite our paper if you find it helpful in your work.
@InProceedings{Zhang_2024_CVPR,
author = {Zhang, Bingfeng and Yu, Siyue and Wei, Yunchao and Zhao, Yao and Xiao, Jimin},
title = {Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {3796-3806}
}
Many thanks to AFA: [paper] [Project]
@inproceedings{ru2022learning,
title = {Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers},
author = {Lixiang Ru and Yibing Zhan and Baosheng Yu and Bo Du},
booktitle = {CVPR},
year = {2022},
}
Many thanks to CLIP-ES: [paper] [Project]
@InProceedings{Lin_2023_CVPR,
author = {Lin, Yuqi and Chen, Minghao and Wang, Wenxiao and Wu, Boxi and Li, Ke and Lin, Binbin and Liu, Haifeng and He, Xiaofei},
title = {CLIP Is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {15305-15314}
}