Follow the dataset preparation process of UPT.
Place the downloaded files as shown below. If you store them elsewhere, replace the default paths with your custom locations.
|- CMMP
| |- hicodet
| | |- hico_20160224_det
| | | |- annotations
| | | |- images
| |- vcoco
| | |- mscoco2014
| | | |- train2014
| | | |- val2014
: :
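A misplaced folder is easier to catch before training starts than from a stack trace. The check below is a minimal sketch and not part of the CMMP code: `EXPECTED_DATASET_DIRS` and `missing_dirs` are hypothetical names, and the paths assume the default layout in the tree above, relative to the CMMP root.

```python
from pathlib import Path

# Dataset folders assumed by the default layout, relative to the CMMP root.
# Mirrors the tree above; edit the list if you use custom locations.
EXPECTED_DATASET_DIRS = [
    "hicodet/hico_20160224_det/annotations",
    "hicodet/hico_20160224_det/images",
    "vcoco/mscoco2014/train2014",
    "vcoco/mscoco2014/val2014",
]


def missing_dirs(root=".", expected=EXPECTED_DATASET_DIRS):
    """Return the entries of `expected` that are not directories under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs(".")
    if missing:
        print("Missing dataset folders:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Dataset layout matches the expected tree.")
```

Running it from the CMMP root prints any folder that is absent; an empty result means only that the folders exist, not that their contents are complete.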
Follow the environment setup in UPT.
Our code is built upon CLIP. Install the local CLIP package:
cd CLIP && python setup.py develop && cd ..
Download the CLIP weights to checkpoints/pretrained_clip.
|- CMMP
| |- checkpoints
| | |- pretrained_clip
| | | |- ViT-B-16.pt
| | | |- ViT-L-14-336px.pt
: :
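Large checkpoint downloads are occasionally truncated. One way to check them is to compare each file's SHA256 digest with the value published by the download source; the upstream OpenAI CLIP package, for instance, embeds the expected digest in each model's download URL. The helpers below (`sha256_of`, `verify_checkpoint`) are a hypothetical sketch, not part of CMMP, and no expected digests are listed here, so you must supply them from your download source.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large checkpoints never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checkpoint(path, expected_sha256):
    """Return True if `path` is a file whose digest matches `expected_sha256`."""
    path = Path(path)
    return path.is_file() and sha256_of(path) == expected_sha256.lower()
```

For example, `verify_checkpoint("checkpoints/pretrained_clip/ViT-B-16.pt", expected)` returns False both for a missing file and for a corrupted one, so a failure means the file should be downloaded again.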
Download the DETR weights and put them in checkpoints/.
Dataset | DETR weights |
---|---|
HICO-DET | weights |
V-COCO | weights |
|- CMMP
| |- checkpoints
| | |- detr-r50-hicodet.pth
| | |- detr-r50-vcoco.pth
: : :
Download the pre-extracted features from HERE and the pre-extracted bounding boxes from HERE. Place the downloaded files as follows.
|- CMMP
| |- hicodet_pkl_files
| | |- union_embeddings_cachemodel_crop_padding_zeros_vitb16.p
| | |- hicodet_union_embeddings_cachemodel_crop_padding_zeros_vit336.p
| |- vcoco_pkl_files
| | |- vcoco_union_embeddings_cachemodel_crop_padding_zeros_vit16.p
| | |- vcoco_union_embeddings_cachemodel_crop_padding_zeros_vit336.p
: :
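The feature file names above do not follow a single pattern (`vitb16` versus `vit16`, and only some carry a dataset prefix), so a lookup table is less error-prone than building names from a template. This is a hypothetical sketch: `FEATURE_FILES`, `feature_path` and `load_features` are illustrative names, and it assumes that the `vitb16`/`vit16` files belong to the ViT-B-16 CLIP checkpoint, that the `vit336` files belong to ViT-L-14-336px, and that the `.p` files are Python pickles.

```python
import pickle
from pathlib import Path

# (dataset, CLIP backbone) -> feature file, with names copied from the tree above.
# ASSUMPTION: "vitb16"/"vit16" = ViT-B-16 and "vit336" = ViT-L-14-336px.
FEATURE_FILES = {
    ("hicodet", "ViT-B-16"): "hicodet_pkl_files/union_embeddings_cachemodel_crop_padding_zeros_vitb16.p",
    ("hicodet", "ViT-L-14-336px"): "hicodet_pkl_files/hicodet_union_embeddings_cachemodel_crop_padding_zeros_vit336.p",
    ("vcoco", "ViT-B-16"): "vcoco_pkl_files/vcoco_union_embeddings_cachemodel_crop_padding_zeros_vit16.p",
    ("vcoco", "ViT-L-14-336px"): "vcoco_pkl_files/vcoco_union_embeddings_cachemodel_crop_padding_zeros_vit336.p",
}


def feature_path(dataset, backbone, root="."):
    """Return the path of the pre-extracted feature file for a dataset/backbone pair."""
    try:
        rel = FEATURE_FILES[(dataset, backbone)]
    except KeyError:
        raise ValueError(f"no feature file for {(dataset, backbone)!r}") from None
    return Path(root) / rel


def load_features(dataset, backbone, root="."):
    """Load a feature file, ASSUMING it is a Python pickle."""
    with open(feature_path(dataset, backbone, root), "rb") as f:
        return pickle.load(f)
```

Only unpickle files from a source you trust, since `pickle.load` can execute arbitrary code embedded in the file.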
Please follow the commands in ./scripts.
Method | Type | Unseen↑ | Seen↑ | Full↑ | HM↑ |
---|---|---|---|---|---|
CMMP (Ours) | RF-UC | 29.45 | 32.87 | 32.18 | 31.07 |
CMMP† (Ours) | RF-UC | 35.98 | 37.42 | 37.13 | 36.69 |
CMMP (Ours) | NF-UC | 32.09 | 29.71 | 30.18 | 30.85 |
CMMP† (Ours) | NF-UC | 33.52 | 35.53 | 35.13 | 34.50 |
CMMP (Ours) | UO | 33.76 | 31.15 | 31.59 | 32.40 |
CMMP† (Ours) | UO | 39.67 | 36.15 | 36.74 | 37.83 |
CMMP (Ours) | UV | 26.23 | 32.75 | 31.84 | 29.13 |
CMMP† (Ours) | UV | 30.84 | 37.28 | 36.38 | 33.75 |
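The HM column is consistent with the harmonic mean of the Unseen and Seen scores, HM = 2 · Unseen · Seen / (Unseen + Seen). The sketch below recomputes it from the table. Because the table entries are themselves rounded to two decimals, a recomputed value can differ from the reported one in the last digit (for CMMP† on UV it gives 33.76 against the reported 33.75).

```python
# (Unseen, Seen, reported HM) copied from the results table above.
RESULTS = {
    ("CMMP", "RF-UC"): (29.45, 32.87, 31.07),
    ("CMMP†", "RF-UC"): (35.98, 37.42, 36.69),
    ("CMMP", "NF-UC"): (32.09, 29.71, 30.85),
    ("CMMP†", "NF-UC"): (33.52, 35.53, 34.50),
    ("CMMP", "UO"): (33.76, 31.15, 32.40),
    ("CMMP†", "UO"): (39.67, 36.15, 37.83),
    ("CMMP", "UV"): (26.23, 32.75, 29.13),
    ("CMMP†", "UV"): (30.84, 37.28, 33.75),
}


def harmonic_mean(unseen: float, seen: float) -> float:
    """Harmonic mean of the unseen and seen scores (0 if both are 0)."""
    if unseen + seen == 0:
        return 0.0
    return 2 * unseen * seen / (unseen + seen)


for (method, split), (unseen, seen, reported) in RESULTS.items():
    hm = harmonic_mean(unseen, seen)
    # Inputs are rounded, so allow one unit of difference in the last digit.
    assert abs(hm - reported) <= 0.01, (method, split, hm, reported)
    print(f"{method:<6} {split:<6} HM={hm:.2f} (reported {reported:.2f})")
```

The harmonic mean is dominated by the smaller of the two scores, so a method cannot reach a high HM by excelling on seen categories alone.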
You can download the model weights from Baidu Netdisk:
Link: https://pan.baidu.com/s/1XyWG2qjEXWghEYcc4-PGFA?pwd=zkh5
Password: zkh5
Alternatively, you can download the CMMP weights from Hugging Face:
https://huggingface.co/lttt/CMMP/tree/main
If you find our paper and/or code helpful, please consider citing:
@inproceedings{ting2024CMMP,
title={Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection},
author={Ting Lei and Shaofeng Yin and Yuxin Peng and Yang Liu},
year={2024},
booktitle={ECCV},
}