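Code release of the paper "Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects":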
Jian Hu*, Jiayi Lin*, Weitong Cai, Shaogang Gong
The Segment Anything Model (SAM) shows remarkable segmentation ability with sparse prompts such as points. However, manual prompts are not always feasible, as they may not be accessible in real-world applications. In this work, we aim to eliminate the need for manual prompts. The key idea is to employ Cross-modal Chains of Thought Prompting (CCTP) to reason visual prompts using the semantic information given by a generic text prompt. We introduce a per-instance test-time adaptation mechanism called Generalizable SAM (GenSAM) to automatically generate and optimize visual prompts from the generic task prompt.
A brief introduction to how GenSAM works!
CCTP maps a single generic text prompt onto image-specific consensus foreground and background heatmaps using vision-language models, acquiring reliable visual prompts. Moreover, to adapt the visual prompts at test time, we further propose Progressive Mask Generation (PMG), which iteratively reweights the input image, guiding the model to focus on the targets in a coarse-to-fine manner. Crucially, all network parameters are fixed, avoiding the need for additional training. Experiments on three benchmarks demonstrate that GenSAM outperforms point-supervision approaches and achieves results comparable to scribble-supervision ones, relying solely on general task descriptions as prompts.
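To make the coarse-to-fine loop described above more concrete, here is a minimal NumPy sketch. It is not the code in this repository: the function names, the "foreground minus background" consensus rule, the top-k point selection, and the `alpha` blending weight are all illustrative assumptions, and `heatmap_fn` / `segment_fn` are hypothetical stand-ins for the vision-language model and for SAM.

```python
import numpy as np

def consensus_heatmap(fg, bg):
    """Combine foreground/background heatmaps into one score map in [0, 1].
    Assumption: a simple 'fg minus bg' score followed by min-max scaling."""
    score = np.asarray(fg, dtype=float) - np.asarray(bg, dtype=float)
    lo, hi = score.min(), score.max()
    if hi == lo:
        return np.zeros_like(score)
    return (score - lo) / (hi - lo)

def point_prompts(heatmap, k=1):
    """Pick the k highest-scoring pixels as positive points and the
    k lowest-scoring pixels as negative points, as (row, col) arrays."""
    order = np.argsort(heatmap.ravel())
    pos = np.column_stack(np.unravel_index(order[-k:][::-1], heatmap.shape))
    neg = np.column_stack(np.unravel_index(order[:k], heatmap.shape))
    return pos, neg

def reweight(image, mask, alpha=0.5):
    """Keep pixels inside the mask at full intensity and dim the rest by alpha
    (hypothetical blending rule) so the next pass focuses on the target."""
    weights = alpha + (1.0 - alpha) * np.asarray(mask, dtype=float)
    return image * weights[..., None]

def progressive_masks(image, heatmap_fn, segment_fn, iters=3, alpha=0.5):
    """Run the prompt -> segment -> reweight loop for a fixed number of steps.
    heatmap_fn(image) -> (fg, bg) and segment_fn(image, pos, neg) -> mask are
    placeholders supplied by the caller; no model weights are updated here."""
    original = np.asarray(image, dtype=float)
    current = original
    masks = []
    for _ in range(iters):
        fg, bg = heatmap_fn(current)
        pos, neg = point_prompts(consensus_heatmap(fg, bg))
        mask = np.asarray(segment_fn(current, pos, neg), dtype=float)
        masks.append(mask)
        current = reweight(original, mask, alpha)
    return masks
```

Each iteration reweights the original image rather than the already dimmed one, so repeated passes do not fade the background to zero; that choice is part of the sketch, not a claim about the released implementation.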
Camouflaged Object Detection Dataset
# create virtual environment
virtualenv GenSAM_LLaVA
source GenSAM_LLaVA/bin/activate
# prepare LLaVA
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
pip install -e .
cd ..
# prepare SAM
pip install git+https://github.com/facebookresearch/segment-anything.git
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
pip install opencv-python imageio ftfy urllib3==1.26.6
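After the install steps, a quick sanity check can confirm that the packages import and that the SAM checkpoint was downloaded. The helper below is a hypothetical convenience script, not part of this repository; it assumes the usual import names (`llava`, `segment_anything`, `cv2`, `imageio`, `ftfy`) and that the checkpoint sits in the current directory.

```python
import importlib.util
from pathlib import Path

def check_setup(checkpoint="sam_vit_h_4b8939.pth",
                modules=("llava", "segment_anything", "cv2", "imageio", "ftfy")):
    """Return a list of problems found; an empty list means nothing is missing."""
    problems = []
    for name in modules:
        # find_spec returns None when a top-level package cannot be located
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python package: {name}")
    if not Path(checkpoint).is_file():
        problems.append(f"missing SAM checkpoint: {checkpoint}")
    return problems

if __name__ == "__main__":
    for line in check_setup() or ["setup looks complete"]:
        print(line)
```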
python main.py --config config/CHAMELEON_LLaVA1.5.yaml ###LLaVA1.5
python main.py --config config/CHAMELEON_LLaVA.yaml ###LLaVA
If you want to visualize the output images during test-time adaptation, you can run:
python main.py --config config/CHAMELEON_LLaVA1.5.yaml --visualization ###LLaVA1.5
python main.py --config config/CHAMELEON_LLaVA.yaml --visualization ###LLaVA
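To run both configurations in one go, a small wrapper along these lines may help. It is only a sketch: `build_command` and `run_all` are hypothetical helpers, and it assumes the script is launched from the repository root so that `main.py` and the `config/` paths resolve. Only the `--config` and `--visualization` flags shown above are used.

```python
import subprocess
import sys

# Config files taken from the commands above.
CONFIGS = ["config/CHAMELEON_LLaVA1.5.yaml", "config/CHAMELEON_LLaVA.yaml"]

def build_command(config, visualization=False):
    """Build the argument list for one run of main.py."""
    cmd = [sys.executable, "main.py", "--config", config]
    if visualization:
        cmd.append("--visualization")
    return cmd

def run_all(configs=CONFIGS, visualization=False):
    """Run each config in sequence; stop at the first failing run."""
    for config in configs:
        subprocess.run(build_command(config, visualization), check=True)
```

Calling `run_all(visualization=True)` would execute the two visualization commands one after the other.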
We also provide a Jupyter notebook demo for visualization.
pip install notebook
pip install ipykernel ipywidgets
python -m ipykernel install --user --name GenSAM_LLaVA
If you find our work useful in your research, please consider citing:
@inproceedings{hu2024relax,
title={Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects},
author={Hu, Jian and Lin, Jiayi and Gong, Shaogang and Cai, Weitong},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={38},
number={11},
pages={12511--12518},
year={2024}
}