Implementation of the paper Text Augmented Spatial-aware Zero-shot Referring Image Segmentation (EMNLP Findings 2023)
Download the datasets (RefCOCO, RefCOCO+, RefCOCOg) and put them in "../refer".
Prepare the SAM-H, CLIP, and BLIP-2 models.
Prepare captions for the images (using BLIP-2).
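The captioning step could look roughly like the sketch below. It assumes the Hugging Face transformers implementation of BLIP-2 and the public "Salesforce/blip2-opt-2.7b" checkpoint; neither is confirmed by this repo, and the helper names (list_images, caption_images) are hypothetical. The format in which tas_main.py expects the captions is not covered here.

```python
from pathlib import Path

# Assumption: a public BLIP-2 checkpoint on the Hugging Face Hub.
# The checkpoint actually used by this repo may differ.
MODEL_NAME = "Salesforce/blip2-opt-2.7b"


def list_images(image_dir):
    """Return the image files in image_dir, sorted by path."""
    exts = {".jpg", ".jpeg", ".png"}
    return sorted(p for p in Path(image_dir).iterdir() if p.suffix.lower() in exts)


def caption_images(image_dir, max_new_tokens=30):
    """Generate one caption per image and return {file name: caption}."""
    # Heavy imports are deferred so list_images works without torch installed.
    import torch
    from PIL import Image
    from transformers import Blip2ForConditionalGeneration, Blip2Processor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    processor = Blip2Processor.from_pretrained(MODEL_NAME)
    model = Blip2ForConditionalGeneration.from_pretrained(MODEL_NAME, torch_dtype=dtype)
    model.to(device)
    model.eval()

    captions = {}
    for path in list_images(image_dir):
        image = Image.open(path).convert("RGB")
        # Without a text prompt, BLIP-2 produces a plain image caption.
        inputs = processor(images=image, return_tensors="pt").to(device, dtype)
        with torch.no_grad():
            ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        captions[path.name] = processor.batch_decode(ids, skip_special_tokens=True)[0].strip()
    return captions
```

Calling caption_images on a folder of images returns a dictionary that can then be saved in whatever form the rest of the pipeline reads.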
Install the environment requirements (pip install -r requirements.txt). For the syntactic parsing tools, you need to manually install some extensions (en-core-web-trf in spaCy, wordnet in NLTK).
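One possible way to fetch these two parsing resources is sketched below. It relies on the standard spaCy and NLTK command-line downloaders and assumes both packages were already installed from requirements.txt; the function names are illustrative and not part of this repo.

```python
import subprocess
import sys


def parsing_setup_commands():
    """Return the commands that fetch the spaCy model and the WordNet corpus."""
    return [
        # spaCy transformer pipeline, loadable afterwards as "en_core_web_trf"
        [sys.executable, "-m", "spacy", "download", "en_core_web_trf"],
        # WordNet corpus for NLTK
        [sys.executable, "-m", "nltk.downloader", "wordnet"],
    ]


def install_parsing_extras():
    """Run each download command with the current interpreter, stopping on failure."""
    for cmd in parsing_setup_commands():
        subprocess.run(cmd, check=True)
```

Calling install_parsing_extras() once, inside the environment used for this repo, should be enough; both downloaders skip or refresh resources that are already present.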
python tas_main.py --config config/refcoco/refcoco_val.json
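To run several evaluations in a row, a small wrapper like the following sketch may help. It assumes that every JSON file under config/ is a valid run configuration laid out like config/refcoco/refcoco_val.json, which this README does not confirm; build_commands and run_all are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(config_root="config", script="tas_main.py"):
    """Build one tas_main.py command per JSON config found under config_root."""
    # Assumption: every *.json file under config_root is a run configuration.
    configs = sorted(Path(config_root).rglob("*.json"))
    return [[sys.executable, script, "--config", cfg.as_posix()] for cfg in configs]


def run_all(config_root="config"):
    """Run the configs one after another, stopping at the first failure."""
    for cmd in build_commands(config_root):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling run_all() from the repository root runs each config in sorted order; run_all("config/refcoco") restricts the sweep to one dataset folder.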
The repo is derived from the Grounded Segment Anything project.
If you have any questions, feel free to drop me an e-mail.