ylingfeng / FGVP

Official Codes for Fine-Grained Visual Prompting, NeurIPS 2023

FGVP: Fine-Grained Visual Prompting

Install

Our code is built upon ReCLIP. Installation and dataset preparation follow the instructions in the ReCLIP repository.

FGVP

Figure: a summary of visual prompts for the caption "elephant on the left".


Results

| Method | VLM | Visual Prompt | Post Processing | Command | RefCOCO val | RefCOCO+ val | RefCOCOg val |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CPT-adapted | ViT-B/32, RN50x16 | $B2$ | R | link | 41.3 | 41.3 | 51.3 |
| ReCLIP | ViT-B/32, RN50x16 | $P \mid B4$ | R | link | 45.8 | 47.9 | 59.3 |
| RedCircle | ViT-B/32, RN50x16 | $P \mid C1$ | R | link | 43.9 | 45.3 | 57.3 |
| FGVP (ours) | ViT-B/32, RN50x16 | $P \mid D4$ | R | link | 52.0 | 53.3 | 62.1 |
| RedCircle (reported in paper) | ViT-L/14@336px, RN50x16 | $C1 \mid C3 \mid C4$ | S | -- | 49.8 | 55.3 | 59.4 |
| RedCircle | ViT-L/14@336px, RN50x16 | $C1 \mid C3 \mid C4$ | S | link | 51.4 | 56.3 | 58.3 |
| FGVP (ours) | ViT-L/14@336px, RN50x16 | $D1 \mid D3 \mid D4$ | S | link | 52.9 | 57.4 | 58.1 |
| RedCircle | ViT-L/14@336px, RN50x16 | $P \mid C1 \mid C3 \mid C4$ | S | link | 51.6 | 58.1 | 60.0 |
| FGVP (ours) | ViT-L/14@336px, RN50x16 | $P \mid D1 \mid D3 \mid D4$ | S | link | 53.9 | 59.3 | 61.0 |
| RedCircle | ViT-L/14@336px, RN50x16 | $P \mid C1 \mid C3 \mid C4$ | RS | link | 56.8 | 58.6 | 62.2 |
| FGVP (ours) | ViT-L/14@336px, RN50x16 | $P \mid D1 \mid D3 \mid D4$ | RS | link | 59.6 | 60.0 | 63.3 |

Single-Image Inference

We provide a simple inference script that runs on a single image, without post-processing.

# example 1
python fgvp-reclip/simple_inference.py \
    --img_dir demo/exp1/ori.png \
    --text 'apple on the left' 'apple in the middle' 'broccoli' 'raspberries' 'grossum' 'glass bowl' \
    --out_dir demo/exp1 \
    --sam_prompt grid

# example 2
python fgvp-reclip/simple_inference.py \
    --img_dir demo/exp2/ori.png \
    --out_dir demo/exp2 \
    --text 'photo on the wall' \
    --sam_prompt grid
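
When running several images or caption sets, it can help to assemble these commands programmatically. The following is a minimal sketch, not part of the FGVP repository: `build_command` is a hypothetical helper, and it uses only the flags that appear in this README (`--candidate_boxes` is described in the next paragraph). It assumes the script is invoked from the repository root.

```python
import shlex
import sys


def build_command(img, texts, out_dir, sam_prompt="grid",
                  candidate_boxes=None,
                  script="fgvp-reclip/simple_inference.py"):
    """Hypothetical helper: build the argument list for simple_inference.py.

    Only flags shown in this README are used; nothing here inspects the
    script itself.
    """
    cmd = [
        sys.executable, script,
        "--img_dir", str(img),
        "--text", *texts,            # one argument per caption
        "--out_dir", str(out_dir),
        "--sam_prompt", sam_prompt,
    ]
    if candidate_boxes is not None:
        cmd += ["--candidate_boxes", str(candidate_boxes)]
    return cmd


if __name__ == "__main__":
    cmd = build_command("demo/exp2/ori.png", ["photo on the wall"], "demo/exp2")
    # Print a shell-quoted version for inspection; to execute it, pass the
    # list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing the command as a list avoids shell-quoting problems with captions that contain spaces.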

You can provide proposal boxes derived from other detectors to achieve better localization. Save your bounding boxes in a JSON file and specify it with --candidate_boxes.

# example
python fgvp-reclip/simple_inference.py \
    --img_dir demo/exp1/ori.png \
    --text 'apple on the left' 'apple in the middle' 'broccoli' 'raspberries' 'grossum' 'glass bowl' \
    --out_dir demo/exp1 \
    --sam_prompt box \
    --candidate_boxes demo/exp1/candidate_boxes.json
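
Boxes from an external detector need to be written to a JSON file before they can be passed in. This README does not state the schema the script expects, so the sketch below is an assumption: it writes a flat list of `[x1, y1, x2, y2]` pixel boxes. `save_candidate_boxes` is a hypothetical helper; compare its output with `demo/exp1/candidate_boxes.json` and adjust the layout if it differs.

```python
import json
from pathlib import Path


def save_candidate_boxes(boxes, path):
    """Hypothetical helper: validate boxes and write them to a JSON file.

    ASSUMPTION: each box is [x1, y1, x2, y2] in pixels and the file is a
    plain list of such boxes. Check demo/exp1/candidate_boxes.json for the
    layout the script actually reads.
    """
    cleaned = []
    for box in boxes:
        x1, y1, x2, y2 = (float(v) for v in box)
        if x2 <= x1 or y2 <= y1:
            raise ValueError(f"degenerate box: {box!r}")
        cleaned.append([x1, y1, x2, y2])
    Path(path).write_text(json.dumps(cleaned))
    return cleaned
```

For example, `save_candidate_boxes(detector_boxes, "my_boxes.json")` followed by `--candidate_boxes my_boxes.json`, where `detector_boxes` stands for the output of whatever detector is used.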