This is a third-party implementation, by [Zuwei Long]() and Wei Li, of the paper *Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection*.
You can use this code to fine-tune a model on your own dataset, or start pretraining a model from scratch.
| Feature | Official release version | The version we replicated |
|---|---|---|
| Inference | ✔ | ✔ |
| Train (Object Detection data) | ✖ | ✔ |
| Train (Grounding data) | ✖ | ✔ |
| Slurm multi-machine support | ✖ | ✔ |
| Training acceleration strategy | ✖ | ✔ |
We tested the model with Python 3.7.11, PyTorch 1.11.0, and CUDA 11.3. Other versions may also work.
```bash
git clone https://github.com/longzw1997/Open-GroundingDino.git && cd Open-GroundingDino/
pip install -r requirements.txt
cd models/GroundingDINO/ops
python setup.py build install  # compile the deformable attention CUDA ops
python test.py                 # run a quick check of the compiled ops
cd ../../..
```
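If the build succeeds, the compiled extension should be importable from Python. Below is a minimal sanity check; the module name `MultiScaleDeformableAttention` is an assumption about what the ops `setup.py` installs, so adjust the import if your build registers a different name:

```python
# Environment sanity check: verify PyTorch/CUDA and the compiled extension.
# The module name "MultiScaleDeformableAttention" is assumed here; adjust it
# if your build of models/GroundingDINO/ops installs under a different name.
import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")

try:
    import MultiScaleDeformableAttention  # noqa: F401
    print("Deformable attention ops import OK")
except ImportError as e:
    print(f"Compiled ops not found -- rerun 'python setup.py build install': {e}")
```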
For training, we use the odvg data format to support both OD (object detection) and VG (visual grounding) data.
Before training begins, you need to convert your dataset into odvg format; see data_format.md | datasets_mixed_odvg.json | coco2odvg.py | grit2odvg for more details.
For testing, we use the coco format, which currently supports only OD datasets.
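For orientation, here is a sketch of what producing odvg-style JSONL records might look like, loosely modeled on coco2odvg.py. The field names below (`detection`/`instances` for OD, `grounding`/`regions` for VG) are illustrative assumptions; data_format.md is the authoritative schema:

```python
import json

# Hypothetical odvg-style records (one JSON object per line in a .jsonl file).
# Field names are illustrative -- consult data_format.md for the real schema.
od_record = {
    "filename": "000000001.jpg",
    "height": 480,
    "width": 640,
    "detection": {
        "instances": [
            {"bbox": [10.0, 20.0, 200.0, 300.0], "label": 0, "category": "dog"},
        ]
    },
}
vg_record = {
    "filename": "000000002.jpg",
    "height": 480,
    "width": 640,
    "grounding": {
        "caption": "a dog next to a cat",
        "regions": [
            {"bbox": [10.0, 20.0, 200.0, 300.0], "phrase": "a dog"},
        ],
    },
}

with open("my_dataset_odvg.jsonl", "w") as f:
    for rec in (od_record, vg_record):
        f.write(json.dumps(rec) + "\n")
```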
```bash
config/cfg_odvg.py                  # for backbone, batch size, LR, freeze layers, etc.
config/datasets_mixed_odvg.json     # mixed dataset config supporting both OD and VG
config/datasets_mixed_example.json  # example dataset config, prepared according to data_format.md
```

If you evaluate on your own dataset rather than COCO, disable COCO evaluation and provide your label list in the config:

```diff
- use_coco_eval = True
+ use_coco_eval = False
+ label_list = ['dog', 'cat', 'person']
```
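As a rough sketch of what the mixed-dataset JSON may contain, the snippet below writes one in Python. The keys (`root`, `anno`, `label_map`, `dataset_mode`) are assumptions for illustration; config/datasets_mixed_example.json and data_format.md are the authoritative references:

```python
import json

# Illustrative mixed-dataset config in the style of datasets_mixed_odvg.json.
# Keys and paths below are placeholders -- see config/datasets_mixed_example.json
# and data_format.md for the real schema.
datasets = {
    "train": [
        {   # OD data converted to odvg (e.g. via coco2odvg.py)
            "root": "path/to/coco/train2017/",
            "anno": "coco2017_train_odvg.jsonl",
            "label_map": "coco2017_label_map.json",
            "dataset_mode": "odvg",
        },
        {   # grounding (VG) data, no label map needed
            "root": "path/to/flickr30k/images/",
            "anno": "flickr30k_odvg.jsonl",
            "dataset_mode": "odvg",
        },
    ],
    "val": [
        {   # evaluation stays in coco format (OD only)
            "root": "path/to/coco/val2017/",
            "anno": "instances_val2017.json",
            "dataset_mode": "coco",
        }
    ],
}

with open("config/datasets_my_mix.json", "w") as f:
    json.dump(datasets, f, indent=2)
```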
```bash
# train/eval with torch.distributed.launch:
bash train_slurm.sh ${PARTITION} ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}
bash train_dist.sh ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}
bash test_dist.sh  ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}

# train/eval on a slurm cluster (see train_slurm.sh for more details):
bash train_slurm.sh ${PARTITION} ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}
bash test_slurm.sh  ${PARTITION} ${GPU_NUM} ${CFG} ${DATASETS} ${OUTPUT_DIR}

# e.g.:
# bash train_slurm.sh v100_32g 32 config/cfg_odvg.py config/datasets_mixed_odvg.json ./logs
# bash train_slurm.sh v100_32g 8  config/cfg_coco.py config/datasets_od_example.json ./logs
```
| Name | Pretrain data | Task | mAP on COCO | Ckpt | Misc |
|---|---|---|---|---|---|
| GroundingDINO-T (official) | O365, GoldG, Cap4M | zero-shot | 48.4 (zero-shot) | model | - |
| GroundingDINO-T (fine-tune) | O365, GoldG, Cap4M | finetune w/ coco | 57.3 (fine-tune) | model | cfg \| log |
| GroundingDINO-T (pretrain) | COCO, O365, LVIS, V3Det, GRIT-200K, Flickr30k (total 1.8M) | zero-shot | 55.1 (zero-shot) | model | cfg \| log |
Because the model architecture is unchanged, you only need to install the GroundingDINO library and then run tools/inference_on_a_image.py to run inference on your own images.
```bash
python tools/inference_on_a_image.py \
  -c tools/GroundingDINO_SwinT_OGC.py \
  -p path/to/your/ckpt.pth \
  -i ./figs/dog.jpeg \
  -t "dog" \
  -o output
```
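Alternatively, since the checkpoints are architecture-compatible with the official GroundingDINO library, scripted inference along the following lines should work. This is a sketch assuming the library's `groundingdino.util.inference` helpers (`load_model`, `load_image`, `predict`, `annotate`); the paths and thresholds are placeholders:

```python
# Sketch: scripted inference through the official GroundingDINO library.
# Assumes groundingdino is installed; paths and thresholds are placeholders.
import cv2
from groundingdino.util.inference import load_model, load_image, predict, annotate

model = load_model("tools/GroundingDINO_SwinT_OGC.py", "path/to/your/ckpt.pth")
image_source, image = load_image("./figs/dog.jpeg")

boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="dog",         # text prompt, same as the -t flag above
    box_threshold=0.35,
    text_threshold=0.25,
)

# Draw the predicted boxes and phrases onto the source image and save it.
annotated = annotate(image_source=image_source, boxes=boxes, logits=logits, phrases=phrases)
cv2.imwrite("output/annotated_dog.jpg", annotated)
```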
| Prompt | Official ckpt | COCO ckpt | 1.8M ckpt |
|---|---|---|---|
| dog | | | |
| cat | | | |
Provided codes were adapted from the official Grounding DINO implementation. If you find this project helpful, please cite:
```bibtex
@misc{Open-GroundingDino,
  author       = {Zuwei Long and Wei Li},
  title        = {Open Grounding Dino: The third party implementation of the paper Grounding DINO},
  howpublished = {\url{https://github.com/longzw1997/Open-GroundingDino}},
  year         = {2023}
}
```
Feel free to contact us if you have any suggestions or questions, and bug reports are welcome. Please create a pull request if you find a bug or want to contribute code.