wanghao9610 / OV-DINO

Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
https://wanghao9610.github.io/OV-DINO
Apache License 2.0

Questions about implementation and deployment details #10

Closed: math-yyj closed this issue 2 months ago

math-yyj commented 2 months ago

Nice work! I have two questions:

  1. How many GPUs are needed for full training?
  2. Is there any guide for ONNX export?

Many thanks!

wanghao9610 commented 2 months ago

Thanks for your attention to our work. (1) OV-DINO is pre-trained on 32 A100 GPUs with a batch size of 128, then fine-tuned on 8 A100 GPUs with a batch size of 32. (2) Good suggestion, I will add it to the roadmap. I'm the only one coding, so please stay tuned.
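
In the meantime, here is a rough, untested sketch of the generic `torch.onnx.export` flow, not an official guide: `build_model` and the checkpoint path are placeholders, and the single image-tensor input is an assumption, since OV-DINO also consumes text prompts and the text branch would likely need precomputed embeddings or a separate export in practice.

```python
# Untested sketch of a generic torch.onnx.export flow for a detector.
# `build_model` and "checkpoint.pth" are hypothetical placeholders.
import torch

model = build_model()                            # placeholder: construct the detector
state = torch.load("checkpoint.pth", map_location="cpu")
model.load_state_dict(state)
model.eval()

dummy_image = torch.randn(1, 3, 800, 800)        # NCHW dummy input; resolution is an assumption

torch.onnx.export(
    model,
    dummy_image,
    "ov_dino.onnx",
    opset_version=17,                            # transformer ops generally need a recent opset
    input_names=["image"],
    output_names=["boxes", "scores", "labels"],  # assumed output layout
    dynamic_axes={"image": {0: "batch", 2: "height", 3: "width"}},
)
```

For reference, the settings quoted above work out to 4 images per GPU in both stages (128/32 for pre-training, 32/8 for fine-tuning).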

math-yyj commented 2 months ago

> Thanks for your attention to our work. (1) OV-DINO is pre-trained on 32 A100 GPUs with a batch size of 128, then fine-tuned on 8 A100 GPUs with a batch size of 32. (2) Good suggestion, I will add it to the roadmap. I'm the only one coding, so please stay tuned.

Thank you for your kind reply.

Also, how many hours did pre-training and fine-tuning take?

wanghao9610 commented 2 months ago

Pre-training OV-DINO1 on O365 takes about 1 day, OV-DINO2 on (O365, GoldG) takes about 3 days, and OV-DINO3 on (O365, GoldG, CC1M‡) takes about 1 week; fine-tuning on COCO takes about half a day. Note: these training times are rough estimates, provided only for reference.