mmaaz60 / mvits_for_class_agnostic_od

[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".
MIT License
299 stars 25 forks source link

How long the training takes? #21

Closed lxtGH closed 1 year ago

lxtGH commented 1 year ago

HI! @mmaaz60 Thanks for your opensource this project. I wonder how long the training procedure takes? What kind of device you yse?

mmaaz60 commented 1 year ago

Hi @lxtGH,

Thank you for your interest in our work. We train MAVL model on approx. 1.3 M aligned image-text pairs. The dataset is the same as used in MDETR. The training took almost 3 days on 32 V100 GPUs. Following this the model is evaluated on diverse datasets (Table 1 & 2) for class-agnostic object detection without any finetuning.

lxtGH commented 1 year ago

Thanks for your reply.