lxtGH closed this issue 1 year ago
Hi @lxtGH,
Thank you for your interest in our work. We train the MAVL model on approximately 1.3M aligned image-text pairs, the same dataset used in MDETR. Training took almost 3 days on 32 V100 GPUs. The trained model is then evaluated on diverse datasets (Tables 1 & 2) for class-agnostic object detection without any fine-tuning.
Thanks for your reply.
Hi @mmaaz60! Thanks for open-sourcing this project. How long does the training procedure take, and what kind of hardware did you use?