linhuixiao / HiVG

[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.
https://github.com/linhuixiao/HiVG
Apache License 2.0

The experimental results look great; a question about the experimental settings of Table 1 in the paper #1

Open Mr-Bigworth opened 6 months ago

Mr-Bigworth commented 6 months ago

Hello, I would like to ask what the experimental settings "Fine-tuning w. box-level dataset-mixed open-set detection pre-trained model" and "multi-task mix-supervised pre-trained model" in Table 1 refer to.

linhuixiao commented 6 months ago

@Mr-Bigworth

Hi, the experiments section of the paper explains this. Specifically, the training sets of the RefCOCO/+/g, Referit, and Flickr datasets are mixed and used for an intermediate pre-training stage on the grounding task, which is then followed by fine-tuning on each dataset separately. This setting has been used in multiple previous works.
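
As a rough illustration of the schedule described above (not code from the HiVG repository), the two stages could be laid out as in the sketch below. The sample tuples, the `make_split` and `build_schedule` helpers, and the stage names are all hypothetical placeholders, and the sketch assumes each fine-tuning run starts from the same mixed pre-training result.

```python
import random

# Hypothetical stand-in for a real training split: a list of
# (image_id, referring_expression, box) samples.
def make_split(name, n):
    return [(f"{name}_img{i}", f"expression {i}", (0, 0, 10, 10)) for i in range(n)]

# Toy training sets; the sizes are arbitrary and only for illustration.
train_sets = {
    "RefCOCO": make_split("RefCOCO", 4),
    "RefCOCO+": make_split("RefCOCO+", 3),
    "RefCOCOg": make_split("RefCOCOg", 2),
    "Referit": make_split("Referit", 3),
    "Flickr": make_split("Flickr", 5),
}

def build_schedule(train_sets, seed=0):
    """Return the list of (stage_name, samples) pairs in training order."""
    # Stage 1: mix all training sets into one pool for intermediate
    # pre-training on the grounding task.
    mixed = [sample for samples in train_sets.values() for sample in samples]
    random.Random(seed).shuffle(mixed)
    stages = [("pretrain_mixed", mixed)]
    # Stage 2: one separate fine-tuning run per dataset (assumed here to
    # start from the stage-1 checkpoint).
    for name, samples in train_sets.items():
        stages.append((f"finetune_{name}", list(samples)))
    return stages

schedule = build_schedule(train_sets)
for stage_name, samples in schedule:
    print(stage_name, len(samples))
```

The actual training loop, model, and loss are deliberately left out; the point is only the ordering of the mixed pre-training stage and the per-dataset fine-tuning stages.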

Mr-Bigworth commented 6 months ago

Thanks!