Closed by HarborYuan 1 year ago
Hi @HarborYuan,
Thank you for your interest in our work. Could you please show me your pretrain.json? Thanks
Hi @mmaaz60 ,
Thanks for your reply.
I did not modify pretrain.json. It looks like this:
{
"combine_datasets": ["flickr", "mixed"],
"combine_datasets_val": ["gqa", "flickr", "refexp"],
"coco_path": "data/coco",
"vg_img_path": "data/GQA/images",
"flickr_img_path": "data/fliker_30k",
"refexp_ann_path": "data/OpenSource_Filter_ORE",
"flickr_ann_path": "data/OpenSource_Filter_ORE",
"gqa_ann_path": "data/OpenSource_Filter_ORE",
"refexp_dataset_name": "all",
"GT_type": "separate",
"flickr_dataset_path": "data/fliker_30k/flickr30k_entities/Annotations"
}
Thanks, @HarborYuan, for sharing the config. To get the results reported in the paper, you have to replace OpenSource_Filter_ORE with OpenSource.
Currently you are training the model on the filtered dataset that we constructed for ORE (Table 4 of our paper). This filtered dataset is built by removing every caption that mentions any of the 60 unknown categories evaluated in ORE.
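A minimal sketch of what such a caption filter could look like, assuming plain case-insensitive whole-word matching. The function name, the example captions, and the two category names are hypothetical; the actual procedure and the real list of 60 categories are not given in this thread.

```python
import re


def filter_captions(captions, unknown_categories):
    """Keep only captions that mention none of the unknown categories.

    Assumption: a caption "lists" a category if the category name appears
    in it as a whole word, ignoring case. Plurals are NOT matched.
    """
    if not unknown_categories:
        # Nothing to filter against, so keep every caption.
        return list(captions)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(c) for c in unknown_categories) + r")\b",
        re.IGNORECASE,
    )
    return [caption for caption in captions if not pattern.search(caption)]


# Hypothetical data for illustration only.
captions = ["A zebra grazing in a field", "A man riding a bike"]
unknown = ["zebra", "kite"]
kept = filter_captions(captions, unknown)
print(kept)
```

Whole-word matching is the simplest choice here; a real pipeline would likely also need to handle plurals and synonyms.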
I have updated pretrain.json in the training repo, and I apologize for the inconvenience caused. Thank you.
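For anyone with an older local copy of the config, the path change described above could be applied with a small script along these lines. This is only a sketch: the helper name is hypothetical, and it assumes the directory name appears as a whole path component in the string values, as in the config posted earlier in this thread.

```python
import json


def use_unfiltered_annotations(config, old="OpenSource_Filter_ORE", new="OpenSource"):
    """Return a copy of config with path component `old` replaced by `new`.

    Only string values are touched, and only path components that are
    exactly equal to `old`; lists and other values are copied unchanged.
    """
    updated = {}
    for key, value in config.items():
        if isinstance(value, str):
            parts = value.split("/")
            updated[key] = "/".join(new if part == old else part for part in parts)
        else:
            updated[key] = value
    return updated


# A subset of the config shown earlier in this thread.
config = {
    "combine_datasets": ["flickr", "mixed"],
    "flickr_img_path": "data/fliker_30k",
    "refexp_ann_path": "data/OpenSource_Filter_ORE",
    "flickr_ann_path": "data/OpenSource_Filter_ORE",
    "gqa_ann_path": "data/OpenSource_Filter_ORE",
}
fixed = use_unfiltered_annotations(config)
print(json.dumps(fixed, indent=2))
```

To apply it to a file, load pretrain.json with json.load, pass the resulting dict through the helper, and write it back with json.dump.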
Hi @mmaaz60 ,
Thanks for your great work.
I am trying the training code in training/mdef_detr/README.md, and I use the following configs (4x8 = 32 GPUs):
However, I got the following result:
It seems slightly different from the paper (Tab. 1). Is there something I did wrong?
Thanks again.