mmaaz60 / mvits_for_class_agnostic_od

[ECCV'22] Official repository of paper titled "Class-agnostic Object Detection with Multi-modal Transformer".
MIT License

Issue about the training code #24

Closed HarborYuan closed 1 year ago

HarborYuan commented 1 year ago

Hi @mmaaz60 ,

Thanks for your great work.

I am trying out the training code described in training/mdef_detr/README.md, using the following configuration (4x8 = 32 GPUs):

--dataset_config configs/pretrain.json --ema --epochs 20 --lr_drop 16

However, I got the following result:

Evaluating
['Dataset', 'Model Name', 'Text Query', 'Avg. Boxes per Image', 'AP@50', 'Recall@50', 'Precission@50']
['kitti', 'mdef_detr', 'all_small_objects.pkl', 50.0, 15.53, 51.45, 7.13]
['kitti', 'mdef_detr', 'all_objects.pkl', 50.0, 30.51, 56.74, 7.87]
['kitti', 'mdef_detr', 'all_visible_entities_and_objects.pkl', 50.0, 39.3, 60.93, 8.45]
['kitti', 'mdef_detr', 'combined.pkl', 50.0, 35.22, 58.27, 8.08]
['kitti', 'mdef_detr', 'all_entities.pkl', 50.0, 30.94, 54.86, 7.61]
['kitti', 'mdef_detr', 'all_obscure_entities_and_objects.pkl', 50.0, 32.98, 60.47, 8.38]
['voc2007', 'mdef_detr', 'all_objects.pkl', 50.0, 47.53, 83.02, 5.02]
['voc2007', 'mdef_detr', 'all_visible_entities_and_objects.pkl', 50.0, 58.92, 88.53, 5.36]
['voc2007', 'mdef_detr', 'combined.pkl', 50.0, 63.41, 90.18, 5.45]
['voc2007', 'mdef_detr', 'all_entities.pkl', 50.0, 61.22, 85.7, 5.18]
['voc2007', 'mdef_detr', 'all_obscure_entities_and_objects.pkl', 50.0, 47.55, 84.29, 5.1]
['coco', 'mdef_detr', 'all_objects.pkl', 50.0, 28.61, 53.99, 8.02]
['coco', 'mdef_detr', 'all_visible_entities_and_objects.pkl', 50.0, 33.68, 58.51, 8.69]
['coco', 'mdef_detr', 'combined.pkl', 50.0, 38.46, 61.07, 9.07]
['coco', 'mdef_detr', 'all_entities.pkl', 50.0, 30.22, 50.34, 7.48]
['coco', 'mdef_detr', 'all_obscure_entities_and_objects.pkl', 50.0, 26.8, 53.57, 7.96]

These results differ slightly from those in the paper (Tab. 1). Is there something I did wrong?

Thanks again.

mmaaz60 commented 1 year ago

Hi @HarborYuan,

Thank you for your interest in our work. Could you please show me your pretrain.json? Thanks

HarborYuan commented 1 year ago

Hi @mmaaz60 ,

Thanks for your reply.

I did not modify pretrain.json. It looks like this:

{
    "combine_datasets": ["flickr", "mixed"],
    "combine_datasets_val": ["gqa", "flickr", "refexp"],
    "coco_path": "data/coco",
    "vg_img_path": "data/GQA/images",
    "flickr_img_path": "data/fliker_30k",
    "refexp_ann_path": "data/OpenSource_Filter_ORE",
    "flickr_ann_path": "data/OpenSource_Filter_ORE",
    "gqa_ann_path": "data/OpenSource_Filter_ORE",
    "refexp_dataset_name": "all",
    "GT_type": "separate",
    "flickr_dataset_path": "data/fliker_30k/flickr30k_entities/Annotations"
}

mmaaz60 commented 1 year ago

Thanks, @HarborYuan, for sharing the config. To get the results reported in the paper, you have to replace OpenSource_Filter_ORE with OpenSource in the annotation paths.
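For illustration, the path change could be applied programmatically as in the minimal sketch below. The helper name `use_unfiltered_annotations` is hypothetical and not part of the repository; the config values are copied from the pretrain.json posted above.

```python
import json

def use_unfiltered_annotations(config: dict) -> dict:
    """Return a copy of the config with the filtered annotation directory
    swapped for the unfiltered one (hypothetical helper, not from the repo)."""
    old, new = "OpenSource_Filter_ORE", "OpenSource"
    return {
        key: value.replace(old, new) if isinstance(value, str) else value
        for key, value in config.items()
    }

# Config as posted in this thread.
config = {
    "combine_datasets": ["flickr", "mixed"],
    "combine_datasets_val": ["gqa", "flickr", "refexp"],
    "coco_path": "data/coco",
    "vg_img_path": "data/GQA/images",
    "flickr_img_path": "data/fliker_30k",
    "refexp_ann_path": "data/OpenSource_Filter_ORE",
    "flickr_ann_path": "data/OpenSource_Filter_ORE",
    "gqa_ann_path": "data/OpenSource_Filter_ORE",
    "refexp_dataset_name": "all",
    "GT_type": "separate",
    "flickr_dataset_path": "data/fliker_30k/flickr30k_entities/Annotations",
}

patched = use_unfiltered_annotations(config)
print(json.dumps(patched, indent=4))
```

To patch the file on disk, the same function could be wrapped with `json.load` and `json.dump` on configs/pretrain.json. Editing the three `*_ann_path` entries by hand works just as well.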

You are currently training the model on the filtered dataset that we constructed for the ORE experiments in Table 4 of our paper. It was built by removing every caption that mentions any of the 60 unknown categories evaluated in ORE.
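To make the filtering idea concrete, here is a rough sketch of dropping captions that mention a held-out category. It is not the script used to build OpenSource_Filter_ORE: the function name, the whole-word matching rule, and the two example categories are all assumptions made for illustration.

```python
import re
from typing import Iterable, List

def drop_captions_mentioning(captions: Iterable[str],
                             categories: Iterable[str]) -> List[str]:
    """Keep only captions that mention none of the given category names.

    Matching is case-insensitive and on whole words or phrases; plurals and
    synonyms are NOT handled (an assumption of this sketch).
    """
    captions = list(captions)
    names = sorted(categories, key=len, reverse=True)
    if not names:
        return captions
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return [caption for caption in captions if not pattern.search(caption)]

# Hypothetical stand-ins for the 60 unknown categories.
unknown_categories = ["zebra", "hot dog"]
captions = [
    "A zebra grazing in a field",
    "A man riding a bike",
    "A hot dog on a plate",
]
kept = drop_captions_mentioning(captions, unknown_categories)
print(kept)
```

A model trained only on the captions kept by such a filter never sees the held-out categories named in text, which is why its numbers are not comparable to the ones obtained with the unfiltered annotations.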

I have updated pretrain.json in the training repo, and I apologize for the inconvenience. Thank you.