jackroos / VL-BERT

Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Linguistic Representations".

Which config file should I use when doing pre-training and fine-tuning on each task to reproduce the paper results? #1

Closed · yangapku closed 4 years ago

yangapku commented 4 years ago

Hi. I have noticed that there are several config files in cfg/pretrain, cfg/vqa and cfg/refcoco (for example, cfg/pretrain contains three base-model configs: base_e2e_16x16G_fp16.yaml, base_prec_4x16G_fp32.yaml and base_prec_withouttextonly_4x16G_fp32.yaml). Could you provide more details about the differences between these configs? If I want to reproduce the paper results, which of them should I use? Thank you!

jackroos commented 4 years ago

  1. Config file names follow the format <MODEL/SETTING>_<NUM_GPUxGPU_MEM>_<fp16/32> (see the parsing sketch after this list).
  2. In the pre-training configs, 'e2e' means the Fast R-CNN detector is fine-tuned during pre-training, while 'prec' means the Fast R-CNN features are fixed and precomputed; the latter corresponds to setting (d) in Table 4 of the paper. In addition, base_prec_withouttextonly_4x16G_fp32.yaml corresponds to setting (c).
  3. For the fine-tuning experiments, you should download the pre-trained models (see PREPARE_PRETRAINED_MODELS.md) and then use the corresponding configs. The pre-trained model path is specified in NETWORK.PARTIAL_PRETRAIN of the config YAML.
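
To illustrate point 1, here is a minimal Python sketch (my own illustration, not code from this repo) that splits a config file name according to that naming convention; the parse_config_name helper and the regex are hypothetical.

```python
import re

# Hypothetical helper (not part of VL-BERT): parse the
# <MODEL/SETTING>_<NUM_GPUxGPU_MEM>_<fp16/32>.yaml naming convention.
CONFIG_NAME_RE = re.compile(
    r"^(?P<setting>.+)_(?P<num_gpus>\d+)x(?P<gpu_mem_gb>\d+)G_(?P<precision>fp16|fp32)\.yaml$"
)

def parse_config_name(filename: str) -> dict:
    """Return the model/setting, GPU count, per-GPU memory and precision encoded in a config name."""
    match = CONFIG_NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"Unexpected config name: {filename}")
    info = match.groupdict()
    info["num_gpus"] = int(info["num_gpus"])
    info["gpu_mem_gb"] = int(info["gpu_mem_gb"])
    return info

print(parse_config_name("base_e2e_16x16G_fp16.yaml"))
# {'setting': 'base_e2e', 'num_gpus': 16, 'gpu_mem_gb': 16, 'precision': 'fp16'}

print(parse_config_name("base_prec_withouttextonly_4x16G_fp32.yaml"))
# {'setting': 'base_prec_withouttextonly', 'num_gpus': 4, 'gpu_mem_gb': 16, 'precision': 'fp32'}
```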

Thanks!

coldmanck commented 4 years ago

Hi @jackroos

May I know why TRAIN.GRAD_ACCUMULATE_STEPS is missing in cfgs/vqa/base_4x16G_fp32.yaml? Does it default to 2 (or 4)? Thank you!

jackroos commented 4 years ago

@coldmanck Actually, TRAIN.GRAD_ACCUMULATE_STEPS defaults to 1. In cfgs/vqa/base_4x16G_fp32.yaml, the total batch size is 4*64=256, which is already large enough, so we don't need gradient accumulation in this case. Thanks!

coldmanck commented 4 years ago

Hi @jackroos

Thank you for your response! May I know how the batch size works in your code? For example, I understand that in 4*64=256 the 64 comes from config.TRAIN.BATCH_IMAGES, but why is it multiplied by 4? Also, how does it interact with TRAIN.GRAD_ACCUMULATE_STEPS?

I am guessing the final batch size is calculated as GRAD_ACCUMULATE_STEPS * BATCH_IMAGES * (number of GPUs)? (I may well be wrong.)

jackroos commented 4 years ago

@coldmanck Yes! The 'actual' batch size is exactly what you guessed.
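
For concreteness, a tiny sketch of that calculation (the helper is my own illustration, not a function in the repo):

```python
# Hypothetical helper (not from the VL-BERT codebase): the effective batch size confirmed
# above is images-per-GPU * number-of-GPUs * gradient-accumulation steps.
def effective_batch_size(batch_images: int, num_gpus: int, grad_accumulate_steps: int = 1) -> int:
    return batch_images * num_gpus * grad_accumulate_steps

# cfgs/vqa/base_4x16G_fp32.yaml: BATCH_IMAGES=64, 4 GPUs, GRAD_ACCUMULATE_STEPS left at its default of 1.
print(effective_batch_size(batch_images=64, num_gpus=4))  # 256
```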

coldmanck commented 4 years ago

Thanks a lot!

coldmanck commented 4 years ago

@jackroos But I find that some configs do not give an actual batch size of 256 as mentioned in the paper. For example, in cfgs/vcr/large_q2a_4x16G_fp16.yaml, BATCH_IMAGES is 4 and GRAD_ACCUMULATE_STEPS is 4, so assuming 4 GPUs are used, the 'actual' batch size is 64. Should we modify any of these hyper-parameters to match your batch size of 256 in order to reproduce your results?

jackroos commented 4 years ago

@coldmanck In VCR, the effective batch size is 4x larger than the batch size in the config, since each question comes with 4 answer candidates.
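
Extending the earlier sketch (still a hypothetical helper, not repo code), the VCR numbers work out as follows:

```python
# In VCR (Q->A), each question comes with 4 answer candidates, so every image in the
# batch effectively contributes 4 training samples.
def effective_batch_size_vcr(batch_images: int, num_gpus: int,
                             grad_accumulate_steps: int, answer_candidates: int = 4) -> int:
    return batch_images * answer_candidates * num_gpus * grad_accumulate_steps

# cfgs/vcr/large_q2a_4x16G_fp16.yaml: BATCH_IMAGES=4, GRAD_ACCUMULATE_STEPS=4, assuming 4 GPUs.
print(effective_batch_size_vcr(batch_images=4, num_gpus=4, grad_accumulate_steps=4))  # 4*4*4*4 = 256
```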

coldmanck commented 4 years ago

I see. Thanks again 👍