Closed bradezard131 closed 3 years ago
Hi, can you show me the experiment log?
Sure, the logs are in the Gist below. I got 50.4 mAP with the unmodified config, but performance was much lower with only the standard 5 scales.
https://gist.github.com/bradezard131/85e12a4eb21552bfb21706d276c8586f
It seems your experiment used 1 GPU with 4 images, while the config is designed for 4 GPUs with 1 image per device. For one GPU, I suggest the following modifications:
```yaml
SOLVER:
  STEPS: (140000, 200000)
  MAX_ITER: 200000
  IMS_PER_BATCH: 1
WSL:
  ITER_SIZE: 4
```
We prefer to use one image per GPU and accumulate the gradient. Although the code can run with multiple images per GPU, we have not tested that setting, so I am not certain whether this is the cause. I will also try the five scales [480, 576, 688, 864, 1200] in this experiment and check the results.
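For readers unfamiliar with the `ITER_SIZE` trick: accumulating gradients over several one-image micro-batches and then applying a single update is mathematically equivalent to one update on the full minibatch. A minimal sketch of the idea (a toy linear model with squared-error loss, not the repo's actual trainer):

```python
# Toy illustration of gradient accumulation (WSL.ITER_SIZE-style).
# Assumptions: linear model y = w * x, loss 0.5 * (w*x - y)^2,
# gradients averaged over iter_size micro-batches of one sample each.

def grad(w, x, y):
    """d/dw of 0.5 * (w*x - y)**2 for a single sample."""
    return (w * x - y) * x

def step_accumulated(w, samples, lr, iter_size):
    """Accumulate scaled per-sample gradients, then apply ONE update."""
    acc = 0.0
    for x, y in samples:
        acc += grad(w, x, y) / iter_size  # scale as if batch size were iter_size
    return w - lr * acc

def step_minibatch(w, samples, lr):
    """One update on the full batch (mean gradient) for comparison."""
    g = sum(grad(w, x, y) for x, y in samples) / len(samples)
    return w - lr * g

samples = [(1.0, 2.0), (2.0, 3.0), (0.5, 1.0), (3.0, 4.0)]
w_acc = step_accumulated(0.1, samples, lr=0.01, iter_size=4)
w_mb = step_minibatch(0.1, samples, lr=0.01)
print(abs(w_acc - w_mb) < 1e-12)  # prints True: the two updates match
```

This is why `IMS_PER_BATCH: 1` with `ITER_SIZE: 4` reproduces the effective batch of 4 GPUs × 1 image (in real training, batch-norm statistics can still differ, since they see only the micro-batch).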
oicr_r18-ws_voc2007_2020-07-05_01-47-11 train_wsl.sh --cfg configs_voc_2007_oicr_R-18-WS-C5_1x.yaml OUTPUT_DIR experiments_oicr_r18-ws_voc2007_2020-07-05_01-47-11 2020-07-05_01-47-11.log
The caffe2 code with five scales [480, 576, 688, 864, 1200] got about 52 mAP (see line 5615 in the uploaded file), so I believe the scale setting is not the main problem in your issue.
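For anyone reimplementing the multi-scale setting discussed here: schemes of this kind usually pick one scale at random, resize the image's shorter side to it, and cap the longer side. A minimal sketch under those assumptions (the scale list is from this thread; the `MAX_SIZE` cap of 2000 is a placeholder, check the repo's config for the real value):

```python
import random

SCALES = [480, 576, 688, 864, 1200]  # the five standard scales from the thread
MAX_SIZE = 2000  # assumed cap on the longer side; verify against the config

def multiscale_size(h, w, scales=SCALES, max_size=MAX_SIZE, rng=random):
    """Pick a random scale, resize the shorter side to it, cap the longer side."""
    target = rng.choice(scales)
    ratio = target / min(h, w)
    if max(h, w) * ratio > max_size:
        ratio = max_size / max(h, w)  # shrink so the longer side fits the cap
    return round(h * ratio), round(w * ratio)

# e.g. a typical 375x500 VOC image, forced to the 480 scale:
print(multiscale_size(375, 500, scales=[480]))  # (480, 640)
```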
I will try your config now with accumulated gradients rather than true minibatches; otherwise, I will see if I can get access to 4 GPUs to reproduce the results.
I also see you have a new paper coming out soon in NeurIPS, I look forward to it.
The first log uses our scale setting and reaches about 50.6 mAP: oicr_WSR_18_DC5_VOC07_2020-10-14_01-22-34.txt
The second log uses the five-scale setting ([480, 576, 688, 864, 1200]) for both training and testing and also reaches about 50.3 mAP: oicr_WSR_18_DC5_VOC07_2020-10-14_13-35-32.txt
Yes, I ran with one GPU and accumulated gradients and got ~49.5 mAP. While that is disappointingly lower, it is close enough that it makes sense. Thanks for the help :)
I just tried running your ResNet18 WS model on VOC07 (PascalVOC-Detection/oicr_WSR_18_DC5_1x.yaml). I changed the scales in the config to match the ones in the paper (i.e. the standard [480, 576, 688, 864, 1200]) for both training and testing. The results I got were only ~42 mAP; however, your paper reports ~51 mAP, which is quite a significant discrepancy. Any suggestions as to how one might reproduce the published results?