Closed bradezard131 closed 3 years ago
Hi, can you show me the experiment log?
Sure, the logs are in the Gist below. I got 50.4 mAP with the unmodified config, but performance was much lower with only the standard 5 scales.
https://gist.github.com/bradezard131/85e12a4eb21552bfb21706d276c8586f
It seems your experiment used 1 GPU with 4 images, while the config is designed for 4 GPUs with 1 image per device. For one GPU, I suggest the following modifications:
```yaml
SOLVER:
  STEPS: (140000, 200000)
  MAX_ITER: 200000
  IMS_PER_BATCH: 1
WSL:
  ITER_SIZE: 4
```
We prefer to use one image per GPU and accumulate the gradient. Although the code can run with multiple images per GPU, we have not tested that setting, so I am not certain whether this is the cause. I will also try the five scales [480, 576, 688, 864, 1200] in this experiment and check the results.
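For readers unfamiliar with the `ITER_SIZE` trick: accumulating gradients over several one-image micro-batches and then applying a single update is mathematically equivalent to one update on the full minibatch. A minimal sketch of the idea (a toy linear model with squared-error loss, not the repo's actual trainer):

```python
# Toy illustration of gradient accumulation (WSL.ITER_SIZE-style).
# Assumptions: linear model y = w * x, loss 0.5 * (w*x - y)^2,
# gradients averaged over iter_size micro-batches of one sample each.

def grad(w, x, y):
    """d/dw of 0.5 * (w*x - y)**2 for a single sample."""
    return (w * x - y) * x

def step_accumulated(w, samples, lr, iter_size):
    """Accumulate scaled per-sample gradients, then apply ONE update."""
    acc = 0.0
    for x, y in samples:
        acc += grad(w, x, y) / iter_size  # scale as if batch size were iter_size
    return w - lr * acc

def step_minibatch(w, samples, lr):
    """One update on the full batch (mean gradient) for comparison."""
    g = sum(grad(w, x, y) for x, y in samples) / len(samples)
    return w - lr * g

samples = [(1.0, 2.0), (2.0, 3.0), (0.5, 1.0), (3.0, 4.0)]
w_acc = step_accumulated(0.1, samples, lr=0.01, iter_size=4)
w_mb = step_minibatch(0.1, samples, lr=0.01)
print(abs(w_acc - w_mb) < 1e-12)  # prints True: the two updates match
```

This is why `IMS_PER_BATCH: 1` with `ITER_SIZE: 4` reproduces the effective batch of 4 GPUs × 1 image (in real training, batch-norm statistics can still differ, since they see only the micro-batch).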
oicr_r18-ws_voc2007_2020-07-05_01-47-11 train_wsl.sh --cfg configs_voc_2007_oicr_R-18-WS-C5_1x.yaml OUTPUT_DIR experiments_oicr_r18-ws_voc2007_2020-07-05_01-47-11 2020-07-05_01-47-11.log
The caffe2 code with five scales [480, 576, 688, 864, 1200] got about 52 mAP (see line 5615 in the uploaded file), so I believe the scale setting is not the main problem in your issue.
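For anyone reimplementing the multi-scale setting discussed here: schemes of this kind usually pick one scale at random, resize the image's shorter side to it, and cap the longer side. A minimal sketch under those assumptions (the scale list is from this thread; the `MAX_SIZE` cap of 2000 is a placeholder, check the repo's config for the real value):

```python
import random

SCALES = [480, 576, 688, 864, 1200]  # the five standard scales from the thread
MAX_SIZE = 2000  # assumed cap on the longer side; verify against the config

def multiscale_size(h, w, scales=SCALES, max_size=MAX_SIZE, rng=random):
    """Pick a random scale, resize the shorter side to it, cap the longer side."""
    target = rng.choice(scales)
    ratio = target / min(h, w)
    if max(h, w) * ratio > max_size:
        ratio = max_size / max(h, w)  # shrink so the longer side fits the cap
    return round(h * ratio), round(w * ratio)

# e.g. a typical 375x500 VOC image, forced to the 480 scale:
print(multiscale_size(375, 500, scales=[480]))  # (480, 640)
```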
I will try your config now with accumulated gradients rather than true minibatches; otherwise, I will see if I can get access to 4 GPUs to reproduce the results.
I also see you have a new paper coming out soon in NeurIPS, I look forward to it.
The first log uses our scale setting and reaches about 50.6 mAP: oicr_WSR_18_DC5_VOC07_2020-10-14_01-22-34.txt
The second log uses the five-scale setting ([480, 576, 688, 864, 1200]) for both training and testing and also reaches about 50.3 mAP: oicr_WSR_18_DC5_VOC07_2020-10-14_13-35-32.txt
Yes, I ran with one GPU and accumulated gradients and got ~49.5 mAP. While that is disappointingly lower, it is close enough that it makes sense. Thanks for the help :)
I just tried running your ResNet18 WS model on VOC07 (PascalVOC-Detection/oicr_WSR_18_DC5_1x.yaml). I changed the scales in the config to match the ones in the paper (i.e. the standard [480, 576, 688, 864, 1200]) for both training and testing. The results I got were only ~42 mAP; however, your paper reports ~51 mAP, which is quite a significant discrepancy. Any suggestions as to how one might reproduce the published results?