zhmiao / OpenLongTailRecognition-OLTR

PyTorch implementation for "Large-Scale Long-Tailed Recognition in an Open World" (CVPR 2019 Oral)
BSD 3-Clause "New" or "Revised" License

Unable to reproduce the results of the paper #49

Closed: GuohongLi closed this issue 4 years ago

GuohongLi commented 4 years ago

For ImageNet_LT, I just used the default config in the code, but I cannot reproduce the results in Table 3(a) of the paper.

  1. For stage 1, here are some of the last logs when training completed:

    Epoch: [30/30] Step: 440 Minibatch_loss_performance: 2.833 Minibatch_accuracy_micro: 0.438
    Epoch: [30/30] Step: 450 Minibatch_loss_performance: 2.886 Minibatch_accuracy_micro: 0.379
    Phase: val
    100%|██████████| 79/79 [01:40<00:00, 1.34it/s]
    Phase: val
    Evaluation_accuracy_micro_top1: 0.220
    Averaged F-measure: 0.175
    Many_shot_accuracy_top1: 0.427
    Median_shot_accuracy_top1: 0.113
    Low_shot_accuracy_top1: 0.007
    Training Complete. Best validation accuracy is 0.220 at epoch 30

The few/low-shot accuracy of 0.7% is actually better than the 0.4% of the Plain model in Table 3(a).


  2. [Below is IMPORTANT!] However, for stage 2, here are some of the last logs when training completed:

    Epoch: [60/60] Step: 440 Minibatch_loss_feature: 0.569 Minibatch_loss_performance: 2.938 Minibatch_accuracy_micro: 0.566
    Epoch: [60/60] Step: 450 Minibatch_loss_feature: 0.567 Minibatch_loss_performance: 2.845 Minibatch_accuracy_micro: 0.539
    Phase: val
    100%|██████████| 79/79 [01:34<00:00, 1.02it/s]
    Phase: val
    Evaluation_accuracy_micro_top1: 0.340
    Averaged F-measure: 0.324
    Many_shot_accuracy_top1: 0.401
    Median_shot_accuracy_top1: 0.334
    Low_shot_accuracy_top1: 0.197
    Training Complete. Best validation accuracy is 0.341 at epoch 48

However, the many-, median-, and few/low-shot accuracies are 40.1%, 33.4%, and 19.7%, which differ somewhat from the 43.2%, 35.1%, and 18.5% of the "Ours" model in Table 3(a). I retrained several times, and the many-shot accuracy always comes out somewhat lower than 43.2%.


Are there any tricks that were not released?

zhmiao commented 4 years ago

@GuohongLi Thank you very much for asking, and sorry for the late reply. As replied in your last issue (https://github.com/zhmiao/OpenLongTailRecognition-OLTR/issues/50#issue-524159914), we finally debugged the published code, and the current open-set performance is:

============
Phase: test
Evaluation_accuracy_micro_top1: 0.361
Averaged F-measure: 0.501
Many_shot_accuracy_top1: 0.442
Median_shot_accuracy_top1: 0.352
Low_shot_accuracy_top1: 0.175
============

This is higher than what we reported in the paper. We updated some of the modules to use the clone() method, and set use_fc to False in the first stage. These changes should lead to the proper results. Please have a try. Thank you very much again.
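For reference, a minimal sketch of the two changes described above (the tensor usage and the `networks` config keys below are illustrative, loosely following the style of the repo's `config/*.py` files, not a verbatim diff):

```python
import torch

# (1) Clone before further ops so autograd keeps the original values for
#     the backward pass instead of seeing an in-place modified tensor.
feat = torch.randn(4, 512, requires_grad=True)
safe = feat.clone()                           # gradients still flow through clone()
safe = safe / safe.norm(dim=1, keepdim=True)  # subsequent ops use the copy

# (2) Stage-1 config: train on plain backbone features with the extra
#     fully connected embedding layer disabled (enabled again in stage 2).
networks = {
    'feat_model': {
        'params': {'dataset': 'ImageNet_LT', 'use_fc': False},  # was True
    },
}
```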

For Places, the current config won't work either. The reason we could not get the reported results is that we forgot that in the first stage we actually did not freeze the weights; we only freeze the weights in the second stage. We will update the corresponding code as soon as possible.
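A minimal sketch of what that freezing scheme could look like (the helper and the `feat_model` stand-in are hypothetical, not taken from the repo):

```python
import torch.nn as nn

# Hypothetical stand-in for the ResNet backbone used in the repo.
feat_model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.ReLU())

def set_backbone_frozen(model: nn.Module, freeze: bool) -> None:
    """Enable or disable gradients for every parameter of the given module."""
    for param in model.parameters():
        param.requires_grad = not freeze

# Stage 1: keep the backbone trainable (the weights are NOT frozen here).
set_backbone_frozen(feat_model, freeze=False)

# Stage 2: freeze the backbone and train only the classifier/meta-embedding.
set_backbone_frozen(feat_model, freeze=True)
```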

zhmiao commented 4 years ago

@GuohongLi Hello, we just updated the configuration files for Places, and the newly implemented results are a little better than reported. Please check out the updates. Thanks!