zhmiao / OpenLongTailRecognition-OLTR

Pytorch implementation for "Large-Scale Long-Tailed Recognition in an Open World" (CVPR 2019 ORAL)
BSD 3-Clause "New" or "Revised" License

Reproduce model results #17

Closed JasAva closed 5 years ago

JasAva commented 5 years ago

Thanks for the inspiring work and code :)

I'm having trouble reproducing the results (the plain model as well as the final model, on both datasets). I used the default settings without any alterations. Can you shed some light on these results (perhaps this is caused by the hyper-parameters), and would it be possible for you to provide the trained models for both stage1 and stage2?

The results I have reproduced are as follows:

1) ImageNet-LT

Stage1(close-setting): Evaluation_accuracy_micro_top1: 0.204 Averaged F-measure: 0.160 Many_shot_top1: 0.405; Median_shot_top1: 0.099; Low_shot_top1: 0.006

Stage1(open-setting): Open-set Accuracy: 0.178 Evaluation_accuracy_micro_top1: 0.199 Averaged F-measure: 0.291 Many_shot_top1: 0.396; Median_shot_top1: 0.096; Low_shot_top1: 0.006

Stage2(close-setting): Evaluation_accuracy_micro_top1: 0.339 Averaged F-measure: 0.322 Many_shot_top1: 0.411; Median_shot_top1: 0.330; Low_shot_top1: 0.167

Stage2(open-setting): Open-set Accuracy: 0.245 Evaluation_accuracy_micro_top1: 0.327 Averaged F-measure: 0.455 Many_shot_top1: 0.398; Median_shot_top1: 0.318; Low_shot_top1: 0.159

2) Places-LT

Stage1(close-setting): Evaluation_accuracy_micro_top1: 0.268 Averaged F-measure: 0.248 Many_shot_top1: 0.442; Median_shot_top1: 0.221; Low_shot_top1: 0.058

Stage1(open-setting): Open-set Accuracy: 0.018 Evaluation_accuracy_micro_top1: 0.267 Averaged F-measure: 0.373 Many_shot_top1: 0.441; Median_shot_top1: 0.219; Low_shot_top1: 0.057

Stage2(close-setting): Evaluation_accuracy_micro_top1: 0.349 Averaged F-measure: 0.338 Many_shot_top1: 0.387; Median_shot_top1: 0.355; Low_shot_top1: 0.263

Stage2(open-setting): Open-set Accuracy: 0.120 Evaluation_accuracy_micro_top1: 0.342 Averaged F-measure: 0.477 Many_shot_top1: 0.382; Median_shot_top1: 0.349; Low_shot_top1: 0.254

JasAva commented 5 years ago

Could you also explain how the evaluation metrics in the code correspond to the ones reported in the paper? I'm also a little confused about why overall accuracy is reported for the closed setting while F-measure is reported for the open setting.

zhmiao commented 5 years ago

Hello @JasAva, besides the randomness of each training session, I think the version of PyTorch might also cause trouble sometimes. In addition, the learning rate we published may be slightly different from the one we actually used for the experiments; sometimes the numbers get mixed up. We are very sorry about this. Regarding the F-measure, we follow this paper: https://arxiv.org/abs/1511.06233 , please check it out. Thank you very much.
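
For readers unfamiliar with that metric, below is a minimal sketch of how an open-set F-measure can be computed, assuming that test samples whose maximum score falls below a confidence threshold are assigned to a synthetic "open" class labeled -1. The helper name, the threshold value, and the macro averaging are illustrative assumptions, not the repository's exact implementation:

```python
import numpy as np
from sklearn.metrics import f1_score

def open_set_f_measure(probs, labels, open_threshold=0.5):
    """Illustrative open-set F-measure (not the repo's exact code).

    probs  : (N, C) class scores over the C closed-set classes.
    labels : (N,) ground-truth labels; open-set samples are marked -1.
    Samples whose top score is below `open_threshold` are predicted as
    the open class (-1), then a macro F1 is computed over all classes.
    """
    preds = probs.argmax(axis=1)
    preds[probs.max(axis=1) < open_threshold] = -1  # reject as "open"
    return f1_score(labels, preds, average='macro')

# toy usage: the second sample is too uncertain and gets rejected
probs = np.array([[0.8, 0.1, 0.1],
                  [0.4, 0.3, 0.3],
                  [0.1, 0.2, 0.7]])
labels = np.array([0, -1, 2])
print(open_set_f_measure(probs, labels))
```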

JasAva commented 5 years ago

Hi @zhmiao, thanks for answering. I also think this might be caused by the learning rate. Could you provide the trained models for stage1 and stage2? I'd like to benchmark against the reported results.

jchhuang commented 5 years ago

Hi @zhmiao, I have the same problem: I can't reproduce your results with this version of the code. Could you provide the learning rates you used for the feature network and the classifier network, respectively? Thanks a lot.

zhmiao commented 5 years ago

@JasAva @jchhuang Yes, we will publish our pretrained models, probably later this weekend or early next week. We will notify you as soon as they are published. Thanks.

jchhuang commented 5 years ago

@zhmiao Thanks for your quick reply. Could you also provide the detailed hyper-parameters? I think many researchers would like to repeat the experiments themselves. Thanks.

jchhuang commented 5 years ago

@JasAva Have you reproduced the results claimed in the paper? Could you share some insights with me?

zhmiao commented 5 years ago

@JasAva @jchhuang We found a bug in the currently published code, somewhere in the MetaEmbeddingClassifier. It was introduced when variables were renamed to be consistent with the paper during the code release process. We will fix it as soon as possible. Thanks.

zhmiao commented 5 years ago

@JasAva @jchhuang We posted reimplemented ImageNet-LT weights trained with the current config; the numbers are very close to what we reported. We are reimplementing Places right now and will keep you updated. Thanks.

JasAva commented 5 years ago

@zhmiao Thanks for updating the models. Just curious: there seem to be no changes in the code itself (you mentioned there is a bug somewhere in the MetaEmbeddingClassifier?). Were the reimplemented models obtained using the current release?

jchhuang commented 5 years ago

@zhmiao Thanks for the update. However, a way to reproduce the results claimed in your paper would be more appreciated than just posting re-trained model weights, because people may suspect that the performance of the re-trained weights benefits from a larger dataset rather than from the algorithm itself.

zhmiao commented 5 years ago

@JasAva Yes, we have gone through the code, and it seems the bug was in the evaluation functions rather than in the classifier.

jchhuang commented 5 years ago

@zhmiao Hi, is the bug you mentioned in the evaluation functions the change of the > and < comparisons to >= and <= in the shot_acc() function?
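
For reference, here is a minimal sketch of a shot_acc-style breakdown, which is where that boundary question matters. The 100/20 thresholds, the strict comparisons, and the function signature are assumptions for illustration, not necessarily the repository's exact values:

```python
import numpy as np

def shot_acc_sketch(preds, labels, train_class_count,
                    many_shot_thr=100, low_shot_thr=20):
    """Illustrative per-regime accuracy breakdown (not the repo's exact code).

    preds, labels     : (N,) predictions and ground truth on the test set.
    train_class_count : array mapping class id -> #training samples.
    Classes are split into many-shot (> many_shot_thr), low-shot
    (< low_shot_thr), and median-shot (everything in between); whether the
    boundaries use > / < or >= / <= changes which classes land in each bin.
    """
    many, median, low = [], [], []
    for c in np.unique(labels):
        mask = labels == c
        acc = (preds[mask] == c).mean()
        if train_class_count[c] > many_shot_thr:
            many.append(acc)
        elif train_class_count[c] < low_shot_thr:
            low.append(acc)
        else:
            median.append(acc)
    return np.mean(many), np.mean(median), np.mean(low)
```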

zhmiao commented 4 years ago

@JasAva @jchhuang @drcege Hello! Sorry for the late reply! As described in https://github.com/zhmiao/OpenLongTailRecognition-OLTR/issues/50#issue-524159914 , we finally debugged the published code, and the current open-set performance is:

============ Phase: test

Evaluation_accuracy_micro_top1: 0.361 Averaged F-measure: 0.501 Many_shot_accuracy_top1: 0.442 Median_shot_accuracy_top1: 0.352 Low_shot_accuracy_top1: 0.175

==========

This is higher than we reported in the paper. We updated some of the modules with the clone() method and set use_fc to False in the first stage. These changes should lead to the proper results. Please give it a try. Thank you very much again.
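
As an aside, the kind of in-place update that clone() guards against can be illustrated with a small, generic PyTorch snippet (this is not the exact code path inside MetaEmbeddingClassifier, just the general pattern):

```python
import torch

x = torch.randn(4, 8, requires_grad=True)
feat = torch.relu(x)

# Editing `feat` in place can fail at backward time, because autograd still
# needs its original values ("a variable needed for gradient computation
# has been modified by an inplace operation"). Working on a clone avoids
# touching the saved tensor.
safe = feat.clone()          # copy that autograd tracks separately
safe[safe < 0.5] = 0.0       # in-place edit on the copy is harmless
loss = safe.sum()
loss.backward()              # gradients still flow back through feat and x
```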

For Places, the current config won't work either. The reason we could not get the reported results is that we forgot that in the first stage we actually did not freeze the weights; we only freeze them in the second stage. We will update the corresponding code as soon as possible.
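
For anyone adapting the configs in the meantime, the stage-wise freezing described above can be expressed with a standard PyTorch pattern. This is a generic sketch, not the repository's training loop; the resnet18 backbone and the helper name are placeholders:

```python
import torch
import torchvision

backbone = torchvision.models.resnet18()   # placeholder feature extractor
classifier = torch.nn.Linear(1000, 365)    # placeholder classifier head

def set_backbone_trainable(model, trainable):
    """Stage 1: backbone trainable; stage 2: backbone frozen."""
    for p in model.parameters():
        p.requires_grad = trainable

# stage 1: train the backbone end to end
set_backbone_trainable(backbone, True)

# stage 2: freeze the backbone, only classifier parameters reach the optimizer
set_backbone_trainable(backbone, False)
trainable_params = [p for p in list(backbone.parameters()) +
                    list(classifier.parameters()) if p.requires_grad]
optimizer = torch.optim.SGD(trainable_params, lr=0.1, momentum=0.9)
```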

zhmiao commented 4 years ago

@JasAva @jchhuang @drcege Hello, we have updated the configuration files for Places. The reproduced results are now a little better than those reported. Please check out the updates. Thanks!