Closed JasAva closed 5 years ago
Also, could you give some insight into how the evaluation metrics in the code correspond to the ones reported in the paper? I am also a little confused about why overall accuracy is reported for the closed setting while F-measure is reported for the open setting.
Hello @JasAva, besides the randomness of each training session, I think the version of PyTorch might also be causing trouble sometimes. In addition, the learning rate we published may be a little different from the one we used for the experiments; the numbers may have been mixed up. We are very sorry about this. Regarding the F-measure, we follow this paper: https://arxiv.org/abs/1511.06233 , please check it out. Thank you very much.
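For readers wondering what an open-set F-measure looks like in practice, here is a minimal sketch of a macro-averaged F1 that treats the open-set ("unknown") prediction as just another label, in the spirit of the paper linked above. The function name, the use of `-1` as the open-set label, and the macro averaging are illustrative assumptions, not the repo's exact evaluation code.

```python
def macro_f_measure(y_true, y_pred):
    """Macro-averaged F1 over all labels present in y_true.

    The open-set label (here assumed to be -1 for 'unknown') is treated
    as one more class, so failing to reject unknowns hurts the score.
    """
    labels = sorted(set(y_true))
    f_scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_scores.append(2 * precision * recall / (precision + recall)
                        if precision + recall else 0.0)
    return sum(f_scores) / len(f_scores)
```

This explains why the open setting reports F-measure rather than plain accuracy: accuracy alone would reward a classifier that never predicts "unknown" on a test set dominated by known classes.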
Hi @zhmiao, thanks for answering. I also think this might be caused by the learning rate. Moreover, could you provide the trained models for stage 1 and stage 2? I'd like to benchmark against the reported results.
Hi @zhmiao, I have the same problem: I cannot reproduce your results with this version of the code. Could you provide the learning rates you used for the feature network and the classifier network, respectively? Thanks a lot.
@JasAva @jchhuang Yes, we will publish our pretrained models, probably later this weekend or early next week. We will notify you as soon as they are published. Thanks.
@zhmiao Thanks for your quick reply. Could you also provide the detailed hyper-parameters? I think many researchers would like to repeat the experiments themselves. Thanks.
@JasAva Have you reproduced the results claimed in the paper? Could you share some insights with me?
@JasAva @jchhuang We found some bugs in the currently published code, somewhere in the MetaEmbeddingClassifier. They were introduced while renaming variables to be consistent with the paper during the code release process. We will fix the bug as soon as possible. Thanks.
@JasAva @jchhuang We posted reimplemented ImageNet-LT weights using the current config; the numbers are very close to what we reported. We are reimplementing Places right now and will keep you updated. Thanks.
@zhmiao Thanks for updating the models. Just curious: there seem to be no changes in the code itself (you mentioned there was a bug somewhere in the MetaEmbeddingClassifier?). Were the reimplemented models obtained using the current release?
@zhmiao Thanks for updating. However, describing how to produce the results claimed in your paper would be more helpful than just posting re-trained model weights, because people may suspect that the re-trained weights benefit from a larger dataset rather than from the algorithm itself.
@JasAva Yes, we have gone through the code, and it seems the bug was in the evaluation functions rather than in the classifier.
@zhmiao Hi, is the bug you mentioned in the evaluation functions the comparison operators (`>`/`<` versus `>=`/`<=`) in the shot_acc() function?
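For context, a shot_acc-style function typically splits per-class accuracy into many/median/low-shot groups by the number of training samples per class. Below is an illustrative sketch; the threshold values (more than 100 samples for many-shot, fewer than 20 for low-shot) and the function signature are assumptions for illustration, not the repo's exact code. It shows why the choice between strict and non-strict comparisons matters: classes sitting exactly on a threshold move between groups.

```python
def shot_accuracy(per_class_correct, per_class_total, train_counts,
                  many_shot_thr=100, low_shot_thr=20):
    """Group per-class accuracies by training-set frequency.

    train_counts maps class id -> number of training samples.
    Thresholds here are assumptions; note that using '>' vs '>='
    (or '<' vs '<=') changes which group a boundary class falls into.
    """
    groups = {"many": [], "median": [], "low": []}
    for cls, n_train in train_counts.items():
        acc = per_class_correct[cls] / per_class_total[cls]
        if n_train > many_shot_thr:        # '>=' would also include n_train == 100
            groups["many"].append(acc)
        elif n_train < low_shot_thr:       # '<=' would also include n_train == 20
            groups["low"].append(acc)
        else:
            groups["median"].append(acc)
    return {k: sum(v) / len(v) if v else 0.0 for k, v in groups.items()}
```

Because long-tailed benchmarks report many/median/low-shot accuracy separately, even a small off-by-one in these comparisons can shift the reported group averages.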
@JasAva @jchhuang @drcege Hello! Sorry for the late reply! As described in https://github.com/zhmiao/OpenLongTailRecognition-OLTR/issues/50#issue-524159914 , we finally debugged the published code, and the current open-set performance is:
============ Phase: test
Evaluation_accuracy_micro_top1: 0.361 Averaged F-measure: 0.501 Many_shot_accuracy_top1: 0.442 Median_shot_accuracy_top1: 0.352 Low_shot_accuracy_top1: 0.175
==========
This is higher than what we reported in the paper. We updated some of the modules with the clone() method and set use_fc to False in the first stage. These changes lead to the proper results. Please give it a try. Thank you very much again.
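The clone() change mentioned above is worth illustrating: in PyTorch, assigning a tensor to a new name does not copy it, so an in-place operation through the alias silently mutates the original, while clone() produces an independent copy. This is a minimal sketch of that general behavior, not the specific modules that were patched in the repo.

```python
import torch

x = torch.ones(3)
alias = x        # same underlying storage, not a copy
alias += 1       # in-place add: x is mutated too (now all 2s)

y = torch.ones(3)
safe = y.clone() # independent copy of the data
safe += 1        # y is left untouched (still all 1s)
```

Bugs of this kind are easy to introduce when refactoring or renaming variables, since the code still runs; only the numbers change.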
For Places, the current config won't work either. The reason we could not get the reported results is that we forgot that in the first stage we actually did not freeze the weights; we only freeze them in the second stage. We will update the corresponding code as soon as possible.
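For anyone adjusting their own configs along these lines, freezing a sub-network in PyTorch is done by disabling gradients on its parameters. The sketch below uses toy module names and sizes as placeholder assumptions; it only illustrates the mechanism of training the classifier while keeping the feature extractor fixed in a second stage.

```python
import torch.nn as nn

# Placeholder stand-ins for the feature extractor and classifier.
feature_net = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))
classifier = nn.Linear(4, 10)

# Second stage: freeze the backbone so only the classifier is updated.
for p in feature_net.parameters():
    p.requires_grad = False

# Only pass still-trainable parameters to the optimizer.
trainable = [p for p in classifier.parameters() if p.requires_grad]
```

Whether the first stage freezes these weights or not changes what the backbone learns, which is consistent with the maintainers' explanation of the Places discrepancy above.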
@JasAva @jchhuang @drcege Hello, we have updated configuration files for Places. Currently, the reproduced results are a little better than reported. Please check out the updates. Thanks!
Thanks for the inspiring work and code :)
I'm having trouble reproducing the results (the plain model as well as the final model, on both datasets). I used the default settings without any alterations. Can you shed some light on these results (perhaps they are caused by the hyper-parameters), and would it be possible for you to provide the trained models for both stage 1 and stage 2?
The results I have reproduced are as follows:
1) ImageNet-LT
Stage1(close-setting): Evaluation_accuracy_micro_top1: 0.204 Averaged F-measure: 0.160 Many_shot_top1: 0.405; Median_shot_top1: 0.099; Low_shot_top1: 0.006
Stage1(open-setting): Open-set Accuracy: 0.178 Evaluation_accuracy_micro_top1: 0.199 Averaged F-measure: 0.291 Many_shot_top1: 0.396; Median_shot_top1: 0.096; Low_shot_top1: 0.006
Stage2(close-setting): Evaluation_accuracy_micro_top1: 0.339 Averaged F-measure: 0.322 Many_shot_top1: 0.411; Median_shot_top1: 0.330; Low_shot_top1: 0.167
Stage2(open-setting): Open-set Accuracy: 0.245 Evaluation_accuracy_micro_top1: 0.327 Averaged F-measure: 0.455 Many_shot_top1: 0.398; Median_shot_top1: 0.318; Low_shot_top1: 0.159
2) Places-LT
Stage1(close-setting): Evaluation_accuracy_micro_top1: 0.268 Averaged F-measure: 0.248 Many_shot_top1: 0.442; Median_shot_top1: 0.221; Low_shot_top1: 0.058
Stage1(open-setting): Open-set Accuracy: 0.018 Evaluation_accuracy_micro_top1: 0.267 Averaged F-measure: 0.373 Many_shot_top1: 0.441; Median_shot_top1: 0.219; Low_shot_top1: 0.057
Stage2(close-setting): Evaluation_accuracy_micro_top1: 0.349 Averaged F-measure: 0.338 Many_shot_top1: 0.387; Median_shot_top1: 0.355; Low_shot_top1: 0.263
Stage2(open-setting): Open-set Accuracy: 0.120 Evaluation_accuracy_micro_top1: 0.342 Averaged F-measure: 0.477 Many_shot_top1: 0.382; Median_shot_top1: 0.349; Low_shot_top1: 0.254