lcalem opened 5 years ago
Hi, please find below a review submitted by one of the reviewers:
Score: 6
Reviewer 1 comment: This work tries to reproduce soft Q-learning with MI regularization and compares the results with entropy regularization. Although the reproduced results look much worse than those in the original paper, the reproduction details are stated clearly in the report. A major problem is the report itself, which is poorly organized and contains uninformative paragraphs. For example, (1) you do not need to introduce the importance of reproduction everywhere, and (2) I would suggest including a summary of the reproduction in the abstract instead of introducing the challenge. Hyperparameters are also under-searched. Even so, I would give a borderline score as encouragement for participating in this reproduction challenge.
Confidence: 3
Hi, please find below a review submitted by one of the reviewers:
Score: 5
Reviewer 2 comment: Context: choosing a proper baseline proved difficult and time-consuming, and the report authors ended up implementing everything from scratch. After clarifying all the missing pieces of information and fixing some numerical stability problems, the authors did not have the time to launch the proper experiments to reproduce either the tabular setting or the Atari one.
Conclusion: given that the needed experiments could not be conducted due to lack of time, few conclusions about the reproducibility of the original article can be drawn.
Reviewers are asked to go through a list of items to evaluate the report. Here I include my list with some comments that may help understand how I perceived the information in the report as well as its specific strengths and weaknesses:
Problem statement. The reproducibility report correctly frames the problem targeted in the original article. The report authors, however, do not provide much information about their understanding of the approach proposed in the original article; given that they implemented the code from scratch, it would seem reasonable to demonstrate a good understanding of the approach.
Code: the reproducibility study did not reuse code from authors but implemented everything from scratch.
Communication with original authors: there was communication with the authors (not through OpenReview) in the following regards:
Hyperparameter Search: there was no hyperparameter sweep in this reproducibility study; the experiments stuck to the original hyperparameter values.
Ablation Study: there is no ablation study. The report could have included an ablation study on the scaling factor beta that regulates the trade-off between reward and entropy maximization.
Discussion on results. The reproducibility report discusses to some degree the state of reproducibility of the original paper, but the authors did not manage to finish in time all the experiments needed to assess such reproducibility.
Recommendations for reproducibility. The report points to missing pieces of information, but provides no further recommendations for reproducibility.
Overall organization and clarity:
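As background for the checklist items above: the trade-off regulated by beta that the reviewers refer to is usually written as a regularized return (notation is mine and hedged; the original paper may use slightly different symbols or place beta differently):

```latex
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t}\gamma^{t}\!\left(r_t \;-\; \frac{1}{\beta}\,\log\frac{\pi(a_t \mid s_t)}{\rho(a_t)}\right)\right]
```

Here $\rho$ is an action prior: fixing $\rho$ to the uniform distribution recovers plain entropy regularization, while learning $\rho$ (e.g. as the marginal over actions) yields the mutual-information regularizer; larger $\beta$ weights reward more heavily relative to the regularizer.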
Hi, please find below a review submitted by one of the reviewers:
Score: 7
Reviewer 3 comment: This work tries to reproduce the paper "Soft Q-learning with mutual information regularization" and is able to try out some of the experiments of the paper. I feel it is a good initial effort, and future work could build upon the submitted report to test the reproducibility of the original paper. However, this work is not complete in itself, and I feel there are certain parts the authors of the report could improve to make this submission more useful to the community in general.
The authors could describe the paper and its reproducible aspects in the abstract, as opposed to describing the challenge. They do a good job of introducing the problem and providing a brief summary of all the key concepts and theoretical contributions of the paper (including the blog post). The effort of the authors to write the code from scratch is absolutely commendable. It would have been great if they had communicated with the original authors via the OpenReview forum.
I feel the authors could improve upon the hyperparameter search done. Although I feel time was a considerable bottleneck, it would have been helpful to observe how the MIRL performance varies with different hyperparameter values. Had the authors shown such a dependence and commented on possible reasons for the same, it would have given better insights into the reproducibility aspect of the algorithm as well as its robustness in other applications. The authors could have also done an ablation study by keeping beta constant. This could have helped them study the performance without running into numerical overflows.
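The fixed-beta ablation the reviewer suggests can be illustrated with a small sketch. The following is a minimal tabular soft Q-learning update with a constant temperature beta (my own illustrative code, not taken from the report or the original paper; function names and hyperparameter values are assumptions). It uses the standard log-sum-exp shift, which is one common way to avoid the numerical overflows mentioned above:

```python
import numpy as np

def soft_value(q_row, beta):
    """Soft state value V(s) = beta * log sum_a exp(Q(s, a) / beta).

    The max is subtracted before exponentiating (log-sum-exp shift),
    so the computation stays finite even for large Q-values.
    """
    z = q_row / beta
    m = np.max(z)
    return beta * (m + np.log(np.sum(np.exp(z - m))))

def soft_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, beta=1.0):
    """One tabular soft Q-learning step with a fixed temperature beta."""
    target = r + gamma * soft_value(Q[s_next], beta)
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```

With beta held constant, one could sweep its value across runs and plot returns, which is roughly the ablation the reviewer has in mind.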
The authors describe various roadblocks faced in the result reproduction pipeline. From the report, I could gather that the results could not be reproduced and that the authors believe the major differences lie in hyperparameter values and minor implementation tweaks. It would be more helpful if they could lay out a formal set of guidelines or recommendations for the authors of the original paper to improve reproducibility.
Overall, I would like to commend the efforts of the authors in participating and implementing the paper. However, the report could have been better organized to clarify their contributions and highlight necessary tweaks to aid future work.
Confidence: 3