
ICLR Reproducibility Challenge 2019
https://reproducibility-challenge.github.io/iclr_2019/

Submission for issue #63 #152

Open vluzko opened 5 years ago

vluzko commented 5 years ago

Issue number 63

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 5

Reviewer 3 comment: The report starts with a good summary of the main contributions of the original paper. It also clearly states the goal of the reproducibility work and the motivation for focusing on certain experiments. I particularly liked Sec. 3.1, which clearly explains the "uncertainties and differences" corresponding to choices that had to be made during the reproducibility study. This is quite enlightening, and includes the discovery of a sign error in an equation in the original ICLR submission.

The experimental results provided in the reproducibility study confirm some, but not all, of the results in the paper. Overall, the study would be strengthened by more experiments (evaluation is done with only one seed); given a bit more time, this should be feasible, since the MuJoCo domains are not that slow to train (compared to other deep RL benchmarks).
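For concreteness, a minimal sketch of the kind of multi-seed evaluation suggested here; `train_and_evaluate` is a hypothetical stand-in for the report's training loop, not its actual interface:

```python
import numpy as np

def multi_seed_curves(train_and_evaluate, env_name, seeds=(0, 1, 2, 3, 4)):
    """Run the same experiment under several seeds and aggregate.

    `train_and_evaluate` is assumed to return an array of evaluation
    returns over training steps for a single run.
    """
    curves = np.stack([train_and_evaluate(env_name, seed=s) for s in seeds])
    # Report mean +/- std across seeds rather than a single run.
    return curves.mean(axis=0), curves.std(axis=0)
```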

The reproducibility report would also be strengthened by a more in-depth discussion of the findings. For example, the drop in performance at 400K steps in the Ant domain seems surprising; it is dismissed as something that "could be mitigated with early stopping", yet the original study does not report this drop and does not stop earlier. There is also speculation that a change in how terminal states are handled might explain other differences. This would need to be resolved before the report is ready for publication, and it seems particularly important because the use of absorbing states is a key feature of the original paper.
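To illustrate the terminal-state point, here is a minimal sketch of absorbing-state handling in the spirit of the original paper: early terminations are rerouted through an explicit self-looping absorbing state, so the discriminator can learn a reward for it rather than implicitly assigning zero. The tuple layout and buffer interface are illustrative assumptions, not the report's actual code:

```python
import numpy as np

def augment(obs):
    """Regular observation with the absorbing-state indicator set to 0."""
    return np.concatenate([obs, [0.0]])

def absorbing(obs_dim):
    """Absorbing state: zero features with the indicator set to 1."""
    return np.concatenate([np.zeros(obs_dim), [1.0]])

def store_episode(episode, buffer, obs_dim, max_episode_steps):
    """Push one episode of (obs, action, next_obs, done) tuples into a
    replay buffer, rerouting early terminations through the absorbing state.
    """
    absorb = absorbing(obs_dim)
    for t, (obs, action, next_obs, done) in enumerate(episode):
        if done and t + 1 < max_episode_steps:
            # Early termination: send the transition into the absorbing state,
            buffer.append((augment(obs), action, absorb))
            # then add the absorbing state's self-loop with a null action.
            buffer.append((absorb, np.zeros_like(action), absorb))
        else:
            buffer.append((augment(obs), action, augment(next_obs)))
```

A subtle change here (e.g. treating time-limit truncations as true terminations) could plausibly account for the differences the report speculates about.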

Minor point: the y-axes in Fig. 1 and Fig. 3a of the reproducibility report are not labelled. In the original work, the caption gives the definition of this quantity.

The reproducibility report includes in its appendix a conversation with one of the authors of the ICLR manuscript. While this documents some of the steps taken to ensure thorough reproducibility, the conversation should not be included in the paper itself. It can be cited as a "personal communication" in the list of references.

Confidence: 4

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 7

Reviewer 1 comment: This work tries to replicate the paper "Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning" and does a commendable job of replicating some of the key experiments. The authors present a comprehensive report and also document an extensive discussion with the original authors. I would like to heartily appreciate the efforts taken by both the original authors and the ReScience authors to improve the reproducibility of the paper.

The authors present a very good description of the problem and provide a noteworthy explanation of the setting. This shows that they had a clear understanding of the motivation and the objectives of the original paper. The submitted report is a very good read for someone trying to understand the paper in a short time.

The effort made by the authors in implementing the code from scratch is commendable. That they started from just the pseudocode and ended up implementing the entire algorithm is a great achievement given the time frame. They also pointed out certain typos and/or misrepresentations in the original paper, which is a real contribution towards improving reproducibility. However, the experiments lacked a clear hyperparameter search, which would have been helpful for judging the robustness of the algorithm to the choice of hyperparameters. I suspect time constraints limited the authors here; it would be great to see such a search in future versions of this work.
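Even a coarse grid would help here. A minimal sketch, reusing the hypothetical `train_and_evaluate` from above; the parameter names and value ranges are assumptions for illustration only:

```python
import itertools

# Hypothetical sweep over a few hyperparameters to probe robustness.
grid = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [64, 256],
    "gradient_penalty": [1.0, 10.0],
}

results = {}
for values in itertools.product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    # Store the evaluation curve for each configuration.
    results[values] = train_and_evaluate("Ant-v2", seed=0, **config)
```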

The authors ended up replicating the algorithm's performance on two of the four MuJoCo environments and, more importantly, diagnosed the problem with the other two. I hope they can address these in the future and make the report even more comprehensive. They also show that the reward function claim made by the original authors does not hold, and thus the corresponding figure is not reproducible; it would help if they discussed this further in the report.

Although the authors present many interesting results and a detailed description of their efforts, they did not include pointers for improving reproducibility. I would request that they do so in the final report, as it would help the original authors as well as other interested researchers. Given that this report is a submission to the reproducibility challenge, recommendations for improving reproducibility would be one of the most important takeaways.

Overall, I feel this is a great effort by the authors, and I hope they continue along similar lines to complete the report. A couple of minor comments: the plots would be easier to follow if the colors were consistent with the original paper. Also, screenshots might not be the best way to present the conversation; it would be better in a more structured dialogue format, which I would request the authors to adopt for the final report version.

Confidence: 3