ICLR Reproducibility Challenge 2019
https://reproducibility-challenge.github.io/iclr_2019/

ICLR 2019 Reproducibility report: H-detach #148

Closed dido1998 closed 5 years ago

dido1998 commented 5 years ago

Submission of the ICLR 2019 reproducibility report for the paper: h-detach: Modifying the LSTM Gradient Towards Better Optimization

Issue number: #53

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 7

Reviewer 2 comment: We shall refer to the authors of the reproducibility report as authors and the authors of the original paper as writers. Similarly, we shall refer to the reproducibility report as the report and the original ICLR submission as the paper for the rest of this document.

Summary: The authors re-ran all of the major experiments to prove the validity of the h-detach algorithm. The authors communicated with the writers to understand the problem in order to fairly reproduce the original work. Additionally, the authors provide a CUDA-based implementation of the algorithm in order to improve its speed during training.

Problem Statement: The report clearly states the problem statement of the paper and provides a summary of the method used in the paper.

Code: The authors re-used the writers' original repository but modified it to run experiments with other hyperparameter values and initial seeds. The report additionally provides a CUDA-based implementation of the original work in order to speed up its training.

Communication with writers: The report outlines the communication with the writers regarding the slow training speed of the original implementation. It highlights that the original implementation is slow because of the sequential nature of its data intake, in contrast to the vanilla LSTM implementation. The writers relayed that this was done to ensure the correctness of the h-detach algorithm.

Hyperparameter Search: The report experimented with different seed values for the copying task and different learning rates for the Sequential MNIST task. However, the authors did not perform or mention any hyperparameter sweep for the h-detach probability itself, which is the core of the work. The report does note that the writers performed a hyperparameter sweep for the h-detach probability, which they did not mention in the paper. The authors state that they learned this from the writers' comment in the OpenReview forum.
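
A sweep of this kind can be laid out as a small grid over the h-detach probability and the random seed. The snippet below is only a sketch: the function name `build_sweep`, the dictionary keys, and the probability and seed values are hypothetical and are not taken from the report or the paper.

```python
import itertools


def build_sweep(detach_probs, seeds):
    """Return one run configuration per (detach probability, seed) pair."""
    return [
        {"detach_prob": p, "seed": s}
        for p, s in itertools.product(detach_probs, seeds)
    ]


# Hypothetical values, chosen only to illustrate the grid layout.
configs = build_sweep(detach_probs=[0.25, 0.5, 0.75], seeds=[0, 1])
for config in configs:
    print(config)
```

Each configuration would then be passed to whatever training entry point the repository exposes, so that every probability is evaluated under the same set of seeds.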

Ablation Study: The report performed both the ablation studies mentioned in the paper and replicated the results. No additional ablation study was done in the report.

Discussion on results: The report clearly states that they were able to reproduce the original work.

Recommendations for reproducibility: The report recommends that the writers try the h-detach algorithm on stacked LSTMs. The original work mentions this as a future direction and outlines different ways of doing so. Additionally, the report recommends that the writers try the algorithm on bidirectional RNNs with LSTM cells.

Overall organization and clarity: This report does a good job of reproducing the major results of the paper. It would have been interesting to see the results of a hyperparameter sweep for the h-detach probability in the report. The report did not reproduce the results of the transfer copying task, which evaluates the generalization capability of the h-detach algorithm.

Confidence : 4

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 7

Reviewer 1 comment: The report's authors have understood the problem and confirm the reproducibility of the original paper. The original paper presents an algorithmic contribution to avoid the vanishing gradient problem in recurrent neural networks. The report is clear and sufficiently detailed. The source code is well-documented, including the README. I could not run the source code due to an error with CUDA. As a drawback, the reproducibility results are somewhat limited: the authors could have explored more random initializations and hyperparameters at almost no extra cost (e.g., Fig. 1 uses only 2 random initializations).

More details:

Problem statement: well-understood.

Code: original code used; an extension in CUDA is also proposed to speed up the algorithm and make it competitive with the vanilla LSTM implementation in PyTorch. Well-documented. The CUDA version and tensorboardX are missing from the requirements.

Communication with original authors: yes, sufficient.

Hyperparameter Search: no additional hyperparameter search, replication of results using same hyperparameters.

Ablation Study: yes, 2 ablations, already studied in the original paper (gradient clipping, and

Discussion on results: The authors have reproduced most of the results in the original paper. The range of parameters tested is relatively small (only 2 random initializations, 2 learning rates, etc.). In particular:

* Fig 1: more than 2 seeds are required. It is hard to draw convincing conclusions about the stability and speed of convergence of the algorithm from just 2 random initializations, given the high variance between the two seeds for both methods. I encourage the authors to replot Fig. 1 with at least 5 seeds. Ideally, I would plot the average over 100 seeds with +- one standard deviation across epochs.
* Fig 2: satisfactory
* Fig 5: it would be nice to also include a vanilla LSTM baseline in these plots.
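
The averaging suggested for Fig 1 could be computed along the following lines. This is a minimal sketch using only the Python standard library: the helper `mean_std_per_epoch` and the example numbers are hypothetical and do not come from the report, and it assumes every seed logs one metric value per epoch.

```python
import statistics


def mean_std_per_epoch(curves):
    """Aggregate per-seed curves into (mean, sample std) pairs per epoch.

    curves: list with one entry per seed, each a list of metric values
    (one value per epoch). All seeds must cover the same epochs.
    """
    n_epochs = len(curves[0])
    if any(len(curve) != n_epochs for curve in curves):
        raise ValueError("all seeds must report the same number of epochs")

    summary = []
    for epoch_values in zip(*curves):  # values of every seed at one epoch
        mean = statistics.fmean(epoch_values)
        # Sample standard deviation needs at least two seeds.
        std = statistics.stdev(epoch_values) if len(epoch_values) > 1 else 0.0
        summary.append((mean, std))
    return summary


# Hypothetical accuracies for 3 seeds over 3 epochs (not real results).
example_curves = [
    [0.10, 0.50, 0.90],
    [0.20, 0.60, 0.80],
    [0.30, 0.40, 1.00],
]
summary = mean_std_per_epoch(example_curves)
```

The resulting pairs can be drawn as a mean curve with a shaded band of one standard deviation on either side, which makes the seed-to-seed variance visible in a way that two individual curves do not.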

Recommendations for reproducibility: not many, only recommends extension to other architectures/types of RNNs.

Overall organization and clarity: Good, clear overall. Some grammar errors and typos (e.g., the last paragraph on page 4).

Confidence: 4

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 8 Reviewer 3 comment :

reproducibility-org commented 5 years ago

Meta Reviewer Decision: Accept