ICLR 2019 Reproducibility report: H-detach

dido1998 commented 5 years ago

Submission of the ICLR 2019 reproducibility report for the paper "h-detach: Modifying the LSTM Gradient Towards Better Optimization". Issue number: #53

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 7 Reviewer 2 comment : We shall refer to the authors of the reproducibility report as authors and the authors of the original paper as writers. Similarly, we shall refer to the reproducibility report as the report and the original ICLR submission as the paper for the rest of this document.

Summary: The authors re-ran all of the major experiments to prove the validity of the h-detach algorithm. The authors communicated with the writers to understand the problem in order to fairly reproduce the original work. Additionally, the authors provide a CUDA based implementation of the algorithm in order to improve its speed during training.

Problem Statement: The report clearly states the problem statement of the paper and provides a summary of the method used in the paper.

Code: The authors reused the writers' original repository but modified it to run experiments with other hyperparameter values and initial seeds. The report additionally provides a CUDA-based implementation of the original work in order to speed up its training time.

Communication with writers: The report outlines the communication with the writers regarding the slow training speed of the original implementation. It highlights that the original implementation is slow due to the sequential nature of its data intake, in contrast to the vanilla LSTM implementation. The writers relayed that this was done to ensure the correctness of the h-detach algorithm.
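
To illustrate why a step-by-step loop arises naturally here, below is a minimal sketch of the per-time-step decision in h-detach, which stochastically blocks the gradient through the hidden state h. It is written in plain Python so it runs without a deep learning framework; the function names, the `p_detach` parameter, and the commented-out cell call are assumptions for illustration, not the writers' or the authors' code.

```python
import random


def sample_detach_mask(seq_len, p_detach, seed=0):
    """Draw one Bernoulli(p_detach) decision per time step (hypothetical helper)."""
    rng = random.Random(seed)
    return [rng.random() < p_detach for _ in range(seq_len)]


def run_sequence(inputs, p_detach, seed=0):
    """Walk the sequence one step at a time and record where the h-path is blocked."""
    mask = sample_detach_mask(len(inputs), p_detach, seed)
    blocked_steps = []
    for t, (x_t, detach) in enumerate(zip(inputs, mask)):
        if detach:
            # Assumption: in a PyTorch cell loop this is where
            # `h_prev = h_prev.detach()` would go, leaving the cell state c attached.
            blocked_steps.append(t)
        # h_prev, c_prev = cell(x_t, (h_prev, c_prev))  # placeholder for the LSTM cell step
    return blocked_steps
```

Because the decision is made independently at every time step, the sequence has to be unrolled in an explicit loop rather than handed to a fused LSTM kernel in one call, which is consistent with the slowdown described above.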

Hyperparameter Search: The report experimented with different seed values for the copying task and different learning rates for the Sequential MNIST task. However, the authors did not perform or mention any hyperparameter sweep over the h-detach probability itself, which is the core of the work. The report does note that the writers ran a hyperparameter sweep over the h-detach probability, although they did not mention it in the paper. The authors state that they learned this from the writers' comment on the OpenReview forum.
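
A sweep over the h-detach probability could be organized as a simple grid crossed with seeds. The sketch below only builds the list of run configurations; the helper name, the dictionary keys, and any values passed in are placeholders, not settings taken from the paper or the report.

```python
from itertools import product


def build_sweep(detach_probs, seeds):
    """Return one config dict per (detach probability, seed) pair (hypothetical helper)."""
    return [{"p_detach": p, "seed": s} for p, s in product(detach_probs, seeds)]
```

Each resulting configuration would then be passed to the training script, so that every probability is evaluated on the same set of seeds.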

Ablation Study: The report performed both the ablation studies mentioned in the paper and replicated the results. No additional ablation study was done in the report.

Discussion on results: The report clearly states that they were able to reproduce the original work.

Recommendations for reproducibility: The report recommends that the writers try the h-detach algorithm on stacked LSTMs. The original work mentions this as a future direction and outlines different ways of doing so. Additionally, the report recommends that the writers try the algorithm on bidirectional RNNs with LSTM cells.

Overall organization and clarity: This report does a good job of reproducing the major results of the paper. It would have been interesting to see the results of a hyperparameter sweep over the h-detach probability in the report. The report did not reproduce the results of the transfer copying task, which evaluates the generalization capability of the h-detach algorithm.

Confidence : 4

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 7 Reviewer 1 comment : The report's authors have understood the problem and confirm the reproducibility of the original paper. The original paper presents an algorithmic contribution to avoid the vanishing gradient problem in recurrent neural networks. The report is clear and sufficiently detailed. The source code is well-documented, including the README. I could not run the source code due to an error with CUDA. As a drawback, the reproducibility results are somewhat limited: the authors could have explored more random inits and hyperparameters at almost no extra cost (e.g., Fig 1 uses only 2 random inits).

More details:

Problem statement: well-understood.

Code: original code used; an extension in CUDA is also proposed to speed up the algorithm and make it competitive with the vanilla LSTM implementation in PyTorch. Well-documented. The CUDA version and tensorboardX are missing from the requirements.

Communication with original authors: yes, sufficient.

Hyperparameter Search: no additional hyperparameter search, replication of results using same hyperparameters.

Ablation Study: yes, 2 ablations, already studied in the original paper (gradient clipping, and

Discussion on results: The authors have reproduced most of the results in the original paper. The range of parameters tested is relatively small (only 2 random inits, 2 learning rates, etc.). More particularly,

* Fig 1: more than 2 seeds are required. It is hard to draw convincing conclusions about the stability and speed of convergence of the algorithm from just 2 random inits, given the high variance between the two seeds for both methods. I encourage the authors to replot Fig. 1 with at least 5 seeds. Ideally, I would plot the average over 100 seeds with ± one standard deviation across epochs.
* Fig 2: satisfactory
* Fig 5: it would be nice to also include vanilla LSTM baseline in these plots.
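
The aggregation suggested for Fig 1 (mean with ± one standard deviation across seeds at every epoch) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name and the input layout (one list of per-epoch values per seed) are assumptions, not part of the report's code.

```python
from statistics import mean, stdev


def mean_std_per_epoch(curves):
    """curves: one list per seed, each holding one metric value per epoch."""
    if len(curves) < 2:
        raise ValueError("need at least two seeds to estimate a standard deviation")
    n_epochs = len(curves[0])
    if any(len(curve) != n_epochs for curve in curves):
        raise ValueError("all seeds must cover the same number of epochs")
    per_epoch = list(zip(*curves))  # regroup values by epoch instead of by seed
    means = [mean(values) for values in per_epoch]
    stds = [stdev(values) for values in per_epoch]  # sample standard deviation
    return means, stds
```

The two returned lists can be drawn as a mean curve with a shaded band from `mean - std` to `mean + std`, which shows far more clearly than two individual runs whether the methods actually differ.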

Recommendations for reproducibility: not many, only recommends extension to other architectures/types of RNNs.

Overall organization and clarity: Good, overall clear. Some grammar issues and typos (e.g., the last paragraph on page 4).

Confidence : 4

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 8 Reviewer 3 comment :

Problem statement: The authors of the report conveyed a good and clear understanding of the problem, though they provided a lot of unnecessary detail, for example by repeating all the LSTM equations and their derivations in the report. This is not the goal of the reproducibility challenge, as one can read them directly in the paper. The goal, rather, is to reproduce the results based on the experimental details provided by the authors in the paper and to verify that the baseline results are (maybe) the best that can be obtained from these baselines.
Code: The authors used the original code provided with the paper and modified the LSTM implementation, which the original authors had written by hand, to use the CuDNNLSTM implementation in order to speed up the experiments.
Communication with original authors: Done over OpenReview, and discussed in detail in the report.
Hyperparameter Search: Nothing beyond what was tested in the original paper.
Ablation Study: The same studies as in the original paper were carried out.
Discussion on results: The authors of this report have replicated the results from the paper. Therefore, a detailed discussion is not required.
Recommendations for reproducibility: The authors mentioned that all the details were included in the original paper. The only recommendation/observation they made was to use the CUDA implementation of the LSTM, which gave results similar to those of the authors' implementation in the original paper.
Overall organization and clarity: The report is, overall, clear and well-written. My only comment on it (or for future reports) is to reduce the amount of detail that the original paper already contains. There is nothing wrong with the authors providing their own explanation of the problem/methodology, but merely repeating the same equations from the paper is, in my opinion, a waste of space that could be used for more experiments.

Confidence : 5

reproducibility-org commented 5 years ago

Meta Reviewer Decision: Accept

reproducibility-challenge / iclr_2019

ICLR 2019 Reproducibility report: H-detach #148