ICLR Reproducibility Challenge 2019
https://reproducibility-challenge.github.io/iclr_2019/

Submission for #102 (#149)

Open mknbv opened 5 years ago

mknbv commented 5 years ago

Submission for #102

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 6
Reviewer 1 comment:

Problem Statement

The problem statement is clearly written in Section 1: the Adam optimizer fails when some gradients have large magnitude but appear rarely.
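As a concrete illustration of this failure mode, a small synthetic experiment of the kind discussed in the AdaShift/AMSGrad literature can be run with stock PyTorch; the constants below are illustrative and are not taken from the report:

```python
import torch

# Synthetic stochastic problem: the gradient is a large +C on rare steps and
# -1 otherwise, with E[g] = p*(C+1) - 1 = 0.02 > 0, so the true minimiser of
# E[g] * x over [-1, 1] is x = -1.  (Constants are illustrative.)
C = 101.0
p = 1.02 / (C + 1.0)  # probability of the rare, large +C gradient

x = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([x], lr=1e-2, betas=(0.9, 0.999))

for step in range(200_000):
    g = C if torch.rand(1).item() < p else -1.0
    x.grad = torch.full_like(x, g)  # feed the sampled gradient directly
    opt.step()
    with torch.no_grad():
        x.clamp_(-1.0, 1.0)         # keep the iterate in the feasible interval

# The rare +C gradient also inflates Adam's second-moment estimate at the very
# step it appears, so its update is damped; the frequent -1 gradients dominate
# and x drifts towards +1, away from the optimum at x = -1.
print(x.item())
```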

Code

The code is open-sourced on GitHub.

Communication with the Original Author

The report does not mention any communication with the original authors, but a record of a discussion can be found on OpenReview.

Hyperparameter Search

The authors sometimes use different hyperparameters from the original authors.

Ablation Study

No ablation study is performed by the authors.

Discussion on Results

The authors briefly discuss the reproduced results. A more detailed comparison with side-by-side plots might help readers.

Recommendations for Reproducibility

The authors do not provide any particular recommendations to the original authors for improving reproducibility.

Overall Organization and Clarity

Overall, the paper is well written and easy to read. However, some parts are left for the reader to look up in the original paper.


Here are points that I was particularly impressed with:

  1. For the WGAN-GP task, the authors also try training both the discriminator and the generator, instead of fixing the generator like the original authors.

Here are some parts of the paper I wished for more information:

  1. The reason for excluding SGD when experimenting with MLP on MNIST.
  2. The smoothing method used for the MNIST plots (mentioned just before Section 4.2.1).
  3. The reason behind selecting hyperparameters different from the original authors': did the original settings not converge? A table comparing your hyperparameters with the original ones would be nice.

Here are some minor fixes I recommend:

  1. In Section 4.2 (MNIST), there is a typo in the first sentence: Perceptrion -> Perceptron.
  2. Adding figures from the original paper might make it easier for readers to see how reproducible the original paper was.
  3. Reiterating the suggestions from the end of Section 4 in Section 6 (Conclusion), so that the paper ends with possible future work.
  4. In Section 4.2.1, be consistent with scientific notation: change numbers written in 2e-4 format to $2 \times 10^{-4}$.

Thank you!

Score (1-10): 6
Confidence (1-5): 2

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 5
Reviewer 2 comment:

The report authors do a commendable job in trying to replicate the paper "AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods". The associated report and code base are well written and well documented, respectively. As a submission to the reproducibility challenge, the authors achieve a great deal of success in reproducing the original experiments, but some concerns remain. I believe the authors can address those concerns to make the report even more comprehensive and useful to the community.

The authors present a good description of the problem and exhibit a sound understanding of the problem setting throughout their report. They also do a great job of summarizing the setting and notation used in the original paper. However, I would like to point out a slight typo in the algorithm description (Line 7): the algorithm shifts the gradients by n steps, not just one. I would request the authors to make the necessary correction in the report (I assume it is just a typo and not a mistake in their code, as they use n as a hyperparameter in several experiments). The authors also use an additional normalization for v (Line 8) which I do not see in the description in the original paper. It would help the reader if the authors could explain why this normalization is needed. Is it present in the original code base but not mentioned in the paper, or is it something the report authors introduce?
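To make the n-step shift concrete, here is a minimal sketch of my reading of the update for a single parameter tensor; the function name, the plain-mean spatial operation, and the beta1 weighting of the kept gradients are my own simplifications, not the report authors' implementation, and the extra normalization of v questioned above is deliberately omitted since its exact form is unclear to me:

```python
from collections import deque

import torch

def adashift_step(param, grad, state, lr=1e-2, beta1=0.9, beta2=0.999,
                  n=10, eps=1e-8):
    """One hedged, simplified AdaShift-style update (not the report's code)."""
    buf = state.setdefault("buf", deque(maxlen=n + 1))  # holds g_{t-n}, ..., g_t
    buf.append(grad.detach().clone())
    if len(buf) <= n:
        return  # not enough history yet to shift by n steps

    # Temporal shift (the paper's Line 7 as I read it): the *oldest* kept
    # gradient g_{t-n} feeds the second-moment estimate v, decorrelating v
    # from the gradients that drive the current step.
    g_shifted = buf[0]
    phi = g_shifted.pow(2).mean()  # spatial operation; the paper uses block-wise
                                   # max/mean, a plain mean is used here
    v = state.get("v", torch.zeros((), device=grad.device))
    v = beta2 * v + (1 - beta2) * phi
    state["v"] = v

    # First moment: beta1-weighted average of the n most recent gradients
    # g_{t-n+1}, ..., g_t (newer gradients receive larger weights).
    recent = list(buf)[1:]
    weights = [beta1 ** (n - 1 - i) for i in range(n)]
    m = sum(w * g for w, g in zip(weights, recent)) / sum(weights)

    param.data.add_(m / (v.sqrt() + eps), alpha=-lr)
```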

I would like to appreciate the authors' effort in writing the code from scratch in PyTorch (given that the original authors' code was in TensorFlow). However, I feel that the authors could improve the README of their GitHub repository by adding a "how to run" section, which would help future researchers aiming to build on their work.
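For example, such a section could open with a short usage snippet along these lines; the module path adashift, the class name AdaShift, and the keep_num argument are assumptions made for illustration, not names taken from the repository:

```python
import torch
import torch.nn as nn

# Hypothetical import: the actual module and class names in the repository may differ.
from adashift import AdaShift

model = nn.Linear(784, 10)
# keep_num (the number n of kept gradients) is an assumed argument name.
optimizer = AdaShift(model.parameters(), lr=1e-2, keep_num=10)

x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))

loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```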

The authors had a detailed discussion with the original authors on the OpenReview forum but do not mention it in the report. It would help if they could add a link to it in the report.

No extensive hyperparameter search was conducted, which raises questions about the robustness of the algorithm. As a reproducibility report, I would expect an extensive hyperparameter search on the replicated experiments before trying out newer experiments, and this is missing. It is also not explicitly clear which hyperparameters were chosen by the authors and which were taken from the original implementation. It would help to add a table to the report with three columns: Experiment, Hyperparameters used by the original authors, and Hyperparameters used in the reproducibility report. At the current stage, it is not easy to judge whether the hyperparameters were chosen by following the original paper or from the authors' own experience.

I feel a major contribution of this work is the set of experiments in a more realistic WGAN-GP setting, and I would like to appreciate the authors' efforts on this. In addition, they replicated most of the experiments in the original paper (except Tiny ImageNet and CIFAR-10), which gives insight into the reproducibility of the original paper. They also pointed out discrepancies between the paper and the algorithm's implementation, which I feel is a great contribution in the context of the challenge. It would be really helpful if the authors could add some recommendations for improving reproducibility in a structured way; right now they are slightly scattered and not an easy take-away from the report.

My score reflects both the positives and negatives of the paper. If the authors can address some of the concerns raised (specifically those related to the algorithm description in the two reviews), I would be happy to raise my score. Finally, I would like to appreciate the authors' efforts in replicating the paper to an extent that allows the reproducibility of the original paper to be judged.

Confidence: 4