ICLR Reproducibility Challenge 2019
https://reproducibility-challenge.github.io/iclr_2019/

final submission for issue #104 #158

Open SaeedSaadatnejad opened 5 years ago

SaeedSaadatnejad commented 5 years ago

#104

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 8 Reviewer 1 comment: [Problem statement] The report very clearly states, and shows a solid understanding of, the problem addressed by the original paper: a new optimization strategy for the learning rate of the gradient descent algorithm.

[Code]

[Communication with original authors]

[Hyperparameter Search]

[Ablation Study]

[Discussion on results]

[Recommendations for reproducibility]

[Overall organization and clarity]

Score: 8 (accept). You still need to correct grammatical errors and reformulate some sentences for the final submission. Confidence: 4

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 7 Reviewer 3 comment: [Problem Statement]

The reproducers seem to have understood the paper, but oddly their introduction somewhat obscures the problem the original authors solved and is somewhat misleading: the original paper makes clear rather quickly that it is concerned with the dynamic adjustment of learning rates during training. I had to read the report's introduction side by side with the original paper, which is less than ideal. The proposed contributions and the new algorithms for choosing the learning rate are explained relatively poorly, but the experiments are well explained.
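
For readers of this thread, here is a minimal sketch of what "dynamic adjustment of the learning rate during training" can look like. It is a generic, hypergradient-style illustration with assumed names (`sgd_with_dynamic_lr`, coefficient `beta`), not the specific adaptation policy proposed in the original paper or reimplemented in the report.

```python
# Generic sketch: SGD whose learning rate is itself updated at every step.
# Illustrative only; the adaptation heuristic and its coefficient `beta` are
# placeholders, not the policy from the paper under review.
import numpy as np

def sgd_with_dynamic_lr(grad_fn, w0, lr0=0.1, beta=1e-4, steps=100):
    w = np.asarray(w0, dtype=float)
    lr = lr0
    prev_grad = np.zeros_like(w)
    for _ in range(steps):
        grad = grad_fn(w)
        # Hypergradient-style heuristic: increase the LR when consecutive
        # gradients point the same way, decrease it when they oppose.
        lr = max(lr + beta * float(np.dot(grad, prev_grad)), 1e-8)
        w = w - lr * grad
        prev_grad = grad
    return w, lr

# Example: minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w_final, lr_final = sgd_with_dynamic_lr(lambda w: w, w0=[5.0, -3.0])
```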

[Code]

The report's authors re-implemented the idea from scratch, so this is an attempt at true reproduction rather than mere repeatability. The code is readable, well commented, and easy to follow. I have not attempted to verify its correctness in detail.

[Communication with original authors]

The reproducers have not communicated with the original authors, or at least have not reported doing so.

[Hyperparameter Search]

The reproducers attempted to recreate the authors' plots using the same hyperparameters and encountered problems. This raises a few questions about the authors' work, but it is not automatically a red flag. Unfortunately, the reproducers did not contact the authors to try to resolve the discrepancy, so we do not know whether there is a software bug (which a more careful review of the code might uncover) or the paper is genuinely irreproducible.

[Ablation studies]

Ablation studies are not truly relevant to the present paper and report. The paper studies a complete, self-contained set of LR adaptation policies, presenting theoretical backing for them and then using them.

[Discussion of results] [Recommendations for reproducibility]

The report describes a fair attempt at reproducing the work. For the cases the original authors drew most attention to (improved performance at risky high initial LRs), the reproducers have found the proposed LR adaptation policy not to perform as well as claimed.

The reproducers do not provide suggestions for how this paper could be salvaged, and it is not clear whether they in fact could: the authors specifically designed their algorithms to perform better than SGD at higher learning rates (presumably to reduce time to convergence), whereas the reproducers find that they must use lower learning rates than the authors claimed. Absent a bug in their code, this suggests that the original authors may have had more luck with, or put more effort into, tuning their own algorithms than tuning SGD.

[Overall organization and clarity]

The mathematical typography leaves something to be desired, and the English is somewhat unidiomatic, especially near the beginning of the report, which makes it harder to read and less polished than it could be; however, this does not detract from its findings. Confidence: 4

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 7 Reviewer 2 comment: The report presents a clear and concise problem formulation. The report's authors implemented the code from scratch, and it is provided along with the submission. The quality of the code is impressive, as it is well documented. It is not exactly clear whether they tried to communicate with the original authors.

Regarding the results and discussion, the report excels at concise, to-the-point discussion. However, some of the claims do not seem to be supported by experimental results. In Section 4.1, for example, the report mentions trying to fix the initial learning rate for the first few rounds, without any supporting results that decisively show it indeed did not work. It is also unclear what the report's authors mean by "bound values"; they are probably referring to gradient clipping. If so, the clipping used and the result corresponding to each clipping value should be reported. In short, there was room for a more thorough hyperparameter search. In Section 4.2, it would be great if the learning rates were also reported alongside the convergence times being compared.
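
To make the clipping question concrete, here is a minimal PyTorch sketch of the two usual meanings of "clipping": bounding the global gradient norm versus bounding each gradient component. This is a generic illustration; the model, data, and thresholds are placeholders and are not taken from the report or the original paper.

```python
# Generic gradient-clipping sketch (placeholder model, data, and thresholds).
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
loss.backward()

# Option 1: clip the global gradient norm to at most 1.0.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
# Option 2 (alternative): clip each gradient component to [-0.5, 0.5].
# torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)

optimizer.step()
```

If the report means the latter (component-wise bounding), the request above amounts to stating the chosen bound and the result obtained for each value tried.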

The report also does not provide any suggestions to the original authors to help with the reproducibility effort. The overall organization is good, apart from a few grammatical errors. Confidence: 4