
ICLR Reproducibility Challenge 2019
https://reproducibility-challenge.github.io/iclr_2019/

VISUAL EXPLANATION BY INTERPRETATION: IMPROVING VISUAL FEEDBACK CAPABILITIES OF DEEP NEURAL NETWORKS #157

Open Krestone opened 5 years ago

Krestone commented 5 years ago

Issue Number: 101 https://github.com/reproducibility-challenge/iclr_2019/issues/101

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 5 Reviewer 1 comment: [Problem statement] The problem statement is clear.

[Code] The authors reproduced the code from scratch and make it available at https://github.com/Krestone/iclr_2019_code. It would be nice to also add this link to the report so the code base can be found easily. Documentation of the code should be added.

[Communication with original authors] The report does not mention communications with the original authors for testing reproducibility.

[Hyperparameter Search] Due diligence is shown in the hyperparameter sweep.

[Ablation Study] The report provides an ablation study similar to the original paper's. However, a lack of technical detail makes it hard to understand what from the original paper could be reproduced and what could not.

[Discussion on results] The report does not contain a detailed discussion of the state of reproducibility of the original paper.

[Recommendations for reproducibility] The authors don't explicitly provide recommendations to the original authors for improving reproducibility, but they do mention their difficulties in using the same framework as the original paper.

[Overall organization and clarity]

Score: 5 (reject) More details and further experiments could make this report good enough for acceptance. Confidence: 3

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 4 Reviewer 1 comment: [Problem Statement] The report states and understands the problem statement of the original paper but, in my opinion, does not convey this information very clearly. Without reading the original paper being analyzed, one would have a limited understanding of exactly what is being achieved and how. The authors do a good job of reproducing the MNIST ablations, and I think this study contributes a lot of value. However, many of the empirical comparisons in the original paper are not included in the report (other datasets and visualizations), though the authors do explicitly state that they will only focus on the MNIST ablations.

[Code] The authors reproduced the code from scratch. Documentation is not polished, but is sufficient to figure out how to use/read their code.

[Communication with original authors] N.A.

[Hyperparameter Search] The authors test an additional network and more sparsity parameters for their MNIST ablation.

[Ablation Study] The root of the empirical comparison is an ablation study. As mentioned, the authors extend their ablations to remove a larger number of filters.

[Discussion on results] Limited to a few terse sentences; however, there isn't much else to add. I found this sufficient given the small set of experiments performed.

[Recommendations for reproducibility] There don't appear to be any recommendations provided in the report.

[Organization and clarity] The organization of the paper is good. However, the overall clarity and exposition of the report can be vastly improved. Figure 1 is relatively incomprehensible, and there is little information conveyed in the text. This is quite a shame, since the authors provide valuable MNIST ablations and, with a little more work, the quality of the report would have been drastically improved. I think improving the clarity of the writing, the transitions between sections, and the code documentation, plus the addition of a single ablation on ImageNet-cats or Fashion144K, would qualify the paper for acceptance. Confidence: 4

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 5 Reviewer 3 comment: [Problem Statement]

The problem statement is clear and the reproducers appear to have understood the paper, although they could have better paraphrased its contributions paragraph (ref: end of page 2). I agree with the reproducers' decisions on where to focus the reproduction effort.

[Code]

The authors re-implemented from scratch, so this is an attempt at true reproduction rather than repeatability. The codebase is polyglot, with some processing done in MATLAB and some in Python/Jupyter notebooks. Documentation and commenting are slightly lacking; improving them would really speed up relating the code to the paper and report.

[Communication with original authors]

The reproducers have not communicated with the original authors, and in particular not on OpenReview.

[Hyperparameter Search]

Reproduction was only attempted on smaller models; the completeness of the hyperparameter search must be understood within this frame. That being said, for the MNIST model that the reproducers studied, and for a sweep of the authors' defined parameter "mu", the results are broadly in line with what the original authors obtained. This suggests that the authors' strategy for feature selection prioritizes important information over less important information.
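As a rough illustration, a sweep of that kind might be sketched as follows. This is a minimal sketch only: the lasso-style relevance formulation, the variable names, and the data shapes are assumptions made for illustration, not the original authors' or the reproducers' code.

```python
# Hypothetical sketch of a sweep over the sparsity parameter "mu".
# Assumption: relevant filters are selected by an l1-regularized fit of
# per-filter activations against a per-class target signal.
import numpy as np
from sklearn.linear_model import Lasso

def sweep_mu(responses, class_scores, mus):
    """responses: (n_samples, n_filters) activations for one class;
    class_scores: (n_samples,) target signal for that class."""
    selected = {}
    for mu in mus:
        selector = Lasso(alpha=mu, max_iter=10_000)
        selector.fit(responses, class_scores)
        # Filters that keep a non-zero weight are treated as "relevant".
        selected[mu] = np.flatnonzero(selector.coef_)
    return selected

# Larger mu -> sparser selection; the per-mu sets of relevant filters can
# then be fed into a subsequent filter-dropping ablation.
```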

[Ablation studies]

Ablation studies are not quite relevant to the present paper and report. Nevertheless, the reproducers did attempt to compare strategic filter dropping with random filter dropping, and it is confirmed that the authors' strategy identifies important features well.
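For concreteness, a comparison of this kind could look like the sketch below. PyTorch is assumed here, and `model`, `test_loader`, the convolutional layer name, and the per-filter `importance` scores are placeholders, not the reproducers' actual code.

```python
# Hypothetical sketch: accuracy after dropping "important" filters vs.
# dropping the same number of randomly chosen filters.
import copy
import random
import torch

def zero_filters(model, layer_name, filter_ids):
    """Return a copy of `model` with the given conv filters zeroed out."""
    ablated = copy.deepcopy(model)
    conv = dict(ablated.named_modules())[layer_name]
    with torch.no_grad():
        conv.weight[filter_ids] = 0.0
        if conv.bias is not None:
            conv.bias[filter_ids] = 0.0
    return ablated

@torch.no_grad()
def accuracy(model, loader):
    model.eval()
    correct = total = 0
    for x, y in loader:
        pred = model(x).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total

def ablation_curve(model, loader, layer_name, importance, ks):
    """For each k, drop the top-k most important filters vs. k random ones."""
    ranked = sorted(range(len(importance)), key=lambda i: -importance[i])
    results = []
    for k in ks:
        strategic = accuracy(zero_filters(model, layer_name, ranked[:k]), loader)
        rand_ids = random.sample(range(len(importance)), k)
        rand = accuracy(zero_filters(model, layer_name, rand_ids), loader)
        results.append((k, strategic, rand))
    return results

# If the importance scores are meaningful, the "strategic" accuracy should
# fall faster than the "random" accuracy as k grows.
```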

[Discussion of results] [Recommendations for reproducibility]

In light of how little of the paper has in fact been reproduced, the results cannot be said to be a full reproduction of the paper, but the present report does serve as a good smoke test and seems to indicate that the authors' work is worthy of further, fuller reproduction attempts.

[Overall organization and clarity]

The report feels skinny and lacking in meat; were it to be expanded to more datasets, it would be the better for it. Confidence: 4