maomran commented 5 years ago

46

maomran commented 5 years ago

@reproducibility-org complete

reproducibility-org commented 5 years ago

Hi, please reformat the PR so that the folder is papers/<your team name>-<paper ID>. You can just modify your fork and the changes will be reflected here. Thank you!

maomran commented 5 years ago

@reproducibility-org not sure if this fix matches the requirement, please comment if something is missing.

reproducibility-org commented 5 years ago

@maomran <paper-ID> should be HyGh4sR9YQ for you.

maomran commented 5 years ago

@reproducibility-org thanks, I think it is changed now.

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 5 Reviewer 1 comment : Thank you for your submission.

I liked the following aspects:

- Highlighted the reproducibility limitation for the paper
- Useful insights into how to make the paper more useful

I understand that GA requires a different level of resources to experiment with, but the reproducibility effort is limited: the ablation and hyperparameter results are sparse, and the results come from a single run, making it hard to comment on their variability.

I think the following information would make the report even more useful:

- What hardware did you use?
- Mention the number of frames (or training updates) along with the time (in hours), since wall-clock time depends on the hardware and is therefore not standardized.

Confidence : 3

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 5 Reviewer 3 comment : The manuscript provides a clear summary of the original paper and of the focus of the reproducibility study. In this case, the goal is to reproduce a portion of the results achieved with the evolutionary methods. This is worthwhile, since these methods are not often used for RL. However, given that the original paper provided findings in favor of Genetic Algorithms (GA) over the gradient-based DQN method, it may be important to also check that the DQN results were sound.

The reproducibility study used code provided by the authors of the original paper and, as far as I can tell, the same hyper-parameter settings as in the original paper. In this sense, it may have less value than if the authors had performed an independent search for hyper-parameters, especially for the DQN approach. The reproducibility study did consider the role of the number of frames and the number of GPU workstations in the results. It did not attempt to perform an extensive ablation study on the algorithms.

The reproducibility study could be improved by providing a more detailed description of the findings in its tables. For example, what are the units of the results in Table 1? What are the numbers in bold in Table 2? What are the findings from Table 3? Providing a detailed discussion of specific findings, and relating them to the previous work, is an important contribution of this type of paper.

An interesting contribution of the reproducibility work is to provide potential extensions of the GA, which is not directly a reproducibility finding, but suggests that doing the reproducibility study can be a good source of new research ideas.

The paper is reasonably well written, but some important information is missing, and the writing could use some polishing for journal publication.

Overall, the reproducibility goals are clearly explained, but the level of completeness of the work and the explanations of the findings are not sufficiently strong to warrant publication of this report in a journal. The authors are strongly encouraged to share their work with the community.

Confidence : 4

reproducibility-org commented 5 years ago

Hi, please find below a review submitted by one of the reviewers:

Score: 5 Reviewer 1 comment : The report described an attempt to reproduce the results in the "Deep Neuroevolution" paper submitted to ICLR 2019.

The report is clearly written and easy to follow. It contains a good summary of the paper it attempts to reproduce, and the related work section includes adequate references to previous work in the area.
However, I do have some reservations regarding the experiments.
- The authors used the published code and ran it with the same hyperparameters as reported in the paper. Although this confirms that the results in the paper can be reproduced with the same hyperparameters, it would have been more informative if the authors had also run a separate hyperparameter sweep (especially for the baselines reported in the paper).
- The authors performed only a single run of the experiments. Although I understand that ES experiments are resource-intensive, it would still have been more informative if the authors had conducted multiple runs.

Overall, the report is well-written, although the experiments could be improved.

Confidence : 4

reproducibility-challenge / iclr_2019

Submission for issue #46 #153
