oicr-gsi / robust-paper

Ten Simple Rules for Making Research Software More Robust. The manuscript is published in PLoS Computational Biology. Feedback is welcome!
https://doi.org/10.1371/journal.pcbi.1005412
MIT License

PLoS Computational Biology Reviews #23

Closed morgantaschuk closed 7 years ago

morgantaschuk commented 7 years ago

Reviews are back! Now the fun begins. One of the reviewers kindly signed their name, but I left it out of this issue just in case.


Dear Dr Taschuk,

Thank you very much for submitting your manuscript, 'Ten Simple Rules for Making Research Software More Robust', to PLOS Computational Biology. As with all papers submitted to the journal, yours was fully evaluated by the PLOS Computational Biology editorial team, and in this case, by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified some aspects of the manuscript that should be improved.

We would therefore like to ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer and we encourage you to respond to particular issues raised.

In particular, please see Reviewer 1's comment on Titus Brown's blog postings and Reviewer 3's comment reminding you of the "Ten Simple Rules for Writing a PLOS Ten Simple Rules Article".

In addition, when you are ready to resubmit, please be prepared to provide the following:

  1. A detailed list of your responses to the review comments and the changes you have made in the manuscript. We require a file of this nature before your manuscript is passed back to the editors.
  2. A copy of your manuscript with the changes highlighted (encouraged). We encourage authors, if possible, to show clearly where changes have been made to their manuscript, e.g. by highlighting text.
  3. PLOS offers a figure-checking tool, PACE (http://pace.apexcovantage.com/), to help authors to ensure all figures meet PLOS requirements so that the quality of published figures is as high as possible. Please use this tool to help you format your figures. PACE is a digital diagnostic and conversion tool for figure files. It will provide information about any failed check(s) and, if able, will automatically convert the figure file into an acceptable file that passes quality checks. PACE requires you to register for an account to ensure your figure files are processed securely.
  4. A striking still image to accompany your article (optional). If the image is judged to be suitable by the editors, it may be featured on our website and might be chosen as the issue image for that month. These square, high-quality images should be accompanied by a short caption. Please note as well that there should be no copyright restrictions on the use of the image, so that it can be published under the Open-Access license and be subject only to appropriate attribution.

Before you resubmit your manuscript, please consult our Submission Checklist to ensure your manuscript is formatted correctly for PLOS Computational Biology: http://www.ploscompbiol.org/static/checklist.action.

We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we ask that you let us know the expected resubmission date by email at ploscompbiol@plos.org.

If you have any questions or concerns while you make these revisions, please let us know.

Sincerely,

Scott Markel, Ph.D.
Associate Editor
PLOS Computational Biology

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately:

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer 1

This manuscript provides rules aimed at novice programmers to help them write more robust bioinformatics programs. The rules are all sensible, and biologists turned programmers (and some people with formal software engineering training) would do well to read them.

With any set of guidelines, and especially guidelines about programming, there is room for subjectivity and differences of opinion, so here are mine. The authors are free to include or exclude these comments in their revision.

The rules are framed in terms of increasing robustness.

The piece starts out defining robustness in terms of portability, specifically the ability to run outside the programmer's own environment, but then goes on to define it more broadly in terms of multiple criteria, such as being kept under version control (which I don't think is a necessary condition for robustness; many robust programs were written before version control systems existed).

The definition here is somewhat different from Wikipedia's, which defines it as:

"the ability of a computer system to cope with errors during execution[1][2] and cope with erroneous input."

I am not sure of the best name for the property the authors are aiming at, but it seems to be some combination of robustness, maintainability, quality, and predictability: essentially, better software.

I find the relegation of documentation to an additional note rather than a rule surprising. It's also telling that the authors don't call out maintainability as a desirable property (except in passing in the conclusions). If software is not thrown away, it frequently needs to be updated, e.g. to accommodate changes in input file types. This maintenance task often falls on people who are not the original author. If software does not have adequate documentation, it will be harder to maintain, whether by the original author or by others.

The rules do not mention anything about good software engineering practice, such as structured programming. While it's good not to be too prescriptive or dictate particular fads or paradigms (and the intended audience is individual programmers, not teams), no one would disagree that some kind of good structured programming practice is required to write programs where clarity, maintainability, extensibility, and quality are of concern.

In general, a program should not repeat large pieces of code with the same logic; these should be abstracted into subroutines, functions, methods, classes, and so on. The program should not be one giant monolithic main() method, and the behavior of subroutines should not be controlled through globals.
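To make this concrete, here is a minimal sketch of my own (in Python, with invented names; it is not from the manuscript) contrasting a global-controlled routine with a small parameterized function:

```python
# Discouraged: behavior controlled through a global, and normalization
# logic that would be duplicated wherever it is needed.
NORMALIZE = True

def process_samples(samples):
    results = []
    for s in samples:
        if NORMALIZE:
            results.append((s - min(samples)) / (max(samples) - min(samples)))
        else:
            results.append(s)
    return results

# Preferred: the behavior is an explicit parameter, and the repeated
# logic lives in one small, independently testable function.
def normalize(samples):
    low, high = min(samples), max(samples)
    return [(s - low) / (high - low) for s in samples]

def process_samples_v2(samples, normalized=True):
    return normalize(samples) if normalized else list(samples)
```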

While it's not necessary to replicate Knuth's three volumes, I think good software engineering practice deserves to be a rule, or at least called out.

Reproducibility of software and workflows is increasingly in the spotlight. There are passing mentions of reproducibility, and it is implicit in many of the guidelines, but it would be good to call it out explicitly as a goal beyond making the software robust, for example in the list on p. 2.

People like Titus Brown have written a lot about this in blogs. It would be useful to reference some of this.

  1. README

Excellent advice, although there is some overlap with 2.

  2. Usage info

Good advice, but I wonder if 2 and 8 could be combined?

  3. Versioning

All good advice.

As the authors note, GitHub makes this very easy. The advice is a bit succinct:

"as simple as adding an appropriate commit message or tag to version control"

As this article seems to be aimed at novice programmers who may not have been immersed in good, modern, open software development practice, it would be a good idea to include more details or pointers here.

In fact, I would recommend that every programmer learn how to use a VCS and a hosted service like GitHub/GitLab/Bitbucket, commit early and often, use issue trackers, and so on. This is not just a matter of making public releases (which is important) but also of good practice, even for software not intended to be released.
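For instance, here is a sketch of my own (the version string is hypothetical) of how a tool can expose its tagged release version so users can report exactly what they ran:

```python
import argparse

# Hypothetical version string, kept in sync with the tag applied in
# version control for each public release.
__version__ = "1.2.0"

parser = argparse.ArgumentParser(description="example research tool")
parser.add_argument("--version", action="version",
                    version="%(prog)s " + __version__)
args = parser.parse_args()
```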

  4. Reuse software

At first I assumed this section was about reusing software libraries, but it turned out to be about reusing executable programs.

This seems an odd guideline. While it's certainly convenient to launch a program using commands like 'system' in Perl, it's usually better to use a library (or service). In fact, the authors agree with this in 5, in which case why recommend calling executables here?

The advice seems a bit confused in general. It's not really clear whether the authors are saying it's a good idea or a bad idea to call external programs.

I think general advice to 'take care when calling external programs' may be more appropriate.
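As a sketch of what "taking care" might look like (my own Python example; the samtools command is hypothetical), checking the exit status and surfacing errors avoids silent failures:

```python
import subprocess
import sys

# Call an external program defensively: capture output, check the
# exit code, and surface a useful error rather than failing silently.
try:
    result = subprocess.run(
        ["samtools", "view", "-c", "input.bam"],  # hypothetical command
        capture_output=True, text=True, check=True,
    )
    print(result.stdout.strip())
except FileNotFoundError:
    sys.exit("samtools is not installed or not on PATH")
except subprocess.CalledProcessError as err:
    sys.exit(f"samtools failed with exit code {err.returncode}: {err.stderr}")
```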

  5. Build utility / package manager

Excellent advice that is all too often ignored.

As noted above, the presentation in combination with 4 is confusing. Generally a package manager is used when an external library, module, or package is reused, but, as noted, there is no advice on reusing these artefacts, since 4 is about running programs.
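To illustrate what a package manager buys (a minimal sketch of my own with invented names, not from the manuscript): in Python, a setup.py lets `pip install .` resolve declared dependencies automatically rather than bundling or documenting them by hand:

```python
# setup.py -- hypothetical minimal packaging for a research tool.
from setuptools import setup, find_packages

setup(
    name="mytool",                      # hypothetical package name
    version="1.2.0",
    packages=find_packages(),
    install_requires=["numpy>=1.15"],   # dependencies declared, not bundled
    entry_points={
        "console_scripts": ["mytool=mytool.cli:main"],
    },
)
```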

  6. Do not require root privileges

I certainly wouldn't contradict this, but I wonder why it's called out. There are many inadvisable things we've all seen in years of programming; maybe this could be combined with the advice about running external programs (4) or about making the software runnable outside a default user environment (7).

  7. Eliminate hard-coded paths

Excellent advice, and worth calling out, as it is such a frequent problem.

  8. Configuration from command line

Good advice, but could be combined with 7, as they are both about configuration.
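To make the connection between 7 and 8 concrete, a sketch of my own (the option names are hypothetical): taking paths as command-line options removes hard-coding and exposes the configuration at the same time:

```python
import argparse
from pathlib import Path

parser = argparse.ArgumentParser(description="example research tool")
# Paths come from the command line instead of being hard-coded, and any
# default is expressed relative to the user's own environment.
parser.add_argument("--input", type=Path, required=True,
                    help="input data file")
parser.add_argument("--output", type=Path,
                    default=Path.home() / "results.txt",
                    help="where to write results")
args = parser.parse_args()
```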

  9. Include a small test set

Good advice, but there is no advice about including software unit tests. This is good general software engineering advice, and the existence of simple cloud-based CI frameworks like Travis that integrate with GitHub makes it very easy to ensure that every change to a program does not break things.
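A sketch of what such a unit test might look like (my own example, with an invented normalize function, runnable with pytest and therefore by a CI service on every commit):

```python
# test_normalize.py -- run with `pytest`; a CI service such as Travis
# can execute this on every push so regressions are caught early.

def normalize(samples):
    # Hypothetical function under test, inlined here to keep the
    # sketch self-contained; normally it would be imported.
    low, high = min(samples), max(samples)
    return [(s - low) / (high - low) for s in samples]

def test_normalize_scales_to_unit_interval():
    assert normalize([2.0, 4.0, 6.0]) == [0.0, 0.5, 1.0]
```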

  10. Produce identical results when given identical inputs

Generally good advice, but perhaps overly specific. There is a lengthy discussion about what to do in the case of algorithms that involve random numbers. What about software that calls external services?

In my opinion this is a judgment call for the software author. Certainly, if there is no deliberate randomization or calling of external services, then nondeterministic output would be troubling.
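One common pattern, sketched here with Python's standard random module (my example, not the authors'), is to expose the seed as a parameter, default it to a fixed value, and echo it so reruns are reproducible:

```python
import argparse
import random

parser = argparse.ArgumentParser()
# A fixed default seed makes repeated runs identical; users who want
# variation can supply their own, and the value used is always echoed.
parser.add_argument("--seed", type=int, default=42)
args = parser.parse_args()

random.seed(args.seed)
print(f"random seed: {args.seed}")
print([random.random() for _ in range(3)])
```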

As mentioned above, documentation is important, and some of the existing 10 could be merged to make room for this.

For services, I am not sure why authentication is mentioned, even in passing. It doesn't really pertain to robustness per se.

Reviewer 3

Comments to the author:

This paper sets out to describe 10 simple rules which, if applied, can contribute to making research software more robust. It provides a useful set of guidelines that should generally improve the robustness of research software, and it should be of interest to most graduate students who are writing software in the course of their research.

I recommend publication of this paper but would request the addition of, or additional emphasis on, one point which is not called out as a rule but is mentioned in passing in several places within the paper: the recommendation to put all software into a source control system. As this is an opinion paper, I do not think this is a necessary change for publication (in my opinion it's a vital rule, but that is just my opinion). However, it is referenced in several places, and in particular the authors' definition of "robust software" includes "...is kept under version control". As this is an issue frequently ignored or addressed "too late" by graduate students, I would like to see it get additional emphasis, even if it is not promoted to a full rule.

The remainder of my comments are meant to provide (hopefully) useful feedback to the authors in the form of suggestions that may enhance the manuscript.

In describing successful research software projects, the authors state "...who have adopted strong software engineering approaches, have high standards of reproducibility, use good testing practices, and foster inter-institutional collaborations that grow into strong user bases". All of these points can be clearly tied to their rules, with the exception of the final statement about inter-institutional collaborations. While such collaborations are desirable, particularly to establish project continuity and longevity, they are not strictly necessary for robustness, and this point is not further referenced in the manuscript.

In several cases, it seems that the paper might violate the first rule of "Ten Simple Rules for Writing a PLOS Ten Simple Rules Article" (http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003858) by combining multiple rules into a single rule. For example, rule 8 combines putting parameters on the command line with echoing all parameters to the log file to support reproducibility and provenance. Similarly, rule 5 covers both automating the build process and automating deployment and installation. Rule 4 also tends to stuff multiple rules (bolded in the manuscript) into one: "limit dependencies" and "ensure dependencies are available" are closely related, but "use native functions" is a very different idea and not what I would expect from a 'reuse' rule. Finally, the 'What we left out' section implicitly defines an additional rule (documentation). In this particular case there could have been a single rule ("Document your code and usage") encompassing the points of both rules 1 and 2 (README, usage) as well as the points from 'What we left out'.

In rule 8, I would suggest generalizing the rule somewhat to recommend externalizing parameters rather than necessarily requiring them to be available as command-line options. For example, if a configuration file is used to provide all parameter values, this could accomplish the task of separating the parameters from the code, as well as providing a record of the values used for a run, as the authors describe when echoing parameters to a log file.
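A sketch of this suggestion (my own, with hypothetical file and key names): read all parameters from a configuration file and echo them to the log at startup, so every run records its settings:

```python
import configparser
import logging

logging.basicConfig(level=logging.INFO)

# Read all parameters from an external file rather than the code.
config = configparser.ConfigParser()
config.read("run.cfg")  # hypothetical configuration file

# Echo every parameter to the log so the run is self-documenting.
for section in config.sections():
    for key, value in config[section].items():
        logging.info("parameter %s.%s = %s", section, key, value)
```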

gvwilson commented 7 years ago

See #24, #25, #26, #27, #28, #29, #30, #31, #32, #33, #34, #35, #36, #37, #38, #39