[REVIEW]: Multilocus sequence typing by blast from de novo assemblies against PubMLST

whedon commented 8 years ago

Submitting author: @andrewjpage (Andrew Page) Repository: https://github.com/sanger-pathogens/mlst_check Version: v2.1.1630910 Editor: @pjotrp Reviewer: @harmn Archive: 10.6084/m9.figshare.4285097.v1

Status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/0b801d23613c9b626c2b6028f8c14056"><img src="http://joss.theoj.org/papers/0b801d23613c9b626c2b6028f8c14056/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/0b801d23613c9b626c2b6028f8c14056/status.svg)](http://joss.theoj.org/papers/0b801d23613c9b626c2b6028f8c14056)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer questions

Conflict of interest

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Version: Does the release version given match the GitHub release (v2.1.1630910)?
[x] Authorship: Has the submitting author (@andrewjpage) made major contributions to the software?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: Have any performance claims of the software been confirmed?

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g. API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

Paper PDF: 10.21105.joss.00118.pdf

[x] Authors: Does the paper.md file include a list of authors with their affiliations?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] References: Do all archival references that should have a DOI list one (e.g. papers, datasets, software)?

whedon commented 8 years ago

Hello human, I'm @whedon. I'm here to help you with some common editorial tasks for JOSS.

For a list of things I can do to help you, just type:

@whedon commands

arfon commented 8 years ago

👋 @harmn - this is where the review for this submission take place. @pjotrp can help you with any questions you might have.

happykhan commented 8 years ago

I can confirm that the program works as described, on real world data and meets all of the requirements stated here.

andrewjpage commented 7 years ago

Thanks @happykhan If you need some alternative suggestions for reviewers, heres some researchers who have written MLST code & are active on GitHub (and I have never published with):

@lskatz (CDC - Centers for Disease Control) https://github.com/lskatz/lyve-MLST @aunderwo (PHE - Public Health England) https://github.com/phe-bioinformatics/MOST @jacarrico (Universidade de Lisboa, Portugal) https://github.com/jacarrico/MLSTtrimm & ReMatCh

pjotrp commented 7 years ago

Relax @andrewjpage. @harmn has agreed to review.

harmn commented 7 years ago

indeed, looking for some time to get on it.

Harm

On 23 Nov 2016, at 10:35, Pjotr Prins notifications@github.com<mailto:notifications@github.com> wrote:

Relax @andrewjpagehttps://github.com/andrewjpage. @harmnhttps://github.com/harmn has agreed to review.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/openjournals/joss-reviews/issues/118#issuecomment-262468060, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ABZuOe8q1GPU3GY44i4oq1y0rbqx0WCeks5rBAjTgaJpZM4Kxz06.

harmn commented 7 years ago

I reviewed mlst_check: it installs and runs. Some issues/comments below, in general the documentation is a bit limited.

Installation instructions Installation works fine on Ubuntu 16.04.1 LTS, but it does require root access/sudo rights or a docker installation. Might be good to mention that.

Example usage I could not find the sample files described in the README.md file for the docker container (/data/sample1.fa etc.). I used the Salmonella files in "mlst_check/example/input_data" to test the software.

Documentation The used methods are poorly described (as far as I can tell), for instance I had to look in the docstrings to find out what is used to find the best hit: "The best hit has the greatest number of matching bases. If two hits have the same number of matching bases, the one with the greater percentage identity is selected."

I had to search the code for the meaning of the tilde in the ST column, "Add tilde to matches which are not 100%". Would be good to add that to the README.md.

What is exactly expected in the input FASTA files (I assume complete genomes)? What is the output if your sequences have no hits with the selected scheme?

Automated tests There are quite a number test scripts, but I could not find any description of the tests. I ran all and found no problems.

References Page et al. does not have a doi (should be http://dx.doi.org/10.1099/mgen.0.000083)

pjotrp commented 7 years ago

@andrewjpage Do you mind addressing mentioned issues?

andrewjpage commented 7 years ago

@harmn Thanks for the comprehensive review. @pjotrp Yes I will address all of the issues, but it will take me a few days.

andrewjpage commented 7 years ago

Hi @harmn Thanks for reviewing this submission.

Installation instructions Installation works fine on Ubuntu 16.04.1 LTS, but it does require root access/sudo rights or a docker installation. Might be good to mention that.

I have clarified that root access is needed, and also added instructions for installing with HomeBrew/LinuxBrew (which doesn't need root).

Example usage I could not find the sample files described in the README.md file for the docker container (/data/sample1.fa etc.). I used the Salmonella files in "mlst_check/example/input_data" to test the software.

I have added test files to the docker container so that the example command works out of the box.

Documentation The used methods are poorly described (as far as I can tell), for instance I had to look in the docstrings to find out what is used to find the best hit: "The best hit has the greatest number of matching bases. If two hits have the same number of matching bases, the one with the greater percentage identity is selected."

I had to search the code for the meaning of the tilde in the ST column, "Add tilde to matches which are not 100%". Would be good to add that to the README.md.

What is exactly expected in the input FASTA files (I assume complete genomes)? What is the output if your sequences have no hits with the selected scheme?

I have extended README documentation to clarify the methods used, the meaning of the various notations and extended the examples.

Automated tests There are quite a number test scripts, but I could not find any description of the tests. I ran all and found no problems.

By default the tests produce minimal output when running, but can be run in a more verbose mode 'dzil test --test-verbose' which gives a text string for each test. I have improved the messages coming out, and added more descriptive text to blocks of tests to make it clearer whats going on. Also I have set TravisCI to run the tests in verbose mode so that its easier to see.

References Page et al. does not have a doi (should be http://dx.doi.org/10.1099/mgen.0.000083)

Thanks this has been corrected.

harmn commented 7 years ago

@andrewjpage nice work @pjotrp happy to accept this

pjotrp commented 7 years ago

@arfon ready to accept.

arfon commented 7 years ago

@andrewjpage - At this point could you make an archive of the reviewed software in Zenodo/figshare/other service and update this thread with the DOI of the archive? I can then move forward with accepting the submission.

andrewjpage commented 7 years ago

I've added it to Figshare as: https://dx.doi.org/10.6084/m9.figshare.4285097.v1

arfon commented 7 years ago

@whedon set 10.6084/m9.figshare.4285097.v1 as archive

whedon commented 7 years ago

OK. 10.6084/m9.figshare.4285097.v1 is the archive.

arfon commented 7 years ago

Many thanks for the review @harmn and for editing this one @pjotrp.

@andrewjpage - your paper is now accepted into JOSS and your DOI is http://dx.doi.org/10.21105/joss.00118 ⚡️ 🚀 💥

genomematt commented 7 years ago

There are unformatted citations in the paper @arfon @pjotrp [Seeman2016; Jolley2010]

andrewjpage commented 7 years ago

Sorry Matt, I've fixed it now in the original markdown file.

arfon commented 7 years ago

Thanks for the heads up @genomematt - this should be fixed now.

openjournals / joss-reviews