[REVIEW]: shmlast: An improved implementation of Conditional Reciprocal Best Hits with LAST and Python

whedon commented 7 years ago

Submitting author: @camillescott (Camille Scott) Repository: https://github.com/camillescott/shmlast Version: v1.1 Editor: @biorelated Reviewer: @genomematt Archive: 10.5281/zenodo.260437

Status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/3cde54de7dfbcada7c0fc04f569b36c7"><img src="http://joss.theoj.org/papers/3cde54de7dfbcada7c0fc04f569b36c7/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/3cde54de7dfbcada7c0fc04f569b36c7/status.svg)](http://joss.theoj.org/papers/3cde54de7dfbcada7c0fc04f569b36c7)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer questions

Conflict of interest

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Version: Does the release version given match the GitHub release (v1.0.2)?
[x] Authorship: Has the submitting author (@camillescott) made major contributions to the software?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: Have any performance claims of the software been confirmed?

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g. API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Authors: Does the paper.md file include a list of authors with their affiliations?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] References: Do all archival references that should have a DOI list one (e.g. papers, datasets, software)?

whedon commented 7 years ago

Hello human, I'm @whedon. I'm here to help you with some common editorial tasks for JOSS. @genomematt it looks like you're currently assigned as the reviewer for this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As as reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all JOSS reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

george-githinji commented 7 years ago

@genomematt Where are we at with this review? Please let us know.

genomematt commented 7 years ago

I am still working my way through this, but some initial comments @camillescott.

I think that you need a more from first principles statement of need that outlines the sequence matching problem that CRBH is addressing (remembering that this is a generalist scientific computing journal) so that readers don't need to chase a reference to get a basic idea of what the software does.

You need contribution, bug report and support guidelines - check out some other JOSS articles for what this should look like.

A real world (ish) worked example, ideally as a jupyter notebook or doctest file would greatly enhance, and would help with explaining the statement of need.

Very pleased to see coverage as part of your testing. Your exclusions all look very sensible on first pass. You may wish to add #pragma no cover directives marking the code you don't test. This makes it easier to track in your code and makes full coverage 100%. (suggestion not a requirement)

As you make a claim of improved performance (speed) it would be good to quantify that improvement.

I had an install issue but I am pretty sure it was related to a prior install not yours. I will redo on a clean VM to confirm.

camillescott commented 7 years ago

Thanks @genomematt. I'll add a section to the paper describing the approach in a bit more detail. I can easily add a figure with performance comps as well.

camillescott commented 7 years ago

Update: I have added a performance comparison figure, a section describing the method in a bit more detail, and contribution guidelines.

genomematt commented 7 years ago

Ok @camillescott the added bits are good. My install issue was a conda problem and all was fine on a clean vm.

Given that there are multiple dependencies I think there should be a test step at the end of your install procedure. This would not need to be comprehensive testing of code, just a 'if you have installed it correctly this input will give this output.'

I am giving you a scraping pass on the real world application documentation, but an additional notebook that does not hide the actual running in a script like the performance notebook, showing real transcripts against a real db and then interpreting the csv to answer the biological question would enhance. As it stands the documentation is good for anyone who has done this sort of procedure before, and not as easy for someone who has a set of transcripts and is asking 'what is the best way to match these to the most closely related model organism'. e.g. if I have 10 transcripts will this program work, will the results be valid? I really care about gene family X, how do I interpret the output to find my family X transcripts?

Neither of those would stop me recommending accepting, but the complete lack of community guidelines does. Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

camillescott commented 7 years ago

What is expected for guidelines? I did add a CONTRIBUTING.md: https://github.com/camillescott/shmlast/blob/master/CONTRIBUTING.md

I followed the example of several other accepted projects on that -- should I add more?

genomematt commented 7 years ago

Sorry @camillescott I missed the CONTRIBUTING.md. I suspect a lot of other people especially those with bugs or support questions would as well. At least link it from the readme, and personally I would be inclined to put such a short section just in the readme.

genomematt commented 7 years ago

Ok @biorelated I consider this ready to accept.

camillescott commented 7 years ago

Copy that @genomematt -- added it to the readme :)

george-githinji commented 7 years ago

Many thanks @genomematt.

george-githinji commented 7 years ago

@arfon can we prepare this work for publication please?

arfon commented 7 years ago

@camillescott - At this point could you make an archive of the reviewed software in Zenodo/figshare/other service and update this thread with the DOI of the archive? I can then move forward with accepting the submission.

camillescott commented 7 years ago

@arfon: I've archived on Zenodo; here is the DOI: http://doi.org/10.5281/zenodo.260437

note that the final published version after the requested changes is v1.1, not v1.0.2 as in Whedon's first post in this issue

arfon commented 7 years ago

@whedon set 10.5281/zenodo.260437 as archive

whedon commented 7 years ago

OK. 10.5281/zenodo.260437 is the archive.

arfon commented 7 years ago

Many thanks for your review @genomematt and thanks for editing this one @biorelated ✨

@camillescott - your paper is now accepted into JOSS and your DOI is http://dx.doi.org/10.21105/joss.00142 ⚡️ 🚀 💥

openjournals / joss-reviews