openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: Mashpit: sketching out genomic epidemiology #7306

Open editorialbot opened 1 month ago

editorialbot commented 1 month ago

Submitting author: @tongzhouxu (Tongzhou Xu)
Repository: https://github.com/tongzhouxu/mashpit
Branch with paper.md (empty if default branch):
Version: v0.9.7
Editor: @csoneson
Reviewers: @hkaspersen, @mberacochea
Archive: Pending

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/760af75d515b1bc3d2fc87085fe79b92"><img src="https://joss.theoj.org/papers/760af75d515b1bc3d2fc87085fe79b92/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/760af75d515b1bc3d2fc87085fe79b92/status.svg)](https://joss.theoj.org/papers/760af75d515b1bc3d2fc87085fe79b92)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@hkaspersen & @mberacochea, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. If you have any questions or concerns, please let @csoneson know.

Please start on your review when you are able, and be sure to complete it within the next six weeks at the very latest.

Checklists

📝 Checklist for @hkaspersen

📝 Checklist for @mberacochea

editorialbot commented 1 month ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 1 month ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.1093/bioinformatics/bty407 is OK
- 10.2807/1560-7917.es.2017.22.23.30544 is OK
- 10.1186/s13059-016-0997-x is OK
- 10.21105/joss.00027 is OK
- 10.1186/1471-2105-10-421 is OK
- 10.1128/aem.01746-19 is OK
- 10.1101/gr.251678.119 is OK
- 10.3389/fmicb.2017.00375 is OK

🟡 SKIP DOIs

- None

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None

editorialbot commented 1 month ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.02 s (1110.0 files/s, 102938.1 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                           8            138            158           1050
Markdown                         3             90              0            298
HTML                             4             11              2            155
YAML                             4             11             11             97
TeX                              1              9              0             88
JavaScript                       2              0              7              2
CSS                              1              0              5              1
-------------------------------------------------------------------------------
SUM:                            23            259            183           1691
-------------------------------------------------------------------------------

Commit count by author:

   148  Tongzhou Xu
    14  tongzhouxu
     7  dependabot[bot]
     3  Lee Katz
     1  Lee Katz - Aspen
     1  Lee Katz gzu2

editorialbot commented 1 month ago

Paper file info:

📄 Wordcount for paper.md is 1226

✅ The paper includes a Statement of need section

editorialbot commented 1 month ago

License info:

🟡 License found: GNU General Public License v2.0 (Check here for OSI approval)

csoneson commented 1 month ago

👋🏼 @tongzhouxu, @hkaspersen, @mberacochea - this is the review thread for the submission. All of our communications will happen here from now on.

As a reviewer, the first step is to create a checklist for your review by entering

@editorialbot generate my checklist

at the top of a new comment in this thread. These checklists contain the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. The first comment in this thread also contains links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues directly in the software repository. If you do so, please mention this thread so that a link is created (and I can keep an eye on what is happening). Please also feel free to comment and ask questions in this thread. It is often easier to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if any of you require some more time. We can also use EditorialBot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@csoneson) if you have any questions or concerns. Thanks!

editorialbot commented 1 month ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

hkaspersen commented 1 month ago

Review checklist for @hkaspersen

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

mberacochea commented 1 month ago

Review checklist for @mberacochea

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

hkaspersen commented 1 month ago

I have now completed my review of this software and manuscript. The authors present an interesting and useful tool that will solve a lot of issues regarding sensitive data and analysis speed. Below are some specific comments I would like the authors to address.

Functionality: The functionality of the software (v. 0.9.7) was tested on our HPC cluster, using Miniconda for installation by following the guidelines from the GitHub page. No problems were encountered during installation. The software was tested by sketching the Salmonella database as described in the example commands, followed by querying a local Salmonella genome against the database, using default settings.

Documentation: The documentation provided on GitHub is a bit lacking and could be expanded.

Manuscript: The manuscript is well-written and provides context on why the software is useful, and what problem it solves.

tongzhouxu commented 1 month ago

Hi @hkaspersen, thank you so much for your review and your suggestions. We greatly appreciate the time and effort you invested in evaluating our work. We will revise the code and manuscript accordingly.

csoneson commented 2 weeks ago

👋🏻 Just wanted to check in on the progress of the reviews here. @hkaspersen - thanks for your initial comments! @mberacochea - could you let us know how things are going on your side, or if you have any questions? Thanks everyone!

mberacochea commented 2 weeks ago

Hi @csoneson! I've been traveling for the past few weeks, and I will sort out my review in the next few days.

mberacochea commented 1 week ago

I've finished my review :).

The authors present a command-line utility that addresses a problem in a convenient and performant way. The code is well-structured, and the installation and usage instructions work as expected. The repository includes a set of unit tests covering a significant portion of the API, automated through GitHub Actions. The tool leverages Sourmash to compare large numbers of genomes very quickly, which reduces the friction of the whole workflow, from downloading references to querying with the user's sample FASTA file.

Source code - docs and functionality

I've submitted a series of issues with my suggestions to improve the source code; those are:

These issues cover the quality of the source code, functionality, and documentation. I'll update my checklist after the authors review my tickets in the repo.

Paper

The paper is well written and it allows the reader to understand the purpose and scope of the application.

Some notes on particular lines:

A more general note about Mashpit: If I understood correctly, the accuracy of placing a genome in the correct SNP cluster isn’t very high (70% for Salmonella when considering that the "right" SNP cluster should be within the top 25). This seems quite relevant, as it’s likely important to users (this is my assumption). I would suggest expanding or rephrasing this part to make it clearer to users what the best use case for Mashpit is (it’s already mentioned in the discussion, but I feel it’s a bit lost there). I would also suggest splitting Figure 1 into two figures, one to explain the "compute performance" (in terms of resources) and the other to show the tool's accuracy.
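To make the "within the top 25" criterion concrete, here is a minimal sketch of how a top-k accuracy of this kind could be computed. It is not taken from the Mashpit code base: the function name `top_k_accuracy`, the query names, and the cluster labels are all hypothetical, and the toy data exists only to show the arithmetic.

```python
from typing import Dict, List


def top_k_accuracy(
    ranked_hits: Dict[str, List[str]],
    truth: Dict[str, str],
    k: int = 25,
) -> float:
    """Return the fraction of queries whose true cluster is among the first k hits.

    ranked_hits maps a query name to its cluster hits, best match first.
    truth maps a query name to the cluster it is known to belong to.
    Both structures are hypothetical stand-ins, not Mashpit output formats.
    """
    if not truth:
        raise ValueError("truth must contain at least one query")
    correct = sum(
        1
        for query, expected in truth.items()
        if expected in ranked_hits.get(query, [])[:k]
    )
    return correct / len(truth)


# Hypothetical toy data: four queries, each with three ranked cluster hits.
ranked_hits = {
    "q1": ["cluster_A", "cluster_B", "cluster_C"],
    "q2": ["cluster_D", "cluster_E", "cluster_F"],
    "q3": ["cluster_G", "cluster_H", "cluster_I"],
    "q4": ["cluster_J", "cluster_K", "cluster_L"],
}
truth = {
    "q1": "cluster_A",  # true cluster is the best hit
    "q2": "cluster_F",  # true cluster is the third hit
    "q3": "cluster_Z",  # true cluster is not among the hits
    "q4": "cluster_K",  # true cluster is the second hit
}

top1 = top_k_accuracy(ranked_hits, truth, k=1)
top3 = top_k_accuracy(ranked_hits, truth, k=3)
print(f"top-1 accuracy: {top1:.2f}")
print(f"top-3 accuracy: {top3:.2f}")
```

The gap between the two values in the toy data illustrates why the choice of k matters when reporting accuracy: a tool can rarely rank the true cluster first and still score well at a larger k, which is worth stating explicitly for users.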

tongzhouxu commented 1 week ago

Hi @mberacochea, thank you for taking the time to review Mashpit. We will modify the code based on your feedback.

Best, Tongzhou