openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
725 stars 38 forks source link

[REVIEW]: Opfi: A Python package for identifying gene clusters in large genomics and metagenomics data sets #3678

Closed whedon closed 3 years ago

whedon commented 3 years ago

Submitting author: @alexismhill3 (Alexis Hill) Repository: https://github.com/wilkelab/Opfi Version: 0.1.2 Editor: @csoneson Reviewer: @Thomieh73, @afrubin Archive: 10.5281/zenodo.5601741

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/cc0a3b03e9cff97e5f93e1a3a8a1808d"><img src="https://joss.theoj.org/papers/cc0a3b03e9cff97e5f93e1a3a8a1808d/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/cc0a3b03e9cff97e5f93e1a3a8a1808d/status.svg)](https://joss.theoj.org/papers/cc0a3b03e9cff97e5f93e1a3a8a1808d)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@Thomieh73 & @afrubin, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @csoneson know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Review checklist for @Thomieh73

✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @afrubin

✨ Important: Please do not use the Convert to issue functionality when working through this checklist, instead, please open any new issues associated with your review in the software repository associated with the submission. ✨

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @Thomieh73, @afrubin it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 3 years ago

Wordcount for paper.md is 1401

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1186/2042-5783-2-3 is OK
- 10.1038/nrg1709 is OK
- 10.1186/1471-2105-10-421 is OK
- 10.1038/nbt.3988 is OK
- 10.1038/s41592-021-01101-x is OK
- 10.1093/nar/gkz310 is OK
- 10.1093/nar/gky383 is OK
- 10.1093/nar/gkz192 is OK
- 10.1186/1471-2105-8-18 is OK
- 10.1093/bioinformatics/btaa213 is OK
- 10.1101/2021.08.16.456562 is OK

MISSING DOIs

- None

INVALID DOIs

- None
whedon commented 3 years ago
Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.31 s (525.7 files/s, 84309.5 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
C++                              85           2266            585          12905
Python                           38           1095           1366           4350
C/C++ Header                     19            261             61           1418
Markdown                          5            109              0            305
XML                               1              0              0            210
TeX                               1             10              0            179
reStructuredText                  8            202            244            177
YAML                              3             11             12            103
Jupyter Notebook                  1              0            302             77
make                              2             12              8             28
Bourne Again Shell                1              1              0              4
--------------------------------------------------------------------------------
SUM:                            164           3967           2578          19756
--------------------------------------------------------------------------------

Statistical information for the repository 'fbe5dc8158f92bfb3e7b73b1' was
gathered on 2021/08/31.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Alexis Hill                    146         23336           3224           72.53
Jim Rybarski                   159          7274           2786           27.47

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Alexis Hill               19831           85.0         17.4                5.03
Jim Rybarski               4476           61.5         12.2                8.80
whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

csoneson commented 3 years ago

👋🏼 @alexismhill3, @clauswilke, @Thomieh73, @afrubin - this is the review thread for the submission. All of our communications will happen here from now on.

Both reviewers have checklists at the top of this thread. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues directly in the software repository. If you do so, please mention this thread so that a link is created (and I can keep an eye on what is happening). Please also feel free to comment and ask questions in this thread. It is often easier to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

Please feel free to ping me (@csoneson) if you have any questions or concerns. Thanks!

afrubin commented 3 years ago

@csoneson not sure why, but I'm unable to check any of the boxes in my checklist.

afrubin commented 3 years ago

@alexismhill3 there are several (mostly old) open issues in the GitHub repository at this time. What is the status of these issues? Some of them seem worth fixing, especially the temp file (https://github.com/wilkelab/Opfi/issues/181) and missing documentation (https://github.com/wilkelab/Opfi/issues/30) issues.

It would be helpful to know before we start adding our own comments during the review.

danielskatz commented 3 years ago

@afrubin - did you accept the invitation mentioned at the start of this review issue?

afrubin commented 3 years ago

@danielskatz totally missed it, sorry about that! Looks like everything is working now.

clauswilke commented 3 years ago

@afrubin The vast majority of the old issues were stale and we have started closing them. The temp-file issue unfortunately is caused by a dependency and doesn't have an easy fix. Documentation has improved massively since spring 2020.

Any requests for improvement, new features, or bug fixes should be filed as new issues. Thanks!

Thomieh73 commented 3 years ago

@csoneson Hi, I am having issues with notifications from this repo. I am receiving all notifications from this entire repo. I have pushed the button to unsubscribe from this thread, but somehow I am stil receiving all notifications from this repo. Any suggestions on how to solve this. It drives me nuts to have 100 mails in my mailbox everyday that are not relevant.

csoneson commented 3 years ago

@Thomieh73 - check the instructions in the second post above: https://github.com/openjournals/joss-reviews/issues/3678#issuecomment-909628355

Thomieh73 commented 3 years ago

@csoneson Thanks I had completely forgotten about that option at github. :-)

whedon commented 3 years ago

:wave: @Thomieh73, please update us on how your review is going (this is an automated reminder).

whedon commented 3 years ago

:wave: @afrubin, please update us on how your review is going (this is an automated reminder).

afrubin commented 3 years ago

I started with the installation instructions and have opened some related issues: https://github.com/wilkelab/Opfi/issues/198 https://github.com/wilkelab/Opfi/issues/199 https://github.com/wilkelab/Opfi/issues/200 https://github.com/wilkelab/Opfi/issues/201

alexismhill3 commented 3 years ago

Thanks @afrubin, I'll get started working on these

Thomieh73 commented 3 years ago

Hi @alexismhill3, I had created two issues when working on the installation and on your tutorial (which you already solved). https://github.com/wilkelab/Opfi/issues/202 - which is a suggestion of how one can create a conda environment with all the dependencies.

I did not find any other problems using the tool or checking the documentation

Thomieh73 commented 3 years ago

@csoneson I am done with the checklist. It is an interesting tool due to the versatility of what you can research.

alexismhill3 commented 3 years ago

Awesome, thanks @Thomieh73! I'm working on creating a conda package recipe for Opfi, per your suggestion, which I agree is a good idea. I think this will also resolve some other issues raised by @afrubin about the clarity of the installation process, since it will basically be condensed down to conda install opfi

afrubin commented 3 years ago

@alexismhill3 I'm waiting to try out the new installation method, but in the meantime here are a few comments on the manuscript.

I was a little confused by the exclusive focus on metagenomics in the statement of need section. This led me to expect that the program would be analyzing some sort of shotgun metagenomic data, rather than a single genome assembly. Clearly many new genomes are arising from metagenomic studies, but this could be clearer from the start.

The implementation says that Opfi is implemented entirely in Python, but the repository does contain an optional C dependency. This should be mentioned here.

The use of capitalization in the manuscript is inconsistent (e.g. Cas1 vs. cas7, "Blastp" which is usually stylized as blastp or BLASTP). There are other formatting inconsistencies, for example Rippkea orientalis is not italicized. The authors should perform a detailed pass through the manuscript to address these kind of issues.

afrubin commented 3 years ago

I have added a couple more GitHub issues arising from the manuscript: https://github.com/wilkelab/Opfi/issues/205 https://github.com/wilkelab/Opfi/issues/206

alexismhill3 commented 3 years ago

@afrubin I've created a conda recipe for Opfi and updated the documentation accordingly. These changes were merged, so you should be able to see them on the main branch now.

I had to update a few unit tests during this process. Specifically, some tests required the modified version of PILER-CR, which we have decided to deprecate the use of (see wilkelab/Opfi#207 for more details). So if you were intending to run the unit test suite, you should grab the new changes before doing so.

csoneson commented 3 years ago

👋🏻 Hi all, just checking in on the reviews here. @afrubin - did you have a chance to test the new installation process?

afrubin commented 3 years ago

Hi @csoneson, I was just waiting on the other issues that I opened to be addressed before coming back to this. I'll have a look at the new installation process though.

csoneson commented 3 years ago

Perfect, thanks @afrubin! @alexismhill3 - could you perhaps comment on the status of the mentioned issues?

clauswilke commented 3 years ago

@csoneson Thanks for the reminder. We are basically done addressing the points that were raised. Will provide details within the next day or so.

clauswilke commented 3 years ago

In brief, we have made all changes except the API changes requested in issues https://github.com/wilkelab/Opfi/issues/205 and https://github.com/wilkelab/Opfi/issues/206. We agree with these requests in principle, but they are breaking changes and that would create major problems for us in the code for a related paper (https://www.biorxiv.org/content/10.1101/2021.08.16.456562v2). This paper is about to formally appear, and we don't want to end up with a situation where this client code is immediately incompatible with the latest official release of Opfi.

We will leave these issues open and may address them at a future point.

Regarding the term "Operon" (https://github.com/wilkelab/Opfi/issues/205), we have made changes to the documentation to make it absolutely clear that we are not claiming that an Operon object in the library is definitely an operon in the biological meaning.

alexismhill3 commented 3 years ago

Thanks @afrubin and @Thomieh73 for your feedback. Here is a summary of all of the changes we have made to address the points raised so far:

Recommendation: Make Opfi available on Bioconda

Thanks for the suggestion @Thomieh73. Since Opfi has several dependencies, all of which are currently available on Bioconda, it makes sense to put Opfi on Bioconda as well. This simplifies the installation procedure, since conda manages dependencies automatically. We have added a recipe for Opfi to Bioconda and updated the documentation accordingly.

Installation instructions/issues

We have fixed inconsistencies in the installation instructions (specifically between RTD and the README on Github). We have also removed the distinction between "optional" and "required" software dependencies in the documentation - all dependencies are now considered hard requirements, and need to be installed for the full test suite to pass. We have also updated the documentation with the new conda installation method.

Manuscript comments

The following comments were made regarding the text in the manuscript; see our responses below.

I was a little confused by the exclusive focus on metagenomics in the statement of need section. This led me to expect that the program would be analyzing some sort of shotgun metagenomic data, rather than a single genome assembly. Clearly many new genomes are arising from metagenomic studies, but this could be clearer from the start.

We agree this could be misleading, and have toned down the emphasis on metagenomics in the statement of need. To make it clear that Opfi analyzes assembled data, the beginning of the summary now reads:

Gene clusters perform a diverse set of functions, many of which are relevant to biotechnology. There is a need for software tools that can extract candidate gene clusters from the vast amount of available genomic data. Therefore, we developed Opfi: a modular pipeline for identification of arbitrary gene clusters in assembled genomic or metagenomic sequences.

The implementation says that Opfi is implemented entirely in Python, but the repository does contain an optional C dependency. This should be mentioned here.

We have revised the implementation statement so that it now reads:

Opfi is implemented in Python, and uses several bioinformatics tools for feature annotation (Buchfink et al., 2021; Camacho et al., 2009; Edgar, 2007; Shi & Liang, 2019; Steinegger & Söding, 2017). Users can install Opfi and all of its dependencies from Bioconda, using the conda package manager.

We have also removed the one-off fork of PILER-CR from the repository. The documentation now instructs users to install the official version of PILER-CR, and it gets installed with conda by default.

The use of capitalization in the manuscript is inconsistent (e.g. Cas1 vs. cas7, "Blastp" which is usually stylized as blastp or BLASTP). There are other formatting inconsistencies, for example Rippkea orientalis is not italicized. The authors should perform a detailed pass through the manuscript to address these kind of issues.

We have carefully reviewed the manuscript for formatting inconsistencies and capitalization issues. Regarding gene/protein names, we are now using the following convention: gene names are in lowercase and italics (e.g. cas1), and protein names (including references to protein sequences) have the initial letter capitalized (e.g. Cas1).

Requested API changes

Two Github issues request changes to the API that would break backwards compatibility with existing code. For that reason, we are not going to make these code changes at this time, but these issues will remain open and may be addressed in the future. We have added the following statement to the documentation and paper to make it very clear that Opfi does not attempt to predict whether an Operon object definitely represents a biological operon:

It should be noted that the use of the word "operon" throughout this library is an artifact from early development of Opfi. At this time, Opfi does not predict whether a candidate system represents a true operon, that is, a set of genes under the control of a single promoter. Although a candidate gene cluster may certainly qualify as an operon, it is currently up to the user to make that distinction.

afrubin commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

afrubin commented 3 years ago

@alexismhill3 @clauswilke I understand the tension between wanting to update but avoiding breaking changes (especially with another manuscript involved), and I'm satisfied with the way you addressed the outstanding issues. I hope this software is widely-adopted and that you have the time and motivation to come back to them!

My comments on the manuscript have been addressed in full. The only outstanding issue is that you should cite Bioconda in the manuscript: https://bioconda.github.io/#citing-bioconda

FYI I was able to install Opfi and run the tests using Conda via Docker: https://docs.anaconda.com/anaconda/user-guide/tasks/docker/

I don't think you need to add this to the documentation (unless you want to support this use case) but I wanted to mention it in case it's useful for you or another user later on.

clauswilke commented 3 years ago

@afrubin Thanks! We'll add the reference to Bioconda.

csoneson commented 3 years ago

Thanks everyone for the quick responses. I see that both reviewers have checked all the boxes in the checklists above, and I just wanted to make sure I'm not missing any issue that is still being worked on (apart from adding the Bioconda reference). @afrubin, @Thomieh73 - could you confirm that you are happy with the current state of the submission? In the meantime, I will also have a quick look through and get back to @alexismhill3 and @clauswilke shortly with the next steps.

afrubin commented 3 years ago

@csoneson Yes, I am happy with the submission (once the Bioconda reference is added).

alexismhill3 commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

alexismhill3 commented 3 years ago

Thanks everyone! @csoneson @afrubin I've added the Bioconda reference.

csoneson commented 3 years ago

@whedon check references

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1038/s41592-018-0046-7 is OK
- 10.1002/biot.201000181 is OK
- 10.1186/1471-2105-10-421 is OK
- 10.1038/nbt.3988 is OK
- 10.1038/s41592-021-01101-x is OK
- 10.1093/nar/gkz310 is OK
- 10.1093/nar/gky383 is OK
- 10.1093/nar/gkz192 is OK
- 10.1186/1471-2105-8-18 is OK
- 10.1104/pp.19.00386 is OK
- 10.1093/bioinformatics/btaa213 is OK
- 10.1101/2021.08.16.456562 is OK

MISSING DOIs

- None

INVALID DOIs

- None
clauswilke commented 3 years ago

@csoneson Just wondering if you need anything else from us.

csoneson commented 3 years ago

@clauswilke, @alexismhill3 - apologies, busy week. I have gone through the submission and I have only a few minor suggestions:

You can generate a new proof with @whedon generate pdf. Then, could you please:

I can then move forward with accepting the submission.

alexismhill3 commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

clauswilke commented 3 years ago

I believe we have done everything.

Release: https://github.com/wilkelab/Opfi/releases/tag/0.1.2

Zenodo DOI: https://doi.org/10.5281/zenodo.5601741

csoneson commented 3 years ago

@whedon set 0.1.2 as version

whedon commented 3 years ago

OK. 0.1.2 is the version.

csoneson commented 3 years ago

@whedon set 10.5281/zenodo.5601741 as archive

whedon commented 3 years ago

OK. 10.5281/zenodo.5601741 is the archive.