openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
725 stars 38 forks source link

[REVIEW]: Acanthophis: a comprehensive plant hologenomics pipeline #6062

Closed editorialbot closed 8 months ago

editorialbot commented 1 year ago

Submitting author: !--author-handle-->@kdm9<!--end-author-handle-- (Kevin Murray) Repository: https://github.com/kdm9/Acanthophis Branch with paper.md (empty if default branch): Version: 1.0.0 Editor: !--editor-->@marcosvital<!--end-editor-- Reviewers: @bricoletc, @gbouras13, @abhishektiwari Archive: 10.5281/zenodo.10795245

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/89901b387b0acb458c0efe6637fad366"><img src="https://joss.theoj.org/papers/89901b387b0acb458c0efe6637fad366/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/89901b387b0acb458c0efe6637fad366/status.svg)](https://joss.theoj.org/papers/89901b387b0acb458c0efe6637fad366)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@bricoletc & @gbouras13 & @abhishektiwari, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @marcosvital know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Checklists

📝 Checklist for @bricoletc

📝 Checklist for @abhishektiwari

📝 Checklist for @gbouras13

editorialbot commented 1 year ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 1 year ago
Software report:

github.com/AlDanial/cloc v 1.88  T=0.08 s (818.1 files/s, 40579.1 lines/s)
--------------------------------------------------------------------------------
Language                      files          blank        comment           code
--------------------------------------------------------------------------------
YAML                             35             64            192            897
Markdown                          7            176              0            600
TeX                               1             28              0            484
Python                            7             71             58            303
Bourne Again Shell                4             23             26            113
Bourne Shell                      7              9             16             48
TOML                              1              1              0             11
Dockerfile                        1              1              0              4
--------------------------------------------------------------------------------
SUM:                             63            373            292           2460
--------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 1 year ago

Wordcount for paper.md is 980

editorialbot commented 1 year ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1101/2021.08.07.455511 is OK
- 10.21105/joss.05627 is OK
- 10.1038/nmeth.3176 is OK
- 10.1093/gigascience/giab008 is OK
- 10.1093/bioinformatics/btw354 is OK
- 10.48550/arXiv.2309.15884 is OK
- 10.1101/2022.04.08.487684 is OK
- 10.1101/gr.210641.116 is OK
- 10.1093/bioinformatics/bts480 is OK
- 10.5281/ZENODO.4677629 is OK
- 10.1093/bioinformatics/btp352 is OK
- 10.1093/bioinformatics/btr509 is OK
- 10.1093/bioinformatics/bty191 is OK
- 10.1093/bioinformatics/btab705 is OK
- 10.1186/1756-0500-5-337 is OK
- 10.7717/peerj-cs.104 is OK
- 10.1038/ncomms11257 is OK
- 10.1371/journal.pbio.1002311 is OK
- 10.1371/journal.pcbi.1005727 is OK
- 10.1111/mec.15287 is OK
- 10.1186/s13059-016-0997-x is OK
- 10.1093/bioinformatics/btx699 is OK
- 10.1038/s41396-020-0665-8 is OK
- 10.1186/s13104-016-1900-2 is OK
- 10.1093/bioinformatics/btt468 is OK
- 10.6084/m9.figshare.13174337.v1 is OK
- 10.1186/s13059-019-1891-0 is OK
- 10.5281/ZENODO.7728364 is OK

MISSING DOIs

- None

INVALID DOIs

- None
editorialbot commented 1 year ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

marcosvital commented 1 year ago

Dear @bricoletc, @gbouras13 and @abhishektiwari, thank you again for accepting review this submission for JOSS. The reviewing process is checklist based, and instructions were already posted above by the editorial bot - but let me know if you need any assistance, ok? Also, you can tag @kdm9 if you have specific questions about the manuscript.

@kdm9, you can tag your co-authors GitHub accounts if you want, so they will be able to follow this issue and answer to questions as well.

abhishektiwari commented 1 year ago

Review checklist for @abhishektiwari

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

bricoletc commented 12 months ago

Review checklist for @bricoletc

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

The documentation is very well-written and just missing one definition (see https://github.com/kdm9/Acanthophis/issues/9)

Software paper

I have provided my comments to be addressed on the software paper in https://github.com/kdm9/Acanthophis/issues/9

gbouras13 commented 12 months ago

Review checklist for @gbouras13

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

bricoletc commented 12 months ago

@kdm9, I've provided my main comments in three issues on your repo; I'll wait until they're addressed before checking in the corresponding items in my checklist

@marcosvital, I have two questions about JOSS: 1) why are there no 'Author contributions' sections in JOSS papers? This would help address checklist point 'contribution and authorship' 2) Can there be a 'Discussion' section in JOSS papers? I've provided two comments to that effect in https://github.com/kdm9/Acanthophis/issues/9 ; but is fine if not - just curious why they don't exist in JOSS papers

gbouras13 commented 11 months ago

Just as an FYI @marcosvital and @kdm9, I'll try and complete my initial review next week.

George

marcosvital commented 11 months ago

Hello everyone!

@gbouras13, any news on your review? Let me know if you need any kind of assistance, ok?

@kdm9, let us know when you are able to address @bricoletc comments.

@abhishektiwari, let us know if you make any advances on your review.

marcosvital commented 11 months ago

@marcosvital, I have two questions about JOSS: 1) why are there no 'Author contributions' sections in JOSS papers? This would help address checklist point 'contribution and authorship' 2) Can there be a 'Discussion' section in JOSS papers? I've provided two comments to that effect in kdm9/Acanthophis#9 ; but is fine if not - just curious why they don't exist in JOSS papers

Sorry about the delayed reply.

About a space for Author contributions, I'll take that to our editorial team so we can discuss that. Feels like a good suggestions to me.

About a Discussion section: it can be included - beside the mandatory sections from all papers, the authors can include specific sections that are needed for their work. We usually won't find a Discussion section in a JOSS paper because the journal is devoted to publishing the research software, but not the research findings.

kdm9 commented 11 months ago

many many thanks for the review @bricoletc

I was planning on addressing all reviews at once in the new year, if that's OK @marcosvital

marcosvital commented 11 months ago

I was planning on addressing all reviews at once in the new year, if that's OK @marcosvital

Yes, that's perfectly fine, @kdm9, thanks for letting me know.

gbouras13 commented 11 months ago

@marcosvital apologies things got away from me - I'll aim to have my review done by early in the new year so @kdm9 can go through them all together.

George

kdm9 commented 11 months ago

@gbouras13 perfect, and many thanks in advance.

kdm9 commented 10 months ago

@gbouras13 a friendly new years reminder about this :)

happy new year all!

gbouras13 commented 10 months ago

Hi @kdm9 ,

Thanks for the bugging, much needed for me :)

I've left a few issues in the repo. I tried to install Acanthophis to test it out, but I am struggling to, either from source or from pypi, so I cannot proceed on my review further.

I appreciate that it's been a few months since you submitted, so the documentation is probably out of date as it has taken me so long to review it.

But as an overall comment, I think Acanthophis would greatly benefit if you beefed up and made the README.md instructions in the tests repository part of the main documentation (or linked them on the main README.md). It would give users an example of how to set up Acanthophis and run a test dataset.

George

gbouras13 commented 10 months ago

Acanthophis is a plant hologenomics pipeline implemented in Snakemake. It makes use of Snakemake best practices (separately envs for each rule, clearly defined rules etc) and is implemented via pip, creating separate conda environments for each rule automatically. It also has customisable profiles and resources usage settings allowing for ‘embarrassingly’ parallel computation which I imagine will be very useful in such datasets.

Overall the paper is clear, outlines the statements of need and explains what acanthophis does. All the constituent programs seem reasonable in achieving what is desired within acanthophis.

Software

After a few teething installation bugs were resolved, acanthophis ran extremely fast on the test dataset. I am still having Mac installation bugs, but I consider this to be not particularly relevant, as almost any realistic use case for acanthophis will be on a server or HPC environment due to the data size on the inputs.

The config file and docs are well commented and makes it clear for the settings that can be configured. The manifest input file is also well explained.

I have a few comments to make & improvements to suggest.

These 2 things I think are necessary before I recommend acanthophis:

  1. I think the documentation is a bit sparse. Some things I think should be fleshed out: a. Some explanation of outputs (at least at the subdirectory level). While they are clearly named in the outputs, it would be good to explicitly make it clear what outputs are where to a user. b. Some explanation of how/where to install the taxonomic profiling databases – at a minimum links to the websites where they can be downloaded (e.g. Kraken, Kaiju, Centrifuge).
  2. Acanthophis should be available on bioconda as well as PyPI. Given it requires conda/mamba anyway, it makes sense it have it available there. If you need a hand @kdm9 let me know

I also have one other suggestion that I believe would improve Acanthophis quite a bit, but not required (as it would probably be a fair amount of work to implement) – I would like to hear your thoughts @kdm9.

One aspect thing that I didn’t like much about installing and using Acanthophis was the requirement to copy rules and snakefiles. This isn’t a fatal flaw but is annoying and may cause confusion for some users (and also means there will be lots of copies of these files floating around potentially).

A way of solving this problem is to give Acanthophis a CLI interface – the user could modify and specify the config file and manifest and feed those in as the input.

Two options I know of to achieve this (n.B. I have collaborated with the author of Snaketool and used it myself) are Snaketool (https://github.com/beardymcjohnface/Snaketool) and snk (https://github.com/Wytamma/snk).

This concludes my review. I've left a few boxes unticked for now.

George

abhishektiwari commented 10 months ago

@marcosvital Apologies for delay with review. I will revisit this manuscript and software and provide my feedback by end of this week.

kdm9 commented 10 months ago

Acanthophis is a plant hologenomics pipeline implemented in Snakemake. It makes use of Snakemake best practices (separately envs for each rule, clearly defined rules etc) and is implemented via pip, creating separate conda environments for each rule automatically. It also has customisable profiles and resources usage settings allowing for ‘embarrassingly’ parallel computation which I imagine will be very useful in such datasets.

Overall the paper is clear, outlines the statements of need and explains what acanthophis does. All the constituent programs seem reasonable in achieving what is desired within acanthophis.

Thanks George!

Software

After a few teething installation bugs were resolved, acanthophis ran extremely fast on the test dataset. I am still having Mac installation bugs, but I consider this to be not particularly relevant, as almost any realistic use case for acanthophis will be on a server or HPC environment due to the data size on the inputs.

I agree, but would still ideally like to solve those OSX bugs, as there is nothing obvious that should cause them to be unique to OSX, and so could effect other users. But I can't reproduce them on any system I've tried (including in fresh docker images), but that is all up-to-date linux. See https://github.com/kdm9/Acanthophis/issues/14#issuecomment-1899167421

The config file and docs are well commented and makes it clear for the settings that can be configured. The manifest input file is also well explained.

I have a few comments to make & improvements to suggest.

These 2 things I think are necessary before I recommend acanthophis:

  1. I think the documentation is a bit sparse. Some things I think should be fleshed out: a. Some explanation of outputs (at least at the subdirectory level). While they are clearly named in the outputs, it would be good to explicitly make it clear what outputs are where to a user.

Good suggestion, working on it, it will appear in documentation.md soon.

b. Some explanation of how/where to install the taxonomic profiling databases – at a minimum links to the websites where they can be downloaded (e.g. Kraken, Kaiju, Centrifuge).

Good point. It's actually fully configurable in the config file, but I'll add a note to this effect, and will provide quick example commands for common databases also in the config file.

2. Acanthophis should be available on bioconda as well as PyPI. Given it requires conda/mamba anyway, it makes sense it have it available there. If you need a hand @kdm9 let me know

I have a draft package, and was waiting before I cut a 1.0 release after any major peer review updates before shipping this in bioconda. But I will make sure this happens before the review is accepted. (draft package is here: https://github.com/kdm9/conda-pkgs/blob/main/pkgs/acanthophis/meta.yaml)

I also have one other suggestion that I believe would improve Acanthophis quite a bit, but not required (as it would probably be a fair amount of work to implement) – I would like to hear your thoughts @kdm9.

One aspect thing that I didn’t like much about installing and using Acanthophis was the requirement to copy rules and snakefiles. This isn’t a fatal flaw but is annoying and may cause confusion for some users (and also means there will be lots of copies of these files floating around potentially).

A way of solving this problem is to give Acanthophis a CLI interface – the user could modify and specify the config file and manifest and feed those in as the input.

Two options I know of to achieve this (n.B. I have collaborated with the author of Snaketool and used it myself) are Snaketool (https://github.com/beardymcjohnface/Snaketool) and snk (https://github.com/Wytamma/snk).

I've actually gone back and forth on this a few times. In a much earlier version, I did have Acanthophis set up with a CLI entry point, in a similar fashion to how snaketool operates (though before snaketool itself existed). But, it became rather inflexible and harder to maintain, and the core rule files were hidden from the end user. Feedback from several early Acanthophis users was that they preferred having a snakemake pipeline plonked in front of them, that they could run and modify as though they were the original authors.

I'll certainly keep Acanthophis operating in its current form, but I'll investigate setting it up so that once pip/conda installed, there's an additional snaketool entrypoint that allows running an unmodified version of the pipeline from $PREFIX/lib/python3/site-packages/acanthophis/... as a CLI, for users who preference a clean directory and a single CLI command over flexibility. This might be in a subsequent release though, and I've added a github issue to remind me (https://github.com/kdm9/Acanthophis/issues/18)

This concludes my review. I've left a few boxes unticked for now.

George

Thanks again for an extremely helpful and thorough review George, very much appreciated.

Best, Kevin

abhishektiwari commented 10 months ago

Installation

Installation instructions are inconsistent and create a lot of friction and confusion. Case in point installation using mamba (works using condaforge/miniforge3 docker image) vs. installation using pip (does not work). Some places installing from source. After bit of struggle I was able to install the package, got it running, and validate documented functionalities.

Documentation and Examples

Authors should consider including example input files and cleaning up documentation. Also, please provide a working Dockerfile. Currently included Dockerfile fails to build. I was unable to run example scripts included in tests folder.

Suggestions:

  1. In long run, it will better if authors can wrap commands exposed by Acanthophis using click as a CLI which will be a lot easier to distribute and install. And also allow to update the pipeline configuration from CLI (as suggested by other reviewer)

  2. Code quality and structure can be improved a lot. For instance, authors should consider a clear separation between templating and Python code otherwise project will be hard to maintain (see a file here).

  3. No automated testing to catch regression and bugs (tests is misleading it should be renamed as example). Suggest to add some minimal automated tests (ideally unit but even integration should suffice) but adding tests will require template and code decoupling as suggesting above.

kdm9 commented 10 months ago

Thanks for the review.

Installation

Installation instructions are inconsistent and create a lot of friction and confusion. Case in point installation using mamba (works using condaforge/miniforge3 docker image) vs. installation using pip (does not work). Some places installing from source. After bit of struggle I was able to install the package, got it running, and validate documented functionalities.

I have made it clearer in the docs that it is easiest if one uses conda/mamba to install the base environment, and pip to install acanthophis (until it enters bioconda, see discussion with George above).

Documentation and Examples

Authors should consider including example input files and cleaning up documentation.

Example input files are included. More specifically, I include a workflow to generate a test dataset of some GB under tests/. Because this is a simulation, there is no point in distributing the data itself, when any user who needs it can trivially create it with the provided workflow. This is in fact how the unit tests work.

Also, please provide a working Dockerfile. Currently included Dockerfile fails to build. I was unable to run example scripts included in tests folder.

Dockerfile is fixed, thanks. It used the recently obsoleted mambaforge, which I've replaced with miniforge3.

Suggestions:

1. In long run, it will better if authors can wrap commands exposed by Acanthophis using [click](https://click.palletsprojects.com/en/8.1.x/) as a CLI which will be a lot easier to distribute and install. And also allow to update the pipeline configuration from CLI (as suggested by other reviewer)

As said in the other review comments -- this will happen, but is not a priority as when surveyed, the early users specifically asked for the opposite. As tracked in https://github.com/kdm9/Acanthophis/issues/18 I will add this using snaketool in a future version.

2. Code quality and structure can be improved a lot. For instance, authors should consider a clear separation between templating and Python code otherwise project will be hard to maintain ([see a file here](https://github.com/kdm9/Acanthophis/blob/main/acanthophis/template/workflow/rules/base.rules)).

While there are a couple of utility function there that could in theory be refactored out, these functions mostly require access to workflow-global variables, and therefore are best placed within workflow code. I have separated them from the main rules of the workflow (note that there are no "real" rules in the base file). The templating/install code is completely separate (under acanthophis/__init__.py and acanthophis/cmd.py).

3. No automated testing to catch regression and bugs (`tests` is misleading it should be renamed as `example`). Suggest to add some minimal automated tests (ideally unit but even integration should suffice) but adding tests will require template and code decoupling as suggesting above.

Sorry, but that's simply untrue: the test directory is a simulated complete workflow that is run on every commit via github actions. Those tests simulate a population sequencing experiment and run the entire acanthophis workflow on those simulated samples, which directly tests the initial example configuration generated by acathophis init.

kdm9 commented 10 months ago

@marcosvital, I think I've satisfactorily addressed nearly all concerns of the reviewers (thanks to you all!)

Outstanding issues:

I'd strongly prefer that Acanthophis is accepted despite the first point above, as this will take time I am not able to immediately commit. The conda package should happen reasonably quickly.

There were some issues with installation on OSX, mainly poor interactions between conda & pip, and some dependencies not packaged for OSX. I now formally note in both the docs & paper that, while in theory a cross-platform package, there is little point to Acanthophis on OSX as a), it's for computations larger that what a Macbook can handle, and b), most of its dependencies are either not available in conda for OSX, or explicitly do not work on OSX. One can still install Acanthophis on OSX using pip, provided snakemake is installable, and could use the pipeline without conda, but this is not officially supported. Hopefully this is now clear to all users.

kdm9 commented 9 months ago

@editorialbot generate pdf

editorialbot commented 9 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

kdm9 commented 9 months ago

@editorialbot generate pdf

editorialbot commented 9 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

kdm9 commented 9 months ago

@editorialbot generate pdf

editorialbot commented 9 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

kdm9 commented 9 months ago

(for clarity to reviewers & editor: I'm doing some last edits, will let you all know here when it's good to go!)

kdm9 commented 9 months ago

@editorialbot generate pdf

editorialbot commented 9 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

marcosvital commented 9 months ago

Hi, @kdm9, let us know when you have finished editing. Also, if you want, you can direct us to specifics modifications that solved specific issues pointed out by the reviewers.

kdm9 commented 9 months ago

@marcosvital I'm just finalizing the edits now, based on final feedback from coauthors. I'll post here tagging you and all reviewers when the hopefully-final copy is ready (later today hopefully).

By pointing to specific modifications, do you mean a diffed paper version, or changes to the underlying code? Mainly the changes have been small but important bug fixes, and cleaning the paper or documentation of confusing, out of date, or otherwise wrong statements. I can provide more details if you need them, please just ask.

kdm9 commented 9 months ago

@editorialbot generate pdf

editorialbot commented 9 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

kdm9 commented 9 months ago

@editorialbot generate pdf

editorialbot commented 9 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

kdm9 commented 9 months ago

OK @marcosvital, I think the updated manuscript is finally ready, all authors have had a chance to read it and i've updated it with both their and all reviewers comments (the above discussion notwithstanding).

Pinging also @bricoletc @gbouras13 @abhishektiwari

bricoletc commented 9 months ago

Looks great to me!

marcosvital commented 8 months ago

Dear @bricoletc, @gbouras13 and @abhishektiwari: thank you for all the effort in reviewing this submission.

@kdm9 it seems that we are almost ready to start the acceptance process. We will need you to deposit the most up to date version on Zenodo (I saw that the last version there is 0.2), so let me know when this is done, ok?

marcosvital commented 8 months ago

@editorialbot check references

editorialbot commented 8 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1371/journal.pbio.1002311 is OK
- 10.1093/bioinformatics/bts480 is OK
- 10.1186/s13104-016-1900-2 is OK
- 10.1038/nmeth.3176 is OK
- 10.1186/s13059-016-0997-x is OK
- 10.1038/ncomms11257 is OK
- 10.1186/1756-0500-5-337 is OK
- 10.1093/bioinformatics/btr509 is OK
- 10.7717/peerj-cs.104 is OK
- 10.1371/journal.pcbi.1005727 is OK
- 10.1111/mec.15287 is OK
- 10.1101/2022.04.08.487684 is OK
- 10.1038/s41396-020-0665-8 is OK
- 10.1093/bioinformatics/btt468 is OK
- 10.1101/gr.210641.116 is OK
- 10.21105/joss.05627 is OK
- 10.1093/bioinformatics/btw354 is OK
- 10.1093/bioinformatics/bty191 is OK
- 10.1093/bioinformatics/btab705 is OK
- 10.1093/gigascience/giab008 is OK
- 10.1093/bioinformatics/btp352 is OK
- 10.5281/ZENODO.7728364 is OK
- 10.5281/ZENODO.4677629 is OK
- 10.1101/2021.08.07.455511 is OK
- 10.1186/s13059-019-1891-0 is OK
- 10.1093/bioinformatics/btx699 is OK
- 10.1093/bib/bbad375 is OK
- 10.1038/s41592-021-01101-x is OK

MISSING DOIs

- No DOI given, and none found for title: Aligning sequence reads, clone sequences and assem...

INVALID DOIs

- None
kdm9 commented 8 months ago

@marcosvital Just cut version 1.0.0, and Zenodo is here: https://zenodo.org/records/10795245

marcosvital commented 8 months ago

@editorialbot set v1.0.0 as version

editorialbot commented 8 months ago

Done! version is now v1.0.0

marcosvital commented 8 months ago

@editorialbot set 10.5281/zenodo.10795245 as archive

editorialbot commented 8 months ago

Done! archive is now 10.5281/zenodo.10795245