openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: MetaGenePipe: An Automated, Portable Pipeline for Contig-based Functional and Taxonomic Analysis #4851

Closed: editorialbot closed this issue 1 year ago

editorialbot commented 2 years ago

Submitting author: @ParkvilleData (Babak Shaban)
Repository: https://github.com/ParkvilleData/MetaGenePipe/
Branch with paper.md (empty if default branch):
Version: 1.1.5
Editor: @jmschrei
Reviewers: @Ebedthan, @rjorton
Archive: 10.26188/22032425

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/c9c52942084258507eeb1693b83153ba"><img src="https://joss.theoj.org/papers/c9c52942084258507eeb1693b83153ba/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/c9c52942084258507eeb1693b83153ba/status.svg)](https://joss.theoj.org/papers/c9c52942084258507eeb1693b83153ba)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@Ebedthan & @rjorton, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @jmschrei know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @Ebedthan

editorialbot commented 2 years ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 2 years ago
Software report:

github.com/AlDanial/cloc v 1.88  T=0.30 s (195.3 files/s, 256489.6 lines/s)
---------------------------------------------------------------------------------------
Language                             files          blank        comment           code
---------------------------------------------------------------------------------------
JSON                                     3             39              0          65992
Python                                  17            750           1028           2209
TeX                                      2             87              0           1090
Perl                                     5            184            230            564
Markdown                                 5            134              0            314
Jupyter Notebook                         4              0           2719            171
Windows Module Definition                1             20              0            126
reStructuredText                         6             86             60            126
YAML                                     3              5              5             63
Bourne Shell                             8             16             17             43
DOS Batch                                1              8              1             26
TOML                                     1              3              0             21
make                                     1              4              7              9
SVG                                      1              0              1              3
---------------------------------------------------------------------------------------
SUM:                                    58           1336           4068          70757
---------------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 2 years ago

Wordcount for paper.md is 1579

editorialbot commented 2 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1002/jmv.24839 is OK
- 10.1016/j.cell.2018.08.013 is OK
- 10.1038/nmeth.1923 is OK
- 10.1093/bioinformatics/btr507 is OK
- 10.1371/journal.pone.0017288 is OK
- 10.1186/1471-2105-11-119 is OK
- 10.1038/nmeth.3176 is OK
- 10.1093/bioinformatics/btp698 is OK
- 10.1186/1471-2105-10-421 is OK
- 10.1093/nar/25.17.3389 is OK
- 10.3233/WOR-2012-0507-2643 is OK
- 10.3233/wor-2012-0508-2656 is OK
- 10.3233/wor-2012-1032-2661 is OK
- 10.1016/S0022-2836(05)80360-2 is OK
- 10.1093/bioinformatics/btz859 is OK
- 10.1093/nargab/lqaa026 is OK
- 10.1007/978-1-4939-9173-0_6 is OK
- 10.1093/bioinformatics/btv033 is OK
- 10.1093/bioinformatics/bts174 is OK
- 10.1093/nar/28.1.27 is OK
- 10.1002/pro.3715 is OK
- 10.1093/nar/gkaa970 is OK
- 10.1186/gb-2014-15-3-r46 is OK
- 10.5281/zenodo.5127899 is OK
- 10.1093/bioinformatics/btu170 is OK
- 10.1007/978-1-59745-535-0_4 is OK
- 10.1093/bioinformatics/btr174 is OK
- 10.1093/bioinformatics/btp352 is OK
- 10.1093/bioinformatics/btw354 is OK
- 10.1093/bioinformatics/btab184 is OK
- 10.1007/978-1-4939-3369-3_13 is OK
- 10.1101/2021.08.29.458094 is OK
- 10.1186/s12859-020-03585-4 is OK
- 10.1371/journal.pcbi.1008716 is OK
- 10.12688/f1000research.29032.1 is OK
- 10.1038/nbt.3820 is OK
- 10.1371/journal.pone.0177459 is OK
- 10.1093/bioinformatics/btab184 is OK
- 10.7490/f1000research.1114634.1 is OK
- 10.1093/bioinformatics/btz859 is OK
- 10.1093/nar/gky092 is OK
- 10.1038/s41592-021-01101-x is OK
- 10.1093/bioinformatics/bts174 is OK
- 10.1093/bioinformatics/btv033 is OK
- 10.1186/1471-2105-11-119 is OK
- 10.1038/s41598-020-67416-5 is OK
- 10.1038/s41598-020-67416-5 is OK

MISSING DOIs

- None

INVALID DOIs

- None
jmschrei commented 2 years ago

Howdy @Ebedthan and @rjorton!

Thanks for agreeing to review this submission.

The process for conducting a review is outlined above. Please run the command shown above to have @editorialbot generate your checklist, which will give a step-by-step process for conducting your review. Please check the boxes during your review to keep track, as well as make comments in this thread or open issues in the repository itself to point out issues you encounter. Keep in mind that our aim is to improve the submission to the point where it is of high enough quality to be accepted, rather than to provide a yes/no decision, and so having a conversation with the authors is encouraged rather than providing a single review post at the end of the process.

Here are the review guidelines: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html And here is a checklist, similar to above: https://joss.readthedocs.io/en/latest/review_checklist.html

Please let me know if you encounter any issues or need any help during the review process, and thanks for contributing your time to JOSS and the open source community!

editorialbot commented 2 years ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

Ebedthan commented 2 years ago

Review checklist for @Ebedthan

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Ebedthan commented 2 years ago

Summary and overall critique

We thank the authors and @ParkvilleData for the proposed article, a valuable study presenting a useful tool for bioinformaticians and computational biologists. I particularly liked the idea, and its implementation, of making the pipeline runnable on servers. I also appreciated the software website. However, this article and the associated code repository suffer from some correctable shortcomings.

Major points

Minor points

mariadelmarq commented 2 years ago

@Ebedthan thank you for your helpful suggestions. We have incorporated them all into the paper. Would you please be able to check the shortened statement of need to make sure it's clear?

rbturnbull commented 2 years ago

@Ebedthan We've also addressed the other issues you've raised.

rbturnbull commented 2 years ago

@editorialbot generate pdf

editorialbot commented 2 years ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

rjorton commented 2 years ago

It's a nicely written paper. Below are some comments on the paper; I'm going to download and try the tool next. My points on the paper are minor and mostly centre on the "output" of the tool, which isn't really described in much detail anywhere, either in the paper or on the GitHub repo. I think a short expansion on the main taxonomic output files of the tool would be useful. Specific points:

13 - "output that is useful in its default format" - what is the default format and why is it useful

31-33 - clarify - so MetaGenePipe does not do any nucleotide to nucleotide comparison then - and would potentially miss non-coding sequence classification? Later on (line 74) the BLAST NT db is mentioned

35 - Think it would be good to give a one line explanation on each of these - what are Kegg Brite and KoalaFarm profiles (this could come under explanation of output - see later)

49 - do you have any guidance on how it could be applied to viruses?

75 - the output of MetaGenePipe is not really described in much detail anywhere. Given BLAST output is quite structured - what parsing is happening here? How does this result in more easily searchable data? Are the "no hits" in a separate file?

79-87 - for quantifying relative abundance - how can this be done here? Is a file created that has a list of contigs, their taxonomic assignment and the number of reads mapping? i.e. is the output here the relative abundance of the contig alone - or can it easily be used to calculate abundance at taxon levels?

88-96 - so what is output at this step? A contig could have multiple ORFs - is each one evaluated separately, and can you link back to the contig? Is it just Kegg/Koala IDs, or are taxon name assignments also included? Are these specific taxa (i.e. species/strain), or are broader taxonomic paths also included?

Table 1 - is the ordering of the table correct - the map reads step is near the end - but the description places it earlier

Github Repo - the main output is the Taxon output, which is described very briefly on the GitHub repo as Level 1/2/3 Kegg Brite Hierarchical count (no mention of Koala) - can the specific output format be described in more detail? Perhaps an example output file(s) from a given metagenomics set could be provided.

jmschrei commented 2 years ago

Hi @rjorton, how are your attempts to download and use the tool? Are the authors responding to the issues you pointed out above?

mariadelmarq commented 2 years ago

Hi @jmschrei and @rjorton: I apologize for the delay in responding to these very helpful questions and comments. Upon investigating the answers we have found a few bugs in the workflow, and I am in the process of fixing them. It will take a bit longer than usual given that the first author sadly passed away a few weeks ago. I really appreciate your patience as we work through this!

Kevin-Mattheus-Moerman commented 1 year ago

@mariadelmarq I am the AEiC on this track. I just want to say our condolences on the loss of your colleague. Please let us know how much time you need (or indeed if pursuing publication with JOSS is still possible). I'll pause this submission for now but we can resume whenever you are ready.

mariadelmarq commented 1 year ago

@Kevin-Mattheus-Moerman @jmschrei Thank you so much for your understanding. We're hoping to reply addressing all of @rjorton's comments before the end of this week.

mariadelmarq commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

mariadelmarq commented 1 year ago

@rjorton Thank you once again for your extremely helpful comments and questions! Responses to comments are below. Please let us know if you run into any problems downloading and testing the workflow. Happy holidays to you all!

1. 13 - "output that is useful in its default format" - what is the default format and why is it useful

We have rephrased this part of the summary. We did not consider it advisable to give lots of details on output formats in the summary but have included a whole section on this in the main document and there is an output tree available in the repository readme, which we now refer to in the main text.

2. 31-33 - clarify - so MetaGenePipe does not do any nucleotide to nucleotide comparison then - and would potentially miss non-coding sequence classification? Later on (line 74) the BLAST NT db is mentioned

The BLAST step in the Assembly workflow is indeed at the nucleotide level, but these BLAST outputs are not used for the final taxonomic or functional classifications, which are performed within the Gene Prediction subworkflow. For that element, only protein-coding sequences are used. We have rephrased the summary to clarify that BLASTn happens at a different stage in the workflow.

3. 35 - Think it would be good to give a one line explanation on each of these - what are Kegg Brite and KoalaFarm profiles (this could come under explanation of output - see later)

The summary has been condensed, but we have addressed this in the main paper.

4. 49 - do you have any guidance on how it could be applied to viruses?

This is a complicated question as viral coding sequence prediction depends on the host and whether they have an RNA or DNA genome. We now cite this recent comparative analysis of techniques that helps users evaluate what suits their needs best: https://www.biorxiv.org/content/10.1101/2021.12.11.472104v1

5. 75 - the output of MetaGenePipe is not really described in much detail anywhere. Given BLAST output is quite structured - what parsing is happening here? How does this result in more easily searchable data? Are the "no hits" in a separate file?

We have included a new section describing the main outputs and an exhaustive listing in the docs.

6. 79-87 - for quantifying relative abundance - how can this be done here? Is a file created that has a list of contigs, their taxonomic assignment and the number of reads mapping? i.e. is the output here the relative abundance of the contig alone - or can it easily be used to calculate abundance at taxon levels?

The mapping results in SAM/BAM mapping files for each pair of read files, which can be used for downstream metagenome binning applications, or users could use these to obtain a list of contigs with read depth metrics by running the jgi_summarize_bam_contig_depths tool.

7. 88-96 - so what is output at this step? A contig could have multiple ORFs - is each one evaluated separately, and can you link back to the contig? Is it just Kegg/Koala IDs, or are taxon name assignments also included? Are these specific taxa (i.e. species/strain), or are broader taxonomic paths also included?

The output tables are simply counts of ORFs matching to KEGG IDs (for the functional table) or taxa (for the OTU table). The new output section of the manuscript now specifies this. The taxonomic table does include the broader taxonomic paths.

8. Table 1 - is the ordering of the table correct - the map reads step is near the end - but the description places it earlier

Fixed

9. Github Repo - the main output is the Taxon output, which is described very briefly on the GitHub repo as Level 1/2/3 Kegg Brite Hierarchical count (no mention of Koala) - can the specific output format be described in more detail? Perhaps an example output file(s) from a given metagenomics set could be provided.

We have added substantially more information about the outputs in the manuscript and the documentation. See this link for the documentation: https://parkvilledata.github.io/MetaGenePipe/workflow.html#output
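For readers wondering how the outputs described in responses 6 and 7 could be rolled up to a chosen taxonomic level, here is a minimal sketch. It assumes per-contig mean read depths (for example, summarised from the BAM files with jgi_summarize_bam_contig_depths), contig lengths, and broad-to-specific taxonomic paths have already been loaded into plain Python dictionaries. The dictionary names, example values and the depth × length weighting are illustrative assumptions, not MetaGenePipe's actual output format or method.

```python
from collections import Counter, defaultdict

# Hypothetical example inputs (illustration only, not MetaGenePipe output).
contig_depth = {"contig_1": 10.0, "contig_2": 5.0, "contig_3": 5.0}
contig_length = {"contig_1": 1000, "contig_2": 2000, "contig_3": 1000}
# Taxonomic path per contig, ordered from broad (index 0) to specific.
contig_taxonomy = {
    "contig_1": ["Bacteria", "Proteobacteria", "Gammaproteobacteria"],
    "contig_2": ["Bacteria", "Firmicutes", "Bacilli"],
    "contig_3": ["Bacteria", "Proteobacteria", "Alphaproteobacteria"],
}


def relative_abundance(depth, length, taxonomy, rank):
    """Sum depth * length per taxon at `rank` and normalise the totals to 1."""
    totals = defaultdict(float)
    for contig, path in taxonomy.items():
        # Skip contigs with no coverage data or no assignment at this rank.
        if contig not in depth or contig not in length or rank >= len(path):
            continue
        # depth * length roughly approximates the number of mapped bases.
        totals[path[rank]] += depth[contig] * length[contig]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {taxon: value / grand_total for taxon, value in totals.items()}


def orf_counts_by_rank(orf_taxonomy, rank):
    """Collapse ORF-level taxonomic paths into simple counts at `rank`."""
    return Counter(path[rank] for path in orf_taxonomy.values() if rank < len(path))


print(relative_abundance(contig_depth, contig_length, contig_taxonomy, rank=1))
```

With the example values above, Proteobacteria accounts for 15000 of 25000 weighted bases (0.6) and Firmicutes for 10000 (0.4). Weighting by length as well as depth is one common choice, so that long, shallow contigs are not under-counted; summing depths alone would be a simpler alternative.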

rbturnbull commented 1 year ago

Hi @Kevin-Mattheus-Moerman, should we take off the 'paused' label now that @mariadelmarq has responded to @rjorton's points and updated the paper? I think everything is ready to go now.

Kevin-Mattheus-Moerman commented 1 year ago

@rbturnbull yes the editor @jmschrei can help do that if needed.

jmschrei commented 1 year ago

@Ebedthan do you still have concerns about the paper? If not, would you mind checking the remaining boxes on your review? Thank you!

jmschrei commented 1 year ago

@rjorton would you mind generating and filling out the review checklist? Instructions are in the first post.

Ebedthan commented 1 year ago

Hi @jmschrei, I'm done and everything is good for me. Thank you.

jmschrei commented 1 year ago

@editorialbot check references

editorialbot commented 1 year ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1002/jmv.24839 is OK
- 10.1016/j.cell.2018.08.013 is OK
- 10.1038/nmeth.1923 is OK
- 10.1093/bioinformatics/btr507 is OK
- 10.1371/journal.pone.0017288 is OK
- 10.1186/1471-2105-11-119 is OK
- 10.1038/nmeth.3176 is OK
- 10.1093/bioinformatics/btp698 is OK
- 10.1186/1471-2105-10-421 is OK
- 10.1093/nar/25.17.3389 is OK
- 10.3233/WOR-2012-0507-2643 is OK
- 10.3233/wor-2012-0508-2656 is OK
- 10.3233/wor-2012-1032-2661 is OK
- 10.1016/S0022-2836(05)80360-2 is OK
- 10.1093/bioinformatics/btz859 is OK
- 10.1093/nargab/lqaa026 is OK
- 10.1007/978-1-4939-9173-0_6 is OK
- 10.1093/bioinformatics/btv033 is OK
- 10.1093/bioinformatics/bts174 is OK
- 10.1093/nar/28.1.27 is OK
- 10.1002/pro.3715 is OK
- 10.1093/nar/gkaa970 is OK
- 10.1186/gb-2014-15-3-r46 is OK
- 10.5281/zenodo.5127899 is OK
- 10.1093/bioinformatics/btu170 is OK
- 10.1007/978-1-59745-535-0_4 is OK
- 10.1093/bioinformatics/btr174 is OK
- 10.1093/bioinformatics/btp352 is OK
- 10.1093/bioinformatics/btw354 is OK
- 10.1093/bioinformatics/btab184 is OK
- 10.1007/978-1-4939-3369-3_13 is OK
- 10.1093/nargab/lqac007 is OK
- 10.1186/s12859-020-03585-4 is OK
- 10.1371/journal.pcbi.1008716 is OK
- 10.12688/f1000research.29032.1 is OK
- 10.1038/nbt.3820 is OK
- 10.1371/journal.pone.0177459 is OK
- 10.1093/bioinformatics/btab184 is OK
- 10.7490/f1000research.1114634.1 is OK
- 10.1093/bioinformatics/btz859 is OK
- 10.1093/nar/gky092 is OK
- 10.1038/s41592-021-01101-x is OK
- 10.1093/bioinformatics/bts174 is OK
- 10.1186/1471-2105-11-119 is OK
- 10.1038/s41598-020-67416-5 is OK
- 10.1038/s41598-020-67416-5 is OK
- 10.7717/peerj.7359 is OK
- 10.1101/2021.12.11.472104 is OK
- 10.1093/bioinformatics/btv383 is OK

MISSING DOIs

- None

INVALID DOIs

- None
jmschrei commented 1 year ago

@editorialbot generate pdf

editorialbot commented 1 year ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

jmschrei commented 1 year ago

@mariadelmarq can you please provide a version for the software corresponding to this submission, and a DOI for an archive containing the submission and code, e.g. on Zenodo?

rbturnbull commented 1 year ago

Hi @jmschrei - we've released a version of the software and added it to Figshare: https://melbourne.figshare.com/articles/software/MetaGenePipe_An_Automated_Portable_Pipeline_for_Contig-based_Functional_and_Taxonomic_Analysis/22032425

Here is the DOI: 10.26188/22032425.v1 (https://doi.org/10.26188/22032425.v1)

I also made a small change to the paper, rounding the report of the time and memory usage in the 'Resource usage and infrastructure requirements' section so there weren't so many irrelevant significant figures.

jmschrei commented 1 year ago

Great, thanks.

jmschrei commented 1 year ago

@editorialbot set 10.26188/22032425 as archive

editorialbot commented 1 year ago

Done! Archive is now 10.26188/22032425

jmschrei commented 1 year ago

@editorialbot set 1.1.5 as version

editorialbot commented 1 year ago

Done! version is now 1.1.5

jmschrei commented 1 year ago

@editorialbot recommend-accept

editorialbot commented 1 year ago
Attempting dry run of processing paper acceptance...
editorialbot commented 1 year ago

👋 @openjournals/bcm-eics, this paper is ready to be accepted and published.

Check final proof 👉📄 Download article

If the paper PDF and the deposit XML files look good in https://github.com/openjournals/joss-papers/pull/3949, then you can now move forward with accepting the submission by compiling again with the command @editorialbot accept

rbturnbull commented 1 year ago

@jmschrei The final proof looks good to me. Thanks!

mariadelmarq commented 1 year ago

@jmschrei: not sure if you need me to also confirm as corresponding author, but all looks good. Thank you so much for your patience!

editorialbot commented 1 year ago

I'm sorry @mariadelmarq, I'm afraid I can't do that. That's something only eics are allowed to do.

jmschrei commented 1 year ago

Sorry for the confusion. The next step is for the EiC to review everything, potentially ask for minor administrative changes, and then accept the paper.

arfon commented 1 year ago

Sorry for the delay here folks. Accepting and publishing now...

arfon commented 1 year ago

@editorialbot accept

editorialbot commented 1 year ago
Doing it live! Attempting automated processing of paper acceptance...
editorialbot commented 1 year ago

🐦🐦🐦 👉 Tweet for this paper 👈 🐦🐦🐦

editorialbot commented 1 year ago

🐘🐘🐘 👉 Toot for this paper 👈 🐘🐘🐘

editorialbot commented 1 year ago

🚨🚨🚨 THIS IS NOT A DRILL, YOU HAVE JUST ACCEPTED A PAPER INTO JOSS! 🚨🚨🚨

Here's what you must now do:

  1. Check final PDF and Crossref metadata that was deposited :point_right: https://github.com/openjournals/joss-papers/pull/3988
  2. Wait a couple of minutes, then verify that the paper DOI resolves https://doi.org/10.21105/joss.04851
  3. If everything looks good, then close this review issue.
  4. Party like you just published a paper! 🎉🌈🦄💃👻🤘

Any issues? Notify your editorial technical team...