openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[PRE REVIEW]: MetaGenePipe: An Automated, Portable Pipeline for Contig-based Functional and Taxonomic Analysis #4749

Closed by editorialbot 2 years ago

editorialbot commented 2 years ago

Submitting author: @ParkvilleData (Babak Shaban)
Repository: https://github.com/ParkvilleData/MetaGenePipe/
Branch with paper.md (empty if default branch):
Version: v.1.0.0
Editor: @jmschrei
Reviewers: @Ebedthan, @rjorton
Managing EiC: Kristen Thyng

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/c9c52942084258507eeb1693b83153ba"><img src="https://joss.theoj.org/papers/c9c52942084258507eeb1693b83153ba/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/c9c52942084258507eeb1693b83153ba/status.svg)](https://joss.theoj.org/papers/c9c52942084258507eeb1693b83153ba)

Author instructions

Thanks for submitting your paper to JOSS @ParkvilleData. Currently, there isn't a JOSS editor assigned to your paper.

@ParkvilleData if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). In addition, the people on this list have already agreed to review for JOSS and may be suitable for this submission (please start at the bottom of the list).

Editor instructions

The JOSS submission bot @editorialbot is here to help you find and assign reviewers and start the main review. To find out what @editorialbot can do for you type:

@editorialbot commands
editorialbot commented 2 years ago

Hello human, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 2 years ago
Software report:

github.com/AlDanial/cloc v 1.88  T=0.43 s (136.2 files/s, 178481.9 lines/s)
---------------------------------------------------------------------------------------
Language                             files          blank        comment           code
---------------------------------------------------------------------------------------
JSON                                     3             39              0          65992
Python                                  17            750           1028           2209
TeX                                      2             87              0           1082
Perl                                     5            184            230            564
Markdown                                 5            120              0            208
Jupyter Notebook                         4              0           2719            171
Windows Module Definition                1             20              0            126
reStructuredText                         6             82             60             86
YAML                                     3              5              5             63
Bourne Shell                             8             16             17             43
DOS Batch                                1              8              1             26
TOML                                     1              3              0             21
make                                     1              4              7              9
SVG                                      1              0              1              3
---------------------------------------------------------------------------------------
SUM:                                    58           1318           4068          70603
---------------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 2 years ago

Wordcount for paper.md is 1579

editorialbot commented 2 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1038/nmeth.1923 is OK
- 10.1093/bioinformatics/btr507 is OK
- 10.1186/1471-2105-11-119 is OK
- 10.1038/nmeth.3176 is OK
- 10.1093/bioinformatics/btp698 is OK
- 10.1186/1471-2105-10-421 is OK
- 10.1093/nar/25.17.3389 is OK
- 10.1016/S0022-2836(05)80360-2 is OK
- 10.1093/bioinformatics/btz859 is OK
- 10.1093/nargab/lqaa026 is OK
- 10.1007/978-1-4939-9173-0_6 is OK
- 10.1093/bioinformatics/btv033 is OK
- 10.1093/bioinformatics/bts174 is OK
- 10.1093/nar/28.1.27 is OK
- 10.1002/pro.3715 is OK
- 10.1093/nar/gkaa970 is OK
- 10.5281/zenodo.5127899 is OK
- 10.1093/bioinformatics/btu170 is OK
- 10.1007/978-1-59745-535-0_4 is OK
- 10.1093/bioinformatics/btr174 is OK
- 10.1093/bioinformatics/btp352 is OK
- 10.1093/bioinformatics/btw354 is OK
- 10.1093/bioinformatics/btab184 is OK
- 10.1007/978-1-4939-3369-3_13 is OK
- 10.1101/2021.08.29.458094 is OK
- 10.1186/s12859-020-03585-4 is OK
- 10.1371/journal.pcbi.1008716 is OK
- 10.12688/f1000research.29032.1 is OK
- 10.1038/nbt.3820 is OK
- 10.1371/journal.pone.0177459 is OK
- 10.1093/bioinformatics/btab184 is OK
- 10.1093/bioinformatics/btz859 is OK
- 10.1093/nar/gky092 is OK
- 10.1038/s41592-021-01101-x is OK
- 10.1093/bioinformatics/bts174 is OK
- 10.1093/bioinformatics/btv033 is OK
- 10.1186/1471-2105-11-119 is OK
- 10.1038/s41598-020-67416-5 is OK
- 10.1038/s41598-020-67416-5 is OK

MISSING DOIs

- 10.1002/jmv.24839 may be a valid DOI for title: Detection of Toscana virus from an adult traveler returning to Australia with encephalitis
- 10.1371/journal.pone.0017288 may be a valid DOI for title: Fast identification and removal of sequence contamination from genomic and metagenomic datasets
- 10.3233/wor-2012-0507-2643 may be a valid DOI for title: The PhOCoe Model–ergonomic pattern mapping in participatory design processes
- 10.3233/wor-2012-0508-2656 may be a valid DOI for title: Conditions for the successful integration of Human and Organizational Factors (HOF) in the nuclear safety analysis
- 10.3233/wor-2012-1032-2661 may be a valid DOI for title: Analysis of organizational conditions for risk management: the case study of a petrochemical site

INVALID DOIs

- None
editorialbot commented 2 years ago

:point_right: Download article proof | View article proof on GitHub :point_left:

kthyng commented 2 years ago

Hi @jmschrei could you edit this submission?

kthyng commented 2 years ago

@editorialbot invite @jmschrei as editor

editorialbot commented 2 years ago

Invitation to edit this submission sent!

rbturnbull commented 2 years ago

Sorry we missed some of the DOIs. Hopefully they are all there now.

@editorialbot check references

jmschrei commented 2 years ago

Hi @rbturnbull @kthyng. I'm trying to assess whether this submission meets the requirement for substantial scholarly effort (https://joss.readthedocs.io/en/latest/review_criteria.html#substantial-scholarly-effort) and I'm having some difficulty, in part because I'm not as familiar with metagenomics as I am with other fields of genomics. After reading the manuscript, the repo, and the associated documentation, I still don't know precisely what this software does. I understand that there are pipelines that process data from metagenomic samples but, for example, I'm not sure what the statement "create an accurate taxonomic and functional characterization of the prokaryotic fraction of sequenced microbiomes" means. What is a "functional characterization" in this context? What is output as part of the "taxonomic classification"? What specific problems does this solve?

If you can revise the repo README and documentation to be more specific as to the high-level objectives of the package, as well as the specific steps one would use to achieve those goals, it'd be a lot easier for me to assess the scholarly effort. Keep in mind that the documentation should not be for a subject matter expert, but a user who is quickly trying to assess if this software will solve their problem. I'm not doubting the scientific merit of the software, but I don't fully understand from the current documentation what exactly it does or when one would use it.

jmschrei commented 2 years ago

And also, just to clarify, are any of the workflow steps new ones that you're proposing, or are all of them just wrappers for commonly used code that others have developed? The latter is not a problem, but I wanted to make sure I understood the contribution correctly.

rbturnbull commented 2 years ago

Thanks for the messages @jmschrei. We will respond in the next couple of days when we're back on deck at work.

kthyng commented 2 years ago

@jmschrei Thanks for your comments. Given this, a scope query is appropriate to ping the editorial board and gather a consensus about the fit of this submission for JOSS. I'll start that process.

@rbturnbull It'd be helpful if you can address @jmschrei's comments in the meantime so we can consider that information alongside your submission. This process will take 1-2 weeks.

kthyng commented 2 years ago

@editorialbot query scope

editorialbot commented 2 years ago

Submission flagged for editorial review.

ParkvilleData commented 2 years ago

Hi @jmschrei, I have updated the README to include a tree of all output files, a description of each output file, and extra paragraphs detailing the usefulness of taxonomic profiling. There is also a table showing how the profiles are represented in the outputs.

The workflow is a collection of "wrappers" around standard bioinformatics software used in metagenomics. As stated in the paper, there are similar examples written in Nextflow (nf-core and MUFFIN) and Snakemake (ATLAS) but very limited options in WDL. The advantages of the workflow include its versatility: the alignment databases can be substituted to tailor the workflow to the researcher's question of interest, and the setup script facilitates this by downloading the required BLAST databases. WDL is a human-readable workflow language, and we have found that it has a much gentler learning curve for lab-based biologists, has a wealth of documentation, tutorials and examples, and doesn't need background knowledge in a programming language such as Python.

I have used previous iterations of the workflow as a method in published papers to discover novel RNA viruses. The current iteration is the "polished" version, which has been applied to the field of metagenomics.

danielskatz commented 2 years ago

@editorialbot check references

(just following up on the attempt by @rbturnbull a few days ago - editorialbot commands need to be the first thing in a comment)

editorialbot commented 2 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1002/jmv.24839 is OK
- 10.1038/nmeth.1923 is OK
- 10.1093/bioinformatics/btr507 is OK
- 10.1371/journal.pone.0017288 is OK
- 10.1186/1471-2105-11-119 is OK
- 10.1038/nmeth.3176 is OK
- 10.1093/bioinformatics/btp698 is OK
- 10.1186/1471-2105-10-421 is OK
- 10.1093/nar/25.17.3389 is OK
- 10.3233/WOR-2012-0507-2643 is OK
- 10.3233/wor-2012-0508-2656 is OK
- 10.3233/wor-2012-1032-2661 is OK
- 10.1016/S0022-2836(05)80360-2 is OK
- 10.1093/bioinformatics/btz859 is OK
- 10.1093/nargab/lqaa026 is OK
- 10.1007/978-1-4939-9173-0_6 is OK
- 10.1093/bioinformatics/btv033 is OK
- 10.1093/bioinformatics/bts174 is OK
- 10.1093/nar/28.1.27 is OK
- 10.1002/pro.3715 is OK
- 10.1093/nar/gkaa970 is OK
- 10.5281/zenodo.5127899 is OK
- 10.1093/bioinformatics/btu170 is OK
- 10.1007/978-1-59745-535-0_4 is OK
- 10.1093/bioinformatics/btr174 is OK
- 10.1093/bioinformatics/btp352 is OK
- 10.1093/bioinformatics/btw354 is OK
- 10.1093/bioinformatics/btab184 is OK
- 10.1007/978-1-4939-3369-3_13 is OK
- 10.1101/2021.08.29.458094 is OK
- 10.1186/s12859-020-03585-4 is OK
- 10.1371/journal.pcbi.1008716 is OK
- 10.12688/f1000research.29032.1 is OK
- 10.1038/nbt.3820 is OK
- 10.1371/journal.pone.0177459 is OK
- 10.1093/bioinformatics/btab184 is OK
- 10.1093/bioinformatics/btz859 is OK
- 10.1093/nar/gky092 is OK
- 10.1038/s41592-021-01101-x is OK
- 10.1093/bioinformatics/bts174 is OK
- 10.1093/bioinformatics/btv033 is OK
- 10.1186/1471-2105-11-119 is OK
- 10.1038/s41598-020-67416-5 is OK
- 10.1038/s41598-020-67416-5 is OK

MISSING DOIs

- Errored finding suggestions for "An atypical Parvovirus drives chronic tubulointers...", please try later
- Errored finding suggestions for "FASTQC. A quality control tool for high throughput...", please try later
- Errored finding suggestions for "Kraken taxonomic sequence classification system: O...", please try later
- Errored finding suggestions for "Full-stack genomics pipelining with GATK4 + WDL + ...", please try later

INVALID DOIs

- None
ParkvilleData commented 2 years ago

Hi, I have added the missing DOIs. FastQC doesn't seem to have one.

@editorialbot check references

danielskatz commented 2 years ago

@editorialbot check references

editorialbot commands need to be the first thing in a comment

editorialbot commented 2 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1002/jmv.24839 is OK
- 10.1016/j.cell.2018.08.013 is OK
- 10.1038/nmeth.1923 is OK
- 10.1093/bioinformatics/btr507 is OK
- 10.1371/journal.pone.0017288 is OK
- 10.1186/1471-2105-11-119 is OK
- 10.1038/nmeth.3176 is OK
- 10.1093/bioinformatics/btp698 is OK
- 10.1186/1471-2105-10-421 is OK
- 10.1093/nar/25.17.3389 is OK
- 10.3233/WOR-2012-0507-2643 is OK
- 10.3233/wor-2012-0508-2656 is OK
- 10.3233/wor-2012-1032-2661 is OK
- 10.1016/S0022-2836(05)80360-2 is OK
- 10.1093/bioinformatics/btz859 is OK
- 10.1093/nargab/lqaa026 is OK
- 10.1007/978-1-4939-9173-0_6 is OK
- 10.1093/bioinformatics/btv033 is OK
- 10.1093/bioinformatics/bts174 is OK
- 10.1093/nar/28.1.27 is OK
- 10.1002/pro.3715 is OK
- 10.1093/nar/gkaa970 is OK
- 10.1186/gb-2014-15-3-r46 is OK
- 10.5281/zenodo.5127899 is OK
- 10.1093/bioinformatics/btu170 is OK
- 10.1007/978-1-59745-535-0_4 is OK
- 10.1093/bioinformatics/btr174 is OK
- 10.1093/bioinformatics/btp352 is OK
- 10.1093/bioinformatics/btw354 is OK
- 10.1093/bioinformatics/btab184 is OK
- 10.1007/978-1-4939-3369-3_13 is OK
- 10.1101/2021.08.29.458094 is OK
- 10.1186/s12859-020-03585-4 is OK
- 10.1371/journal.pcbi.1008716 is OK
- 10.12688/f1000research.29032.1 is OK
- 10.1038/nbt.3820 is OK
- 10.1371/journal.pone.0177459 is OK
- 10.1093/bioinformatics/btab184 is OK
- 10.7490/f1000research.1114634.1 is OK
- 10.1093/bioinformatics/btz859 is OK
- 10.1093/nar/gky092 is OK
- 10.1038/s41592-021-01101-x is OK
- 10.1093/bioinformatics/bts174 is OK
- 10.1093/bioinformatics/btv033 is OK
- 10.1186/1471-2105-11-119 is OK
- 10.1038/s41598-020-67416-5 is OK
- 10.1038/s41598-020-67416-5 is OK

MISSING DOIs

- None

INVALID DOIs

- None
arfon commented 2 years ago

:wave: @jmschrei – just checking if you're able to edit this submission for us?

Kevin-Mattheus-Moerman commented 2 years ago

@editorialbot invite @jmschrei as editor

editorialbot commented 2 years ago

Invitation to edit this submission sent!

jmschrei commented 2 years ago

@editorialbot assign me as editor

editorialbot commented 2 years ago

Assigned! @jmschrei is now the editor

jmschrei commented 2 years ago

Sorry for the delay in response. I've had a series of back-to-back travel engagements and this slipped my mind.

jmschrei commented 2 years ago

Hi @nearinj and @coughls, would either of you be able to review this submission?

jmschrei commented 2 years ago

Hi @Ebedthan and @vinisalazar, would either of you be able to review this submission?

Ebedthan commented 2 years ago

Hi @jmschrei, yes, for sure :)

jmschrei commented 2 years ago

@editorialbot assign @Ebedthan as reviewer

editorialbot commented 2 years ago

I'm sorry human, I don't understand that. You can see what commands I support by typing:

@editorialbot commands

jmschrei commented 2 years ago

@editorialbot add @Ebedthan as reviewer

editorialbot commented 2 years ago

@Ebedthan added to the reviewers list!

nearinj commented 2 years ago

@jmschrei I would be happy to review this manuscript, although I won't be able to return it until the first week of November at the earliest due to prior commitments.

vinisalazar commented 2 years ago

Hi @jmschrei, thank you for the invite but I cannot review this manuscript due to a conflict of interest.

Best, V

jmschrei commented 2 years ago

No problem @vinisalazar, thanks for getting back to me.

@nearinj I'm going to try to find another reviewer just to speed this up but if I can't find anyone else I might ask you again. Thanks!

jmschrei commented 2 years ago

@rjorton or @MmasterT, would either of you be able to serve as a reviewer for this?

vguide commented 2 years ago

@jmschrei, I could, but I'm away this week and next on holiday - back 24th Oct

rjorton commented 2 years ago

@jmschrei, I could, but I'm away this week and next on holiday - back 24th Oct (sorry, I was logged in with another account, vguide, when I made the first comment)

jmschrei commented 2 years ago

@ParkvilleData is waiting until the 24th alright for you?

rbturnbull commented 2 years ago

Hi @jmschrei and @rjorton - waiting until the 24th of October is fine for us. Thanks! (on behalf of @ParkvilleData)

jmschrei commented 2 years ago

Great, thanks.

jmschrei commented 2 years ago

@editorialbot add @rjorton as reviewer

editorialbot commented 2 years ago

@rjorton added to the reviewers list!

jmschrei commented 2 years ago

@editorialbot start review

editorialbot commented 2 years ago

OK, I've started the review over in https://github.com/openjournals/joss-reviews/issues/4851.