openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
708 stars 37 forks source link

[REVIEW]: Viash: A meta-framework for building reusable workflow modules #6089

Closed editorialbot closed 8 months ago

editorialbot commented 10 months ago

Submitting author: !--author-handle-->@rcannood<!--end-author-handle-- (Robrecht Cannoodt) Repository: https://github.com/viash-io/viash Branch with paper.md (empty if default branch): paper Version: 0.8.4 Editor: !--editor-->@danielskatz<!--end-editor-- Reviewers: @wilkinson, @abhishektiwari, @mberacochea Archive: 10.5281/zenodo.10563479

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/f7d328cfad31d61f90eade9dfdb2ef8c"><img src="https://joss.theoj.org/papers/f7d328cfad31d61f90eade9dfdb2ef8c/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/f7d328cfad31d61f90eade9dfdb2ef8c/status.svg)](https://joss.theoj.org/papers/f7d328cfad31d61f90eade9dfdb2ef8c)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@wilkinson & @abhishektiwari & @mberacochea, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @danielskatz know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest

Checklists

📝 Checklist for @abhishektiwari

📝 Checklist for @mberacochea

📝 Checklist for @wilkinson

editorialbot commented 10 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 10 months ago
Software report:

github.com/AlDanial/cloc v 1.88  T=0.22 s (1412.1 files/s, 201488.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Scala                          195           3439           3538          18353
SVG                              4              0              0          11803
Markdown                         7           1033              0           2310
YAML                            44             33             19           1444
Bourne Shell                    46            273            219           1392
Python                           8             52             19            144
JavaScript                       1              9              4             65
TeX                              1             11              0             43
R                                1              8             14             40
JSON                             1              1              0             20
Dockerfile                       1              6              7             18
EJS                              1              0              0              9
Rmd                              1              9             40              0
-------------------------------------------------------------------------------
SUM:                           311           4874           3860          35641
-------------------------------------------------------------------------------

gitinspector failed to run statistical information for the repository
editorialbot commented 10 months ago

Wordcount for paper.md is 1872

editorialbot commented 10 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1093/bioinformatics/bts480 is OK
- 10.1038/s41592-019-0654-x is OK
- 10.1038/s41576-023-00586-w is OK
- 10.15252/msb.20188746 is OK
- 10.1038/s41587-020-0439-x is OK
- 10.1038/nbt.3820 is OK
- 10.5281/ZENODO.4638196 is OK
- 10.5281/ZENODO.4735634 is OK
- 10.1371/journal.pone.0177459 is OK

MISSING DOIs

- None

INVALID DOIs

- None
editorialbot commented 10 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

danielskatz commented 10 months ago

👋 @wilkinson, @abhishektiwari, and @mberacochea - Thanks for agreeing to review this submission. This is the review thread for the paper. All of our communications will happen here from now on.

As you can see above, you each should use the command @editorialbot generate my checklist to create your review checklist. @editorialbot commands need to be the first thing in a new comment.

As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#6089 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks. Please let me know if either of you require some more time. We can also use editorialbot (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@danielskatz) if you have any questions/concerns.

abhishektiwari commented 9 months ago

Review checklist for @abhishektiwari

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

danielskatz commented 9 months ago

👋 @wilkinson and @mberacochea - I would appreciate it if you could use the command @editorialbot generate my checklist to create your review checklist, and then check off the CoI and CoC items, to make everything works. @editorialbot commands need to be the first thing in a new comment.

mberacochea commented 9 months ago

Review checklist for @mberacochea

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

abhishektiwari commented 9 months ago

@rcannood Please see my initial feedback.

Documentation

Community Guidelines

Please add community guidelines. I also think project will benefit from issue and pull request templates.

Paper

Title

For consistency, I would suggest paper headline to use data pipeline modules and not workflow modules. Everywhere else in the paper and documentation website authors reference Viash as solution for modular data pipelines. Having said that data pipelines are a special type of workflows with major focus on ETL, data cleaning, integration, and preparation for analysis or visualization, so I am also fine using workflow.

State of the field

Current state of the field can be further improved. Given the focus on Bioinformatics related computation/data pipeline frameworks, Viash, and it's counterparts Nextflow, Snakemake, CWL belong to Pipeline as code category. Please consider including Toil in the list as it supports script based pipeline definitions but also supports CWL and WDL. Plus there is other category of frameworks such as Galaxy and KNIME Workbench used by bioinformatics community those rely on UI based drag-n-drop pipeline builders – something worth mentioning here. Similarly, more generic Pipeline as code frameworks such Apache Airflow and Luigi can use standalone executable generated by Viash and used by bioinformatics community.

Suggested improvements (non-blocking)

Given the Viash has broader applicability beyond Bioinformatics particularly MLOps Pipelines, I suggest adding starter using other more generic frameworks such Airflow. In case of Apache Airflow, Airflow BashOperator will works just fine with standalone executable generated by Viash.

Overall comments

I applaud the author's for providing a very frictionless experience. Both Guide and Reference are well documented. I was able to finish everything from installation to functional testing to example in less than an hour. I really liked the use of GitHub template to create a quickstarter. Furthermore, I was able to add new modules on top of existing quickstarter very quickly and run end-to-end pipeline. Finally, it was great to see that authors provided guidance on IDE setup and testing of modules created using Viash.

wilkinson commented 9 months ago

Review checklist for @wilkinson

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

abhishektiwari commented 8 months ago

@danielskatz mostly done with my review. Just waiting for @rcannood response on state of field and title before I can mark this review completed from my side.

danielskatz commented 8 months ago

This comment is almost the same as a previous one that I've deleted, but fixes an error I made in tagging.

👋 @wilkinson, @abhishektiwari, @mberacochea, @rcannood - I'm just checking in post-holidays on how this is going.

It looks like @abhishektiwari provided some comments, but I'm unsure if @rcannood has acted on them.

And that @wilkinson and @mberacochea have started reviews, but I'm unsure if there's anything blocking them from making further progress.

Any feedback on status is welcome

mberacochea commented 8 months ago

@rcannood Please see my feedback.

General Comments

The paper is well-written and clearly explains the purpose and functionality of Viash. The authors provide detailed instructions on using the tool to describe modules with it. It is impressive in both the scope and the way it handles the whole process of pipeline writing, from initial module creation to fully functional pipelines using Nextflow.

Documentation

The documentation provided is comprehensive and allowed me to successfully follow the tutorial to create a fully operational pipeline. It consists of a practical tutorial and extensive usage guidelines, as well as documentation on the tool's API. I encountered one minor issue (https://github.com/viash-io/viash/issues/617) which the authors promptly resolved.

Source Code

While I am not familiar with Scala, I was able to understand the high-level logic by reading through the codebase, it is well structured and contains explanatory comments on complex sections.

Generated Nextflow Code

The Nextflow code generated by Viash differs from the canonical style users may be accustomed to in other pipelines. This is not a flaw, but worth noting as it may cause initial confusion. Even though the produced modules are not meant to be modified, the ability to read and modify the code improves the developer experience, specially for the workflow developers. The paper could explore this a bit, as it’s mainly focus on modules and not so much on workflow development.

One question I have - could the custom Nextflow helper methods in Viash be abstracted/pushed into a Nextflow plugin or some kind of library? This question shouldn’t be considered a blocker for this PR.

State of the field

My impression is that Viash provides functionality comparable to nf-core/nf-tools or Snakemake Wrappers, though their approaches are different. Specially around the re-usable modules or components to build pipelines. Nf-core uses a different paradigm but it shares some of concepts with Viash. Explicitly contrasting nf-core/nf-tools - Snakemake Wrappers and Viash would benefit potenial users. The authors may also want to briefly mention OpenPipelines as a repository of Viash modules.

Paper

Applications in Bioinformatics

I don't think Viash is restricted solely to single cell analysis. Based on the underlying technologies, it seems widely applicable in bioinformatics (at least in omics).

In Figure 3, Tables B and C appear inadvertently merged. The authors may want to separate them for clarity

danielskatz commented 8 months ago

👋 @rcannood - there are now two reviewers with comments for you to respond to.

rcannood commented 8 months ago

Thanks @abhishektiwari @mberacochea for your reviews! I'm currently sorting through my post-holiday workload and will go through your valuable feedback ASAP!

rcannood commented 8 months ago

Hey all! I went through the comments I received so far and tried to address each of the reviewers' concerns. Please let me know if you have any further feedback.

@abhishektiwari's comments

Community Guidelines

Please add https://github.com/viash-io/viash/issues/598. I also think project will benefit from issue and pull request templates.

Done, done, and done! Thanks for the suggestion!

Title consistency

For consistency, I would suggest paper headline to use data pipeline modules and not workflow modules. Everywhere else in the paper and documentation website authors reference Viash as solution for modular data pipelines. Having said that data pipelines are a special type of workflows with major focus on ETL, data cleaning, integration, and preparation for analysis or visualization, so I am also fine using workflow.

Good point! For the sake of consistency, I ended up replacing all instances of "pipeline" with "workflow" in https://github.com/viash-io/viash/commit/5029fe049a981bf37816354e181fc26757161fd2

State of the field

Current state of the field can be further improved. Given the focus on Bioinformatics related computation/data pipeline frameworks, Viash, and it's counterparts Nextflow, Snakemake, CWL belong to Pipeline as code category. Please consider including Toil in the list as it supports script based pipeline definitions but also supports CWL and WDL. Plus there is other category of frameworks such as Galaxy and KNIME Workbench used by bioinformatics community those rely on UI based drag-n-drop pipeline builders – something worth mentioning here. Similarly, more generic Pipeline as code frameworks such Apache Airflow and Luigi can use standalone executable generated by Viash and used by bioinformatics community.

Thanks for this suggestion! I completely rewrote the state of the field in https://github.com/viash-io/viash/commit/6a21f69a22add86bbf5f7f217e367e1a434ad9ac . Rather than only compare Viash to Snakemake, Nextflow, CWL and WDL, I instead compare Viash to workflow managers and portability solutions. Based on https://doi.org/10.1038/s41592-021-01254-9, we then discuss the different types of workflow managers (graphical, programmatic, specification-based) and portability solutions (package manager, containerization).

Suggested improvements (non-blocking)

Given the Viash has broader applicability beyond Bioinformatics particularly MLOps Pipelines, I suggest adding starter using other more generic frameworks such Airflow. In case of Apache Airflow, Airflow BashOperator will works just fine with standalone executable generated by Viash.

Thanks for your suggestion, we will definitely take this into account when extending our documentation with different workflow frameworks!

@mberacochea's comments

Generated Nextflow Code

One question I have - could the custom Nextflow helper methods in Viash be abstracted/pushed into a Nextflow plugin or some kind of library? This question shouldn’t be considered a blocker for this PR.

I would really like that! The latest release of Viash already split up the helper functions into separate helper files per function so we can unit test them separately. I would like some of this functionality to be included in a Groovy library so we don't need to include it in every Nextflow module that is generated.

State of the field

My impression is that Viash provides functionality comparable to nf-core/nf-tools or Snakemake Wrappers, though their approaches are different. Specially around the re-usable modules or components to build pipelines. Nf-core uses a different paradigm but it shares some of concepts with Viash. Explicitly contrasting nf-core/nf-tools - Snakemake Wrappers and Viash would benefit potenial users. The authors may also want to briefly mention OpenPipelines as a repository of Viash modules.

Thanks for making the connection with nf-core tools (I think you mean nf-core/modules instead?), Snakemake wrappers and OpenPipelines. I did not include references to these projects in the paper as I felt that creating a repository with reusable commonly used bioinformatics tools goes way beyond the scope of this manuscript.

Applications in bioinformatics

I don't think Viash is restricted solely to single cell analysis. Based on the underlying technologies, it seems widely applicable in bioinformatics (at least in omics).

Good point. I dropped the word "single-cell" in https://github.com/viash-io/viash/commit/874169afc30b620aee05baf237741bf5d9f4711e

Paper figures

In Figure 3, Tables B and C appear inadvertently merged. The authors may want to separate them for clarity

Thanks, I fixed this in https://github.com/viash-io/viash/commit/758db0f37cb23e2d01e321dd7dcfd8cdd69fb670 !

rcannood commented 8 months ago

@editorialbot generate pdf

editorialbot commented 8 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

danielskatz commented 8 months ago

👋 @abhishektiwari & @mberacochea - I look forward to hearing from you about @rcannood's changes

danielskatz commented 8 months ago

👋 @wilkinson - it looks like you are fairly close to done as well. Do the latest changes from @rcannood help you, and if not, what else do you need to complete your review?

wilkinson commented 8 months ago

@danielskatz I finished re-reading after the edits, and I am done with my review. The latest changes from @rcannood do help, and I can only really think of one (non-blocking) critique: in Figure 1, the box labeled "Script" just looks kind of empty and the linebreaks are kind of strange. I'm not sure how I would suggest to edit that, and it does not really matter. Overall, I am very impressed with this tool, the paper, and the authors!

danielskatz commented 8 months ago

Thanks @wilkinson!

abhishektiwari commented 8 months ago

@rcannood thanks for revision and updates.

@danielskatz Done with my review of this paper. No blocking feedback or comments from my side.

danielskatz commented 8 months ago

Thanks @abhishektiwari - can you check off the last item on your checklist?

danielskatz commented 8 months ago

👋 @mberacochea - How do things look for you now?

mberacochea commented 8 months ago

@rcannood great stuff, looking forward to see how viash keeps evolving.

@danielskatz My review is completed, my recommendation is to accept and publish this one.

danielskatz commented 8 months ago

@mberacochea - can you check off the last item on your checklist?

rcannood commented 8 months ago

Just updated figure 1 based on @wilkinson's feedback!

@editorialbot generate pdf

danielskatz commented 8 months ago

@editorialbot generate pdf

a command needs to be the first thing in a comment

editorialbot commented 8 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

danielskatz commented 8 months ago

@rcannood - please let me know when you are ready for the next step, which will be me proofreading the paper.

abhishektiwari commented 8 months ago

Thanks @abhishektiwari - can you check off the last item on your checklist?

Done.

rcannood commented 8 months ago

@rcannood - please let me know when you are ready for the next step, which will be me proofreading the paper.

I'm done making changes -- Thanks for taking a look at this!

danielskatz commented 8 months ago

@editorialbot generate pdf

editorialbot commented 8 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

danielskatz commented 8 months ago

@editorialbot recommend-accept

editorialbot commented 8 months ago

Paper is not ready for acceptance yet, the archive is missing

danielskatz commented 8 months ago

@rcannood - I've suggested some edits in https://github.com/viash-io/viash/pull/628.

Also, I'm not sure if "This separates the component functionality from the workflow workflow" is correct or not. Do you want to say "workflow workflow"?

Once you've merged the PR and perhaps changed the additional line I mention, please

I can then move forward with accepting the submission.

rcannood commented 8 months ago

@editorialbot generate pdf

editorialbot commented 8 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

danielskatz commented 8 months ago

@editorialbot set 0.8.4 as version

editorialbot commented 8 months ago

Done! version is now 0.8.4

danielskatz commented 8 months ago

@editorialbot set 10.5281/zenodo.10563479 as archive

editorialbot commented 8 months ago

Done! archive is now 10.5281/zenodo.10563479

danielskatz commented 8 months ago

@editorialbot recommend-accept

editorialbot commented 8 months ago
Attempting dry run of processing paper acceptance...
editorialbot commented 8 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1093/bioinformatics/bts480 is OK
- 10.1038/s41592-019-0654-x is OK
- 10.1038/s41576-023-00586-w is OK
- 10.15252/msb.20188746 is OK
- 10.1038/s41587-020-0439-x is OK
- 10.1038/nbt.3820 is OK
- 10.5281/ZENODO.4638196 is OK
- 10.1038/s41592-021-01254-9 is OK
- 10.1186/gb-2010-11-8-r86 is OK
- 10.1016/j.jbiotec.2017.07.028 is OK
- 10.5281/ZENODO.4735634 is OK
- 10.1145/3486897 is OK

MISSING DOIs

- None

INVALID DOIs

- None
editorialbot commented 8 months ago

:wave: @openjournals/csism-eics, this paper is ready to be accepted and published.

Check final proof :point_right::page_facing_up: Download article

If the paper PDF and the deposit XML files look good in https://github.com/openjournals/joss-papers/pull/4947, then you can now move forward with accepting the submission by compiling again with the command @editorialbot accept

danielskatz commented 8 months ago

@editorialbot accept