[PRE REVIEW]: target-methylseq-qc: a lightweight pipeline for collecting metrics from targeted sequence mapping files.

editorialbot commented 2 months ago

Submitting author: @abhi18av Editor: @csoneson Reviewers: Pending Managing EiC: Kevin M. Moerman

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/220574eb92e0a5c64f58eb092dfd399a"><img src="https://joss.theoj.org/papers/220574eb92e0a5c64f58eb092dfd399a/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/220574eb92e0a5c64f58eb092dfd399a/status.svg)](https://joss.theoj.org/papers/220574eb92e0a5c64f58eb092dfd399a)

Author instructions

Thanks for submitting your paper to JOSS @abhi18av. Currently, there isn't a JOSS editor assigned to your paper.

@abhi18av if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). You can search the list of people that have already agreed to review and may be suitable for this submission.

Editor instructions

The JOSS submission bot @editorialbot is here to help you find and assign reviewers and start the main review. To find out what @editorialbot can do for you type:

@editorialbot commands

editorialbot commented 2 months ago

Hello human, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

editorialbot commented 2 months ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.1101/2023.04.29.23289314 is OK
- 10.1038/nbt.3820 is OK
- 10.1038/s41592-018-0046-7 is OK
- 10.1093/bioinformatics/btx192 is OK
- 10.5281/zenodo.10463781 is OK
- 10.1093/gigascience/giab008 is OK
- 10.1101/gr.107524.110 is OK
- 10.1038/nbt.3820 is OK
- 10.1093/bioinformatics/btw354 is OK
- 10.1038/s41587-020-0439-x is OK
- 10.1093/bioinformatics/btq033 is OK
- 10.5281/zenodo.13147688 is OK
- 10.5281/zenodo.13601364 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: The nf-core framework for community-curated bioinf...
- No DOI given, and none found for title: CreateSequenceDictionary (Picard)
- No DOI given, and none found for title: Picard toolkit
- No DOI given, and none found for title: CollectHsMetrics (Picard)
- No DOI given, and none found for title: CollectMultipleMetrics (Picard)
- No DOI given, and none found for title: HTS format specifications
- No DOI given, and none found for title: Babraham Bioinformatics - FastQC A Quality Control...
- No DOI given, and none found for title: Twist Methylome
- No DOI given, and none found for title: Twist Methylome
- No DOI given, and none found for title: target-methylseq-qc website

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None

editorialbot commented 2 months ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.05 s (1532.4 files/s, 244651.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
CSS                              5             39             20           2238
JavaScript                      11            235            226           2112
SVG                              3              3              3           2081
HTML                             4             53             10           1537
YAML                            27             74             30            905
JSON                             7              2              0            635
XML                              2              0              0            518
Markdown                         9            295              0            494
Groovy                           4             76            103            354
TeX                              1             31              0            339
Python                           2             61             90            183
CSV                              3              0              0             10
TOML                             1              1              2              7
Bourne Shell                     1              0              0              5
-------------------------------------------------------------------------------
SUM:                            80            870            484          11418
-------------------------------------------------------------------------------

Commit count by author:

   111  Abhinav Sharma
     1  Patricia
     1  t4ly4

editorialbot commented 2 months ago

Paper file info:

📄 Wordcount for paper.md is 1753

✅ The paper includes a Statement of need section

editorialbot commented 2 months ago

License info:

✅ License found: MIT License (Valid open source OSI approved license)

editorialbot commented 2 months ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

editorialbot commented 2 months ago

Five most similar historical JOSS papers:

Acanthophis: a comprehensive plant hologenomics pipeline Submitting author: @kdm9 Handling editor: @marcosvital (Active) Reviewers: @bricoletc, @gbouras13, @abhishektiwari Similarity score: 0.7228

MetaGenePipe: An Automated, Portable Pipeline for Contig-based Functional and Taxonomic Analysis Submitting author: @ParkvilleData Handling editor: @jmschrei (Active) Reviewers: @Ebedthan, @rjorton Similarity score: 0.7062

nf-gwas-pipeline: A Nextflow Genome-Wide Association Study Pipeline Submitting author: @ZeyuanSong Handling editor: @lpantano (Active) Reviewers: @preetida, @rspirgel Similarity score: 0.6955

CheckQC: Quick quality control of Illumina sequencing runs Submitting author: @johandahlberg Handling editor: @pjotrp (Retired) Reviewers: @brainstorm Similarity score: 0.6866

Koverage: Read-coverage analysis for massive (meta)genomics datasets Submitting author: @beardymcjohnface Handling editor: @csoneson (Active) Reviewers: @lparsons, @telatin Similarity score: 0.6764

⚠️ Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS.

abhi18av commented 2 months ago

CC @agudeloromero @t4ly4

Kevin-Mattheus-Moerman commented 2 months ago

@abhi18av Dear author, thanks for this submission. I am the AEiC on this track and here to help process the initial steps. Before we proceed, please can you have a look at the following points:

[x] Please study the above reference check ☝️ and see if you can address any of the reported potential DOI issues. You can add/amend DOI entries in your .bib file, and call @editorialbot check references here to check them again.
[x] I may have missed it, but can you confirm this project features (automated) testing? If so it may be good to link to this in the README.
[x] Could you help me understand the above code report, and potentially add to it in terms of nextflow contributions? What aspects of the report would you say is your core achievement/new contribution? In addition, can you help estimate the lines of code number for the nextflow work? We ask this as some tools exist e.g. to automatically generate JavaScript GUI related code for instance. So any help to judge the "weight/size" of this submission would be appreciated.

Kevin-Mattheus-Moerman commented 2 months ago

@editorialbot invite @csoneson as editor

editorialbot commented 2 months ago

Invitation to edit this submission sent!

csoneson commented 2 months ago

In principle I'm happy to edit this, but would like to first wait for the author's responses to @Kevin-Mattheus-Moerman's queries above.

abhi18av commented 2 months ago

Dear @Kevin-Mattheus-Moerman and @csoneson, thank you for taking the time to evaluate this manuscript!

I have addressed the comments inline.

✅ Please study the above reference check ☝️ and see if you can address any of the reported potential DOI issues. You can add/amend DOI entries in your .bib file, and call @editorialbot check references here to check them again.

Sure, I have updated the DOI for a few more citations. However, some entries don't have an associated publication (such as Picard), and for others there's no consensus on how to cite them (such as the HTS specification, https://github.com/samtools/hts-specs/issues/179 ), so I have simply used the @online bib entry type for those.

If there's a better way or a JOSS convention to address this, please let us know and we will be happy to accommodate.

✅ I may have missed it, but can you confirm this project features (automated) testing? If so it may be good to link to this in the README.

Ah yes, there are a number of GitHub Actions workflows in the repo, which are triggered upon relevant events.

In addition, I have added an explanation in the README for the bundled test dataset, which we provide to users for quick testing: https://github.com/wal-yan/target-methylseq-qc?tab=readme-ov-file#testing .

✅ Could you help me understand the above code report, and potentially add to it in terms of nextflow contributions? What aspects of the report would you say is your core achievement/new contribution? In addition, can you help estimate the lines of code number for the nextflow work? We ask this as some tools exist e.g. to automatically generate JavaScript GUI related code for instance. So any help to judge the "weight/size" of this submission would be appreciated.

In terms of the cloc report from https://github.com/openjournals/joss-reviews/issues/7238#issuecomment-2352844762, I must say that numbers hide the overall big picture, but thank you for raising this.

The principal changes regarding the implementation logic are of course in the Nextflow/Groovy layer. However, as Nextflow is just the DSL for the orchestration of tasks, we have worked on other layers/languages as well.

The samplesheet check (written in Python) is specific to this pipeline and verifies the overall validity of the samplesheet as a pre-flight check. The repository also includes test samplesheet files in CSV format (assets/test_samplesheet_bed_filter.csv and assets/test_samplesheet_picard_profiler.csv).
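To illustrate what a pre-flight samplesheet check of this kind can look like, here is a minimal Python sketch. It is not the pipeline's actual script: the column names `sample` and `bam`, the function name `check_samplesheet`, and the specific rules (non-empty unique sample names, paths ending in `.bam`) are all assumptions made for the example, since the real samplesheet schema is not described in this thread.

```python
import csv
import io

# Hypothetical column names; the pipeline's real samplesheet schema
# is not given in this thread.
REQUIRED_COLUMNS = ("sample", "bam")


def check_samplesheet(text: str) -> list[str]:
    """Return a list of problems found; an empty list means the sheet passed."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        # Without the expected columns the row checks cannot run.
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    seen = set()
    for row in reader:
        line_no = reader.line_num  # physical line number in the input
        # Short rows yield None for absent fields, so fall back to "".
        sample = (row["sample"] or "").strip()
        bam = (row["bam"] or "").strip()
        if not sample:
            problems.append(f"line {line_no}: empty sample name")
        elif sample in seen:
            problems.append(f"line {line_no}: duplicate sample '{sample}'")
        seen.add(sample)
        if not bam.endswith(".bam"):
            problems.append(f"line {line_no}: '{bam}' does not end in .bam")
    return problems
```

Returning a list of messages rather than raising on the first error lets a user fix every problem in one pass before launching the workflow.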

Furthermore, once the analysis is done, the generated results are merged and passed to MultiQC, which relies on a customized YAML file (assets/multiqc_config.yml) to present the principal summary report.

Finally in terms of the UI for Nextflow Schema renderers, the JSON format has been customized to reflect the principal parameters of the pipeline corresponding to different modes.

I must also highlight that the test_* profiles are created in the conf/*config scripts, which are also Groovy/Nextflow scripts but do NOT get picked up by cloc as any language in the overall counts.

Therefore, kindly take this into consideration 🙏

EDIT (24-09-2024)

Actually, I have realized that programs like cloc, tokei or scc do NOT take Nextflow code into account. Here's a test

Therefore, none of the Nextflow (or config) files from the core implementation show up.

abhi18av commented 2 months ago

@editorialbot check references

abhi18av commented 2 months ago

@editorialbot generate pdf

editorialbot commented 2 months ago

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.1101/2023.04.29.23289314 is OK
- 10.1038/nbt.3820 is OK
- 10.1038/s41587-020-0439-x is OK
- 10.1038/s41592-018-0046-7 is OK
- 10.1093/bioinformatics/btx192 is OK
- 10.5281/zenodo.10463781 is OK
- 10.1093/gigascience/giab008 is OK
- 10.1101/gr.107524.110 is OK
- 10.1038/nbt.3820 is OK
- 10.1093/bioinformatics/btw354 is OK
- 10.1038/s41587-020-0439-x is OK
- 10.1093/bioinformatics/btq033 is OK
- 10.5281/zenodo.8251379 is OK
- 10.5281/zenodo.13597863 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: CreateSequenceDictionary (Picard)
- No DOI given, and none found for title: Picard toolkit
- No DOI given, and none found for title: CollectHsMetrics (Picard)
- No DOI given, and none found for title: CollectMultipleMetrics (Picard)
- No DOI given, and none found for title: HTS format specifications
- No DOI given, and none found for title: Babraham Bioinformatics - FastQC A Quality Control...
- No DOI given, and none found for title: Twist Methylome
- No DOI given, and none found for title: Twist Methylome
- No DOI given, and none found for title: target-methylseq-qc website

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None

editorialbot commented 2 months ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

editorialbot commented 2 months ago

Five most similar historical JOSS papers:

Acanthophis: a comprehensive plant hologenomics pipeline Submitting author: @kdm9 Handling editor: @marcosvital (Active) Reviewers: @bricoletc, @gbouras13, @abhishektiwari Similarity score: 0.7247

MetaGenePipe: An Automated, Portable Pipeline for Contig-based Functional and Taxonomic Analysis Submitting author: @ParkvilleData Handling editor: @jmschrei (Active) Reviewers: @Ebedthan, @rjorton Similarity score: 0.7067

nf-gwas-pipeline: A Nextflow Genome-Wide Association Study Pipeline Submitting author: @ZeyuanSong Handling editor: @lpantano (Active) Reviewers: @preetida, @rspirgel Similarity score: 0.6974

CheckQC: Quick quality control of Illumina sequencing runs Submitting author: @johandahlberg Handling editor: @pjotrp (Retired) Reviewers: @brainstorm Similarity score: 0.6879

RNAsik: A Pipeline for complete and reproducible RNA-seq analysis that runs anywhere with speed and ease Submitting author: @serine Handling editor: @pjotrp (Retired) Reviewers: @andrewyatz Similarity score: 0.6763

⚠️ Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS.

Kevin-Mattheus-Moerman commented 2 months ago

@editorialbot query scope

editorialbot commented 2 months ago

Submission flagged for editorial review.

Kevin-Mattheus-Moerman commented 2 months ago

@abhi18av thanks for providing those additional details. I have just flagged this submission for a scope review by our editorial board. This is because I need some help to determine if this work is in scope, and if the pipeline/workflow you present meets our substantial scholarly effort criterion.

The scope review should take about 2 weeks to complete.

abhi18av commented 2 months ago

Hi @Kevin-Mattheus-Moerman,

I was trying to understand cloc better, and I've edited my response here https://github.com/openjournals/joss-reviews/issues/7238#issuecomment-2367633717 to note that cloc (or tokei etc.) does not take Nextflow files into account. The principal implementation logic (in .nf and .config files) is therefore not included in the line count reports, unfortunately 🤷

Kevin-Mattheus-Moerman commented 2 months ago

@abhi18av yes I'm aware cloc doesn't count nextflow code lines. Hence I asked you to elaborate. If you can help count/estimate these lines yourself that would be helpful.

abhi18av commented 1 month ago

Hi @Kevin-Mattheus-Moerman, apologies for the late response.

Sure, I am happy to provide the line counts for the Nextflow nf and config files.

The total number of lines of Nextflow-specific code is 1381.

Tools used

- PowerShell for scripting
- fd command
- wc -l command

Here's my method to compute nextflow code lines

Find all nf | config files from project root.

wal-yan-target-methylseq-qc  🍣 master 🅒 base

+  p$ fd "nf$|config$"  -t f
conf/base.config
conf/modules.config
conf/test_bed_filter.config
conf/test_picard_profiler.config
main.nf
modules/local/samplesheet_check.nf
modules/nf-core/bedtools/intersect/main.nf
modules/nf-core/custom/dumpsoftwareversions/main.nf
modules/nf-core/fastqc/main.nf
modules/nf-core/multiqc/main.nf
modules/nf-core/picard/collecthsmetrics/main.nf
modules/nf-core/picard/collectmultiplemetrics/main.nf
modules/nf-core/picard/createsequencedictionary/main.nf
modules/nf-core/samtools/faidx/main.nf
modules/nf-core/samtools/index/main.nf
nextflow.config
subworkflows/local/input_check.nf
workflows/bed_filter.nf
workflows/picard_profiler.nf

Save these files in a variable $nextflowSourceFiles

$nextflowSourceFiles = fd "nf$|config$"  -t f

Iterate over this file list and execute wc -l

wal-yan-target-methylseq-qc  🍣 master 🅒 base
+  p$ foreach ($f in $nextflowSourceFiles ) { wc -l $f }

65 conf/base.config
50 conf/modules.config
30 conf/test_bed_filter.config
32 conf/test_picard_profiler.config
81 main.nf
31 modules/local/samplesheet_check.nf
39 modules/nf-core/bedtools/intersect/main.nf
24 modules/nf-core/custom/dumpsoftwareversions/main.nf
51 modules/nf-core/fastqc/main.nf
53 modules/nf-core/multiqc/main.nf
83 modules/nf-core/picard/collecthsmetrics/main.nf
67 modules/nf-core/picard/collectmultiplemetrics/main.nf
44 modules/nf-core/picard/createsequencedictionary/main.nf
50 modules/nf-core/samtools/faidx/main.nf
48 modules/nf-core/samtools/index/main.nf
264 nextflow.config
42 subworkflows/local/input_check.nf
140 workflows/bed_filter.nf
187 workflows/picard_profiler.nf

Update the loop to accumulate the line counts in a $sum variable

wal-yan-target-methylseq-qc  🍣 master 🅒 base
+  p$ $sum=0; foreach ($f in $nextflowSourceFiles ) { $sum += $(wc -l $f).split(" ")[0] }; $sum

1381
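For readers without PowerShell or fd, the counting steps above can be approximated in Python. This is a minimal sketch rather than the method used in the thread: the function names are hypothetical, and unlike fd it does not honor .gitignore rules or skip hidden files, and its case handling may differ, so totals can deviate on a working copy that contains ignored or hidden files.

```python
import re
from pathlib import Path

# Same pattern as the fd call in the thread: file names ending in "nf" or "config".
NEXTFLOW_FILE = re.compile(r"nf$|config$")


def count_newlines(path: Path) -> int:
    """Count newline characters, which is what `wc -l` reports."""
    return path.read_bytes().count(b"\n")


def nextflow_line_counts(root: Path) -> dict[str, int]:
    """Map each matching file (path relative to root) to its line count."""
    counts = {}
    for path in sorted(root.rglob("*")):
        # Match on the file name only, not on the directory part of the path.
        if path.is_file() and NEXTFLOW_FILE.search(path.name):
            counts[path.relative_to(root).as_posix()] = count_newlines(path)
    return counts
```

Calling `nextflow_line_counts(Path("."))` from the project root returns the per-file counts, and `sum(counts.values())` on that result gives the total.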

abhi18av commented 1 month ago

Dear @Kevin-Mattheus-Moerman and @csoneson,

As it has been a couple of weeks since the last activity on this review, could you please let me know if there's anything I can do to facilitate it? I have answered all the questions to the best of my knowledge, and our team is eagerly looking forward to receiving a response.

Kevin-Mattheus-Moerman commented 1 month ago

@abhi18av apologies for the delay. I hope to be able to conclude our scope review by early next week.

abhi18av commented 1 month ago

Thank you @Kevin-Mattheus-Moerman , we are looking forward to it.

Kevin-Mattheus-Moerman commented 1 week ago

@editorialbot assign @csoneson as editor

editorialbot commented 1 week ago

Assigned! @csoneson is now the editor

Kevin-Mattheus-Moerman commented 1 week ago

@csoneson this has passed the initial scope review so I've just assigned you as editor.

csoneson commented 1 week ago

👋🏻 @abhi18av - I will handle your submission, and start by finding at least two suitable reviewers. Feel free to let me know if you have suggestions (e.g. see the first post in this issue ☝🏻)

csoneson commented 1 week ago

@editorialbot generate pdf

editorialbot commented 1 week ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

editorialbot commented 1 week ago

Five most similar historical JOSS papers:

Acanthophis: a comprehensive plant hologenomics pipeline Submitting author: @kdm9 Handling editor: @marcosvital (Active) Reviewers: @bricoletc, @gbouras13, @abhishektiwari Similarity score: 0.7128

MetaGenePipe: An Automated, Portable Pipeline for Contig-based Functional and Taxonomic Analysis Submitting author: @ParkvilleData Handling editor: @jmschrei (Active) Reviewers: @Ebedthan, @rjorton Similarity score: 0.7043

nf-gwas-pipeline: A Nextflow Genome-Wide Association Study Pipeline Submitting author: @ZeyuanSong Handling editor: @lpantano (Active) Reviewers: @preetida, @rspirgel Similarity score: 0.6834

CheckQC: Quick quality control of Illumina sequencing runs Submitting author: @johandahlberg Handling editor: @pjotrp (Retired) Reviewers: @brainstorm Similarity score: 0.6797

Koverage: Read-coverage analysis for massive (meta)genomics datasets Submitting author: @beardymcjohnface Handling editor: @csoneson (Active) Reviewers: @lparsons, @telatin Similarity score: 0.6735

⚠️ Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS.

csoneson commented 1 week ago

@abhi18av In the meantime, I also noticed that your paper is a bit on the long side by JOSS standards (recommended length is 250-1000 words, see here). I think you could, for example, move some of the parts related to installation and available parameters to the GitHub repository (again, see the link above for more details about what we expect a JOSS paper to contain).

Also, could you please add a country to affiliation (1). Thanks!

abhi18av commented 6 days ago

Hi @Kevin-Mattheus-Moerman and @csoneson,

We are glad to hear that our submission is progressing in the review pipeline.

@csoneson I have accommodated your suggestions and trimmed the sections regarding download, test profiles, etc. from the content. However, I think that the content of the tables is what increases the word count overall. I'm not sure if there's a way to instruct the editorialbot not to count the tables as plain text 🤔

In any case, we would be happy to accommodate any further changes you see fit for the publication.

@editorialbot generate pdf

csoneson commented 2 days ago

@editorialbot generate pdf

csoneson commented 2 days ago

(@abhi18av The editorialbot commands must be the first ones in a comment 🙂)

csoneson commented 2 days ago

@editorialbot check repository

editorialbot commented 2 days ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.05 s (1469.0 files/s, 235071.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
CSS                              5             39             20           2238
JavaScript                      11            235            226           2112
SVG                              3              3              3           2081
HTML                             4             53             10           1537
YAML                            27             74             30            905
JSON                             7              2              0            635
XML                              2              0              0            518
Markdown                         9            315              0            501
Groovy                           4             76            103            354
TeX                              1             32              0            341
Python                           2             61             90            183
CSV                              3              0              0             10
TOML                             1              1              2              7
Bourne Shell                     1              0              0              5
-------------------------------------------------------------------------------
SUM:                            80            891            484          11427
-------------------------------------------------------------------------------

Commit count by author:

   122  Abhinav Sharma
     1  Patricia
     1  t4ly4

editorialbot commented 2 days ago

Paper file info:

📄 Wordcount for paper.md is 1662

✅ The paper includes a Statement of need section

editorialbot commented 2 days ago

License info:

✅ License found: MIT License (Valid open source OSI approved license)

editorialbot commented 2 days ago

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

editorialbot commented 2 days ago

Five most similar historical JOSS papers:

Acanthophis: a comprehensive plant hologenomics pipeline Submitting author: @kdm9 Handling editor: @marcosvital (Active) Reviewers: @bricoletc, @gbouras13, @abhishektiwari Similarity score: 0.7200

MetaGenePipe: An Automated, Portable Pipeline for Contig-based Functional and Taxonomic Analysis Submitting author: @ParkvilleData Handling editor: @jmschrei (Active) Reviewers: @Ebedthan, @rjorton Similarity score: 0.7057

nf-gwas-pipeline: A Nextflow Genome-Wide Association Study Pipeline Submitting author: @ZeyuanSong Handling editor: @lpantano (Active) Reviewers: @preetida, @rspirgel Similarity score: 0.6922

CheckQC: Quick quality control of Illumina sequencing runs Submitting author: @johandahlberg Handling editor: @pjotrp (Retired) Reviewers: @brainstorm Similarity score: 0.6854

Koverage: Read-coverage analysis for massive (meta)genomics datasets Submitting author: @beardymcjohnface Handling editor: @csoneson (Active) Reviewers: @lparsons, @telatin Similarity score: 0.6730

⚠️ Note to editors: If these papers look like they might be a good match, click through to the review issue for that paper and invite one or more of the authors before considering asking the reviewers of these papers to review again for JOSS.
