openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: Back to sequences: find the origin of k-mers #7066

Closed editorialbot closed 1 month ago

editorialbot commented 3 months ago

Submitting author: @pierrepeterlongo (Pierre Peterlongo)
Repository: https://github.com/pierrepeterlongo/back_to_sequences/
Branch with paper.md (empty if default branch):
Version: v0.6.6
Editor: @majensen
Reviewers: @Anjan-Purkayastha, @amoeba
Archive: 10.5281/zenodo.13794732

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/2b20d0fa287109fb0fcdb39b48b81b21"><img src="https://joss.theoj.org/papers/2b20d0fa287109fb0fcdb39b48b81b21/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/2b20d0fa287109fb0fcdb39b48b81b21/status.svg)](https://joss.theoj.org/papers/2b20d0fa287109fb0fcdb39b48b81b21)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@Anjan-Purkayastha & @amoeba, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review. First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @majensen know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @Anjan-Purkayastha

📝 Checklist for @amoeba

editorialbot commented 3 months ago

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf
editorialbot commented 3 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1186/s13059-022-02771-2 is OK
- 10.1038/nmeth.1376 is OK
- 10.1101/gr.101360.109 is OK
- 10.7717/peerj-cs.94 is OK
- 10.1093/bfgp/elr035 is OK
- 10.1186/s13059-019-1632-4 is OK
- 10.1093/bioinformatics/btu288 is OK
- 10.1093/nar/gku1187 is OK
- 10.1093/bioadv/vbac029 is OK
- 10.1101/gr.277615.122 is OK
- 10.1145/585265.585267 is OK
- 10.1089/cmb.2012.0021 is OK
- 10.1016/j.dam.2018.03.035 is OK
- 10.1186/s13059-019-1891-0 is OK
- 10.1038/s41579-020-0364-5 is OK
- 10.1093/bioinformatics/btac689 is OK
- 10.1186/s12864-015-1406-7 is OK
- 10.1016/j.isci.2023.108057 is OK

MISSING DOIs

- No DOI given, and none found for title: The Platinum Searcher
- No DOI given, and none found for title: The Silver Searcher
- No DOI given, and none found for title: Kmer Mapper
- No DOI given, and none found for title: AHash: A Load-Balanced One Permutation Hash

INVALID DOIs

- None
editorialbot commented 3 months ago

Software report:

github.com/AlDanial/cloc v 1.90  T=0.02 s (1554.0 files/s, 235858.6 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Rust                            11            357            216           1792
CSV                              3              0              0           1497
Markdown                         6            118              0            469
TeX                              1             26              3            216
Python                           6             50             37            152
Bourne Shell                     4             38             23            144
YAML                             3             17              0            102
TOML                             1             12              5             38
-------------------------------------------------------------------------------
SUM:                            35            618            284           4410
-------------------------------------------------------------------------------

Commit count by author:

   134  Pierre Peterlongo
    14  Pierre Marijon
    10  Anthony Baire
     7  PETERLONGO Pierre
     1  Francesco Andreace
editorialbot commented 3 months ago

Paper file info:

📄 Wordcount for paper.md is 2348

✅ The paper includes a Statement of need section

editorialbot commented 3 months ago

License info:

🟡 License found: GNU Affero General Public License v3.0 (Check here for OSI approval)

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

Anjan-Purkayastha commented 3 months ago

Review checklist for @Anjan-Purkayastha

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

majensen commented 3 months ago

@Anjan-Purkayastha @amoeba thanks for agreeing to review. Let me know if there are any blockers for you. Mark

amoeba commented 3 months ago

Hey @majensen, I can have my review in by Aug 18 if that timeline works.

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

editorialbot commented 3 months ago

I'm sorry @Anjan-Purkayastha, I'm afraid I can't do that. That's something only editors are allowed to do.

Anjan-Purkayastha commented 3 months ago

@majensen: I have completed installing and testing the tool. It is a useful addition to the set of general-purpose bioinformatics tools we use for processing NGS data. I have left some enhancement suggestions for the authors at this link. Aside from this, they need to update their citations, and they should be good to go. This is my first time interacting over GitHub, so please let me know if there is anything else I need to do to complete this review. Cheers.

majensen commented 3 months ago

Thanks very much @Anjan-Purkayastha -

amoeba commented 3 months ago

Review checklist for @amoeba

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

amoeba commented 3 months ago

Hi @majensen and @pierrepeterlongo, I've finished my review and filed an issue on your repo. I do ask for some changes there so please have a look and let me know when you'd like me to re-review.

majensen commented 3 months ago

Hi @pierrepeterlongo - the reviewers have made their comments and have items to address at https://github.com/pierrepeterlongo/back_to_sequences/issues/6 and https://github.com/pierrepeterlongo/back_to_sequences/issues/8. Please keep us informed on your progress here as you work through these. Thanks!

pierrepeterlongo commented 3 months ago

Hi @majensen Thanks a lot for your message. I'm planning to apply the (nice) recommendations by the end of the week.

Pierre

pierrepeterlongo commented 3 months ago

@editorialbot generate pdf

editorialbot commented 3 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

pierrepeterlongo commented 3 months ago

Hello @majensen,

We have taken the reviewers' comments into consideration, and I have closed the related issues: https://github.com/pierrepeterlongo/back_to_sequences/issues/6 and https://github.com/pierrepeterlongo/back_to_sequences/issues/8.

Best, Pierre

majensen commented 3 months ago

Thanks @pierrepeterlongo - I am fine with your solution regarding the length of the paper (I tend to be lenient on this aspect). @amoeba @Anjan-Purkayastha if you would have a look this week that would be excellent. Thanks!

amoeba commented 2 months ago

Hi @majensen: Following the submitting author accepting all of my changes in https://github.com/pierrepeterlongo/back_to_sequences/issues/8, I have checked off the remaining items in my checklist.

The final issue I have is that the dedicated documentation for the project is kept in a separate GitHub repo, and my worry is that it won't be archived alongside the source code. I think it's important for the documentation to be archived by JOSS, so I wonder whether JOSS doesn't care, whether JOSS can archive both repositories, or whether I should push the author to integrate the two repositories. Please advise @majensen.

majensen commented 2 months ago

@amoeba - I think this is a worthy consideration. I don't think we need to ask for a change in the code organization. What I see on Zenodo is a place for "Related" artifacts with associated metadata:

zenodo-metadata

So my recommendation would be, create an additional Zenodo archive with the docs repo, and then add the DOI in the "Related" section, with the tag "is documented by". Thoughts?

amoeba commented 2 months ago

I think that would be sufficient. Thanks for coming up with a solution. My review is now an Accept.

pierrepeterlongo commented 2 months ago

Hey,

Thanks for this suggestion. I'd like to be sure I understand it.

You suggest a Zenodo archive for the doc, thus generating a DOI, say DOI_doc, plus a Zenodo archive for the code that indicates DOI_doc as an "is documented by" relation.

Do I understand you correctly?

majensen commented 2 months ago

Exactly right @pierrepeterlongo - as one of the last steps (after @Anjan-Purkayastha does his final check), I'll ask you to create a Zenodo archive of the code and send us back the DOI. In this case, we'd ask you to also create an archive for the docs repo and annotate the code repo as you describe. You can do this at any time, but it may be worth waiting until we get the final check.
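
As a rough illustration of the arrangement agreed above, the metadata for the code archive could carry a related identifier pointing at the docs archive. This is only a sketch: the field names (`related_identifiers`, `identifier`, `relation`) and the relation value `isDocumentedBy` are assumptions about Zenodo's deposit metadata and should be checked against Zenodo's own documentation; the docs DOI and the helper function name are hypothetical placeholders.

```python
import json


def code_archive_metadata(title: str, docs_doi: str) -> dict:
    """Build metadata for the code archive that links to the docs archive.

    Field names are assumed, not confirmed by this thread.
    """
    return {
        "title": title,
        "upload_type": "software",
        "related_identifiers": [
            # The code archive "is documented by" the docs archive.
            {"identifier": docs_doi, "relation": "isDocumentedBy"},
        ],
    }


# Hypothetical placeholder standing in for DOI_doc.
DOCS_DOI = "10.5281/zenodo.0000000"

metadata = code_archive_metadata(
    "Back to sequences: find the origin of k-mers", DOCS_DOI
)
print(json.dumps(metadata, indent=2))
```

The same relation can also be entered by hand in the "Related" section of the Zenodo upload form, which is what the editor describes.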

pierrepeterlongo commented 2 months ago

OK, perfect. I'll wait for the final check. Thanks.

Anjan-Purkayastha commented 2 months ago

@pierrepeterlongo: have reviewed the latest set of changes. Decision is to Accept.

pierrepeterlongo commented 2 months ago

Hello

Thanks @Anjan-Purkayastha for your message and review work. @majensen: I should find time to prepare the Zenodo archives (code and doc) by Wednesday.

majensen commented 2 months ago

Thanks @pierrepeterlongo - I will look over the paper itself and may have some minor suggestions.

majensen commented 2 months ago

@editorialbot generate pdf

editorialbot commented 2 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

majensen commented 2 months ago

@pierrepeterlongo: can you consider making or responding to the following suggestions in the MS (line numbers of latest PDF in this thread):

Title: capitalize “Find”?

l.13: “finding back” -> “recovering”
l.16: “occurences” -> distribution?
l.17: “short read” -> “short reads”

l.20: may be more readable with each step in its own paragraph (like a bulleted list); I would capitalize the first word in each step.

l.37: “it has been…” -> “this has been…”
l.41: “only a few.” -> “only a few examples.”

l.55: “associated to” -> “associated with”
l.56: delete “to cite a few”

l.59: “Finding back” -> “Recovering”
l.60: “hardly scales” -> “is too costly for”
l.62: “even if they were” -> “even though they have been”
l.65: “but not extract” -> “but do not extract”

l.67: “eg.” -> “e.g.,”
l.75: “the feature to extract” -> “a feature for extracting”
l.79: “one k-mer” -> “one k-mer (length 31)” (correct?)

l.80: link “https://b2s-doc.readthedocs.io/en/latest/benchmark.html” is broken; https://b2s-doc.readthedocs.io/en/latest/results.html looks promising, but doesn’t contain the results for the alternative apps.

l.87: “Summing up, these…” -> “Summing up, we find these…”
l.87: “not meant for” -> “not appropriate for”

l.95: “finding back” -> “recovering”
l.97-98: “applications, such as” -> “applications, in such areas as”
l.99: “datasets, all of which”. I would make a separate final sentence, like “Because of the efficiency of our approach, such applications could be executed in real time during the sequencing process.”

Refs

l.113: “de bruin” -> “de Bruin”
l.118: “bloom filters” -> “Bloom filters”
l.132: “bloom filters” -> “Bloom filters”
l.156: “kraken 2” -> “Kraken 2”

pierrepeterlongo commented 2 months ago

Hi @majensen

Thank you for your careful reading and suggestions.

I applied all of them with the exception of:

I also updated the https://b2s-doc.readthedocs.io/en/latest/results.html page adding notes about the section Possible Alternatives.

I'm waiting for your feedback before creating the Zenodo repositories.

Thanks again, Pierre

pierrepeterlongo commented 2 months ago

@editorialbot generate pdf

editorialbot commented 2 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

majensen commented 2 months ago

Thanks @pierrepeterlongo - I think it reads smoothly. Are you sure you don't want to capitalize "Bloom filters" in the refs (since Bloom is the name of the guy who invented them?). In any case, please go ahead and create the archive, and report the DOI back in this thread.

pierrepeterlongo commented 2 months ago

@editorialbot generate pdf

editorialbot commented 2 months ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

pierrepeterlongo commented 2 months ago

Hi @majensen

That's crazy but I can't change 'b' to 'B' in the references. If I change "bloom" to "Bloom" in the .bib, this has no effect on the pdf, but if I change "bloom" to "BBloom" for instance, with two 'B's the pdf changes accordingly.

Anyway I created the two zenodo repositories:

Best, Pierre

majensen commented 2 months ago

No problem Pierre - we'll let the dragons of BibTeX sleep.

majensen commented 2 months ago

One minor update @pierrepeterlongo - can you make the title of the repo https://doi.org/10.5281/zenodo.13794732 the same as that of the paper, ie "Back to sequences: find the origin of k-mers"? This is something JOSS requires.

majensen commented 2 months ago

@editorialbot set 10.5281/zenodo.13794732 as archive

editorialbot commented 2 months ago

Done! archive is now 10.5281/zenodo.13794732

majensen commented 2 months ago

@editorialbot set 0.6.6 as version

editorialbot commented 2 months ago

Done! version is now 0.6.6

pierrepeterlongo commented 2 months ago

> One minor update @pierrepeterlongo - can you make the title of the repo https://doi.org/10.5281/zenodo.13794732 the same as that of the paper, ie "Back to sequences: find the origin of k-mers"? This is something JOSS requires.

Done :)

majensen commented 2 months ago

@editorialbot recommend-accept

editorialbot commented 2 months ago
Attempting dry run of processing paper acceptance...
editorialbot commented 2 months ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.1186/s13059-022-02771-2 is OK
- 10.1038/nmeth.1376 is OK
- 10.1101/gr.101360.109 is OK
- 10.7717/peerj-cs.94 is OK
- 10.1093/bfgp/elr035 is OK
- 10.1186/s13059-019-1632-4 is OK
- 10.1093/bioinformatics/btu288 is OK
- 10.1093/nar/gku1187 is OK
- 10.1093/bioadv/vbac029 is OK
- 10.1101/gr.277615.122 is OK
- 10.1145/585265.585267 is OK
- 10.1089/cmb.2012.0021 is OK
- 10.1016/j.dam.2018.03.035 is OK
- 10.1186/s13059-019-1891-0 is OK
- 10.1038/s41579-020-0364-5 is OK
- 10.1093/bioinformatics/btac689 is OK
- 10.1186/s12864-015-1406-7 is OK
- 10.1016/j.isci.2023.108057 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: The Platinum Searcher
- No DOI given, and none found for title: The Silver Searcher
- No DOI given, and none found for title: Kmer Mapper
- No DOI given, and none found for title: AHash: A Load-Balanced One Permutation Hash

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None