[REVIEW]: kmeRs : K-Mers Similarity Score Matrix

whedon commented 5 years ago

Submitting author: @RafalUrniaz (Rafal Urniaz)
Repository: https://github.com/RafalUrniaz/kmeRs
Version: 1.1.0
Editor: @csoneson
Reviewers: @jlincbio, @VivekTodur
Archive: Pending

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/526c6ebaa0efd0716f08b77a79a3e50a"><img src="https://joss.theoj.org/papers/526c6ebaa0efd0716f08b77a79a3e50a/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/526c6ebaa0efd0716f08b77a79a3e50a/status.svg)](https://joss.theoj.org/papers/526c6ebaa0efd0716f08b77a79a3e50a)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@jlincbio & @VivekTodur, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

Make sure you're logged in to your GitHub account
Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @csoneson know.

✨ Please try and complete your review in the next two weeks ✨

Review checklist for @jlincbio

Conflict of interest

[x] As the reviewer I confirm that I have read the JOSS conflict of interest policy and that there are no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@RafalUrniaz) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

[ ] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[ ] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[ ] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[ ] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[ ] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[ ] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[ ] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[ ] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[ ] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Review checklist for @VivekTodur

Conflict of interest

[x] As the reviewer I confirm that I have read the JOSS conflict of interest policy and that there are no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Contribution and authorship: Has the submitting author (@RafalUrniaz) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[ ] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[ ] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[ ] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[ ] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[ ] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[ ] Automated tests: Are there automated tests or manual steps described so that the functionality of the software can be verified?
[ ] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] State of the field: Do the authors describe how this software compares to other commonly-used packages?
[x] Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?
[x] References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

whedon commented 5 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @jlincbio, @VivekTodur it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

csoneson commented 5 years ago

@jlincbio, @VivekTodur - this is where the review happens! Instructions are available in the post above - don't hesitate to ping me if you have questions!

csoneson commented 5 years ago

@RafalUrniaz - please take a look at the checklists above and make sure that all required items are available in your submission. If not, you can already add them now, to facilitate the review process.

csoneson commented 5 years ago

@whedon check references

whedon commented 5 years ago

Attempting to check references...

whedon commented 5 years ago


OK DOIs

- 10.1016/0022-2836(70)90057-4 is OK
- 10.1016/0022-2836(81)90087-5 is OK
- 10.1186/1472-6785-11-11 is OK

MISSING DOIs

- None

INVALID DOIs

- None

jlincbio commented 5 years ago

I have some comments about the software and documentation - can I just paste them here?

jlincbio commented 5 years ago

@csoneson, @RafalUrniaz: here are some of my comments so far:

Functionality

Installation

~~The devtools package in R is not typically installed by default;~~ I would also suggest that users install via a tarball release using R CMD INSTALL ~~instead, or add a comment about devtools.~~
I am not sure where this comment really should go, but the two dependencies tcR and rDNAse pull in far too many other dependencies that are unrelated to the core function of this package. Looking at the R code, it seems that only two functions, rDNAse::twoSeqSim() and tcR::generate.kmers(), are invoked, and neither of them requires features outside core R and Biostrings; perhaps incorporate these functions into the package and cite the two packages instead? Sources: rDNAse::twoSeqSim and tcR::generate.kmers
I got an error on a fresh Ubuntu 18.04 LTS/R Open 3.5.3 installation while installing the tcR dependency igraph about missing gfortran; this is more or less carelessness on my end but also a side effect of requiring unnecessary dependencies.
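
To illustrate the point about dependencies: enumerating k-mers needs nothing beyond a language's standard library. The sketch below is in Python rather than R, and it is not the tcR::generate.kmers implementation; the function name generate_kmers and the four-letter DNA alphabet are assumptions made for illustration.

```python
from itertools import product

DNA_ALPHABET = "ACGT"  # assumption: plain DNA bases, no ambiguity codes


def generate_kmers(k, alphabet=DNA_ALPHABET):
    """Return every k-mer over `alphabet`, in the order given by the alphabet."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # product(..., repeat=k) yields all length-k tuples of letters
    return ["".join(letters) for letters in product(alphabet, repeat=k)]


kmers = generate_kmers(2)
print(len(kmers))  # 16, i.e. 4**2
print(kmers[:4])   # ['AA', 'AC', 'AG', 'AT']
```

An equivalent helper in R would likewise need only base functions, which is the reason for suggesting that the two functions be folded into the package.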

Documentation

Statement of need

While the documentation describes the purpose of the tool as to "calculate similarity score matrix for DNA k-mers," there is no description of what problems this tool is designed to solve or of its target audience.

Functionality documentation

There is no documentation or tutorial on how to export results from the R package to use with the companion Python program (heatmap4kmers), even though Figure 1 is actually generated from the Python program and the R package includes an export function; from my perspective it seems like a core feature that is left undocumented.

Example usage

The documentation includes a clear walkthrough, but it would be nice to have a more realistic use case than GATTACA.

Community guidelines

This is missing in the documentation but can be fixed relatively quickly.

Software paper

Summary

The introduction gives a layman’s description of DNA, so technically it satisfies the requirement that it describe the tool to a "diverse, non-specialist audience"; however, in my opinion the introduction needs a little more discussion of the purpose of this tool.

Statement of need

The paper describes the purpose behind creating such a tool to generate similarity score matrices but does not describe the sort of research problem this tool can help address; while similarity scores are calculated to compare sequences, from time to time this can be done without using this tool, so I suggest that the paper should try to address the purpose of this tool better. Since this is technically an academic paper, it should adhere to the JOSS submission requirements that "the software should have an obvious research application" and "be a significant contribution to the available open source software."

State of the field

This is entirely missing in the paper. A discussion on why someone would bother to calculate these scores to compare sequence similarities would be a starting point to illustrate the purpose of this tool.

Quality of writing

As it stands, it is a bit unclear what the purpose and use cases of this tool are. The manuscript describes what some of the basic keywords mean, but fails to address the scientific idea. For instance, the author mentions a list of PAM and BLOSUM matrices frequently used in bioinformatics, and while the different use cases for these matrices are too fine-grained and beyond the scope of this manuscript, merely listing them without discussing what they are clutters the writing.

References

The paper cites Wikipedia and PLoSWiki entries; I find this a bit problematic for archival purposes. Additionally, the NCBI citations are not archived either.

Miscellaneous

Personally I would like to see the functionality of plotting the similarity heatmap (Figure 1) incorporated into the R package, either via a command call to invoke the Python software or via native R code (ideally using either the native heatmap() function or ggplots, one of the dependencies installed by tcR), seeing how the paper includes this figure and in a way counts the ability to generate data for this tool as part of the feature set. Without this, the package seems a bit bare bones, since it can be construed as a wrapper for Biostrings::pairwiseAlignment; this technically goes against the rule that minor utility packages are unacceptable for submission.
The tutorial uses BLOSUM62 with DNA sequences; in this situation I suggest using amino acid sequences instead.
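
As a rough illustration of the two points above, a k-mer similarity score matrix can be built from pairwise global alignment scores using a nucleotide-appropriate match/mismatch scheme rather than BLOSUM62. This Python sketch is not the kmeRs or Biostrings::pairwiseAlignment implementation; the function names and the scoring values (match 1, mismatch -1, linear gap -2) are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    The scoring values are illustrative assumptions, not kmeRs defaults.
    """
    # prev[j] is the best score for aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity_matrix(rows, cols, **scoring):
    """Score every row sequence against every column sequence."""
    return [[global_alignment_score(r, c, **scoring) for c in cols] for r in rows]


matrix = similarity_matrix(["GATTACA"], ["GATTACA", "GATTAGA", "TTTTTTT"])
print(matrix)  # [[7, 5, -3]]
```

A matrix of this shape is what a heatmap routine would take as input, so the plotting step could live in the same package.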

csoneson commented 5 years ago

Hi all, just checking in on how the reviews are progressing here - @jlincbio, thanks for your comments above! @VivekTodur - do you think you will have the chance to provide your review in the coming week? Thanks!

jlincbio commented 5 years ago

@csoneson thanks for checking in. Please let me know when I need to follow up. I have a few things on the agenda for the next two weeks but quick checks are not a problem.

VivekTodur commented 5 years ago

Yes, I am almost done with the review. I can revert back by this Sunday...

Thanks

On Thu, 3 Oct 2019, 12:27 am Charlotte Soneson, notifications@github.com wrote:

Hi all, just checking in on how the reviews are progressing here - @jlincbio, thanks for your comments above! @VivekTodur - do you think you will have the chance to provide your review in the coming week? Thanks!

csoneson commented 5 years ago

Yes, I am almost done with the review. I can revert back by this Sunday... Thanks

@VivekTodur - any updates on your review?

VivekTodur commented 5 years ago

@csoneson Thanks for the opportunity to review this manuscript, and my apologies for the delay. Below are my comments:

Functionality

Installation

Installation is quick and simple on Ubuntu 19.04, as described in the documentation. It shouldn't be a problem on other Linux systems that are set up for biological sequence analysis.

Documentation

Statement of need

The manual for the tool "kmeRs: K-Mers Similarity Score Matrix" does not clearly describe the general purpose of, and need for, the tool. A quick note is required.

Functionality documentation

The various functions of the tool are neatly illustrated with relevant examples, but documentation on how to export the results, and on the output format, is lacking. Also, the manuscript contains a k-mer heatmap image generated with heatmap4kmers; a quick note on this would be very useful.

Example usage

The documentation includes detailed how-to-use instructions. As mentioned earlier, the output export method is missing.

Community guidelines

General guidelines are missing and should be added.

Software paper

Summary

The tool is developed to address problems related to biological sequence similarity searches. In this very focused context, the author has neatly explained the purpose.

Statement of need

The paper titled "K-Mers Similarity Score Matrix" describes the purpose of and need for the tool in layman's terms. The author needs to describe the research object, a real-world requirement, and how kmeRs outperforms the other tools.

State of the field

The author should include this section, as it is missing. Also, it would be nice to see a quick note on how this tool can be integrated into other existing pipelines.

Quality of writing

There are plenty of tools and algorithms already available to solve the sequence similarity search problem. The author needs to explain the exact purpose of kmeRs and how its functionality differs from that of the existing ones. A computational benchmark [CPU vs Memory] would add more value to the manuscript. The empirical formula used to calculate the kmeRs score is also missing from the manuscript.

References

Considering that this is a software paper, citations from scientific journals are sufficient. The Wikipedia citations can be omitted.

csoneson commented 4 years ago

@VivekTodur - thanks for your review! @RafalUrniaz - please go through the comments from the reviewers above, and report back here when you are ready for them to take another look. Of course, don't hesitate to ping me or the reviewers if you have questions.

csoneson commented 4 years ago

Hi @RafalUrniaz - I just wanted to make sure that you have seen the comments above from the reviewers, and check whether you would have an estimate of when you might be able to have a new version for them to consider. Please let us know if you have questions.

csoneson commented 4 years ago

Ping @RafalUrniaz. Could you please let us know whether you are working on a revision of your submission? If you expect that it will take a long time to get back to this, also please let us know, and we can pause this issue. Thanks.

csoneson commented 4 years ago

@RafalUrniaz - I'm going to pause this for now. Please let us know when you have an updated version and are ready for the review to resume.

danielskatz commented 4 years ago

👋 @csoneson - since it looks like you haven't heard back from the author, you may need to ping them via email at this point

csoneson commented 4 years ago

I have been in contact with @RafalUrniaz via email, and he would like to withdraw this submission since he will not have time to get back to it in the near future.

@openjournals/joss-eics - I will close this issue and add the 'withdrawn' label, is there anything else that should be done from my side?

@jlincbio, @VivekTodur - thanks a lot for your reviews, really appreciate your time and input.

labarba commented 4 years ago

We have to withdraw the submission on the AEIC interface, which I just did now.

jlincbio commented 4 years ago

@csoneson Thanks for the notice. If there are other manuscripts needing review please do not hesitate to let me know.
