[REVIEW]: rGUIDANCE – alignment confidence score computation in R

whedon commented 5 years ago

Submitting author: @FranzKrah (Franz-Sebastian Krah) Repository: https://github.com/FranzKrah/rGUIDANCE Version: 1.0 Editor: @karthik Reviewer: @shaunpwilkinson Archive: 10.5281/zenodo.2654302

Status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/b8b70cb72a3ec8331353795e3f83cfca"><img src="http://joss.theoj.org/papers/b8b70cb72a3ec8331353795e3f83cfca/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/b8b70cb72a3ec8331353795e3f83cfca/status.svg)](http://joss.theoj.org/papers/b8b70cb72a3ec8331353795e3f83cfca)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@shaunpwilkinson , please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

Make sure you're logged in to your GitHub account
Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @karthik know.

✨ Please try and complete your review in the next two weeks ✨

Review checklist for @shaunpwilkinson

Conflict of interest

[x] As the reviewer I confirm that I have read the JOSS conflict of interest policy and that there are no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Version: Does the release version given match the GitHub release (1.0)?
[x] Authorship: Has the submitting author (@FranzKrah) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Authors: Does the paper.md file include a list of authors with their affiliations?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

whedon commented 5 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @shaunpwilkinson it looks like you're currently assigned as the reviewer for this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

karthik commented 5 years ago

@shaunpwilkinson Please follow instructions here and work through the checklist. Let me know if you have any questions. You can use Whedon to generate a new pdf anytime with @whedon generate pdf or check references with @whedon check references

karthik commented 5 years ago

@whedon check references

whedon commented 5 years ago

Attempting to check references...

whedon commented 5 years ago


OK DOIs

- 10.1111/j.1461-0248.2009.01314.x is OK
- 10.1186/1471-2105-7-471 is OK
- 10.1371/journal.pone.0018093 is OK
- 10.1093/sysbio/syv033 is OK
- 10.1093/bioinformatics/btu033 is OK
- 10.1093/molbev/msq066 is OK
- 10.1093/nar/gkv318 is OK
- 10.1142/9789812776136_0003 is OK
- 10.1186/s12862-018-1229-7 is OK
- 10.1016/j.patrec.2005.10.010 is OK
- 10.1093/bioinformatics/bti623 is OK

MISSING DOIs

- None

INVALID DOIs

- None

FranzKrah commented 5 years ago

I checked the proof version and I am not sure why but the Figure captions are not there... Franz

karthik commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

karthik commented 5 years ago

Franz: Thanks for catching that. Your figure captions render locally for me but we'll work to get this sorted out before publication.

labarba commented 5 years ago

@shaunpwilkinson 👋 — I see we have no checked items yet on your checklist. When do you think you might get to this review? Thanks!

shaunpwilkinson commented 5 years ago

Hi @labarba very sorry about the delay on this, I'm back now and working on this review, will have it done either this weekend or very early next week. Thanks for your patience, and apologies again - S

karthik commented 5 years ago

Thank you @shaunpwilkinson!

shaunpwilkinson commented 5 years ago

This is a very helpful package that fills a critical gap in the R workflow for phylogenetic analysis. The accompanying paper is clear and well written, and an example is included demonstrating the use of the package to improve the resolution of a phylogeny by factoring in column-wise alignment uncertainty. The tutorial needs some refining and additional notes to help users through the process without encountering issues, particularly when calling the third-party executables (outlined below). The example also involves downloading sequences from NCBI, which means that a set of sequences downloaded may change from one day to the next. This caused me a few issues and required some troubleshooting. For example, the outgroup Helvella aestivalis was not included in the returned sequence set when I ran the script, so the raxml calls produced an error. Also the names of the alignment caused raxml to fail with a rather enigmatic error message (it would be worth noting in the tutorial that the sequence names need to be truncated, not include special metacharacters, and all be unique). To avoid these issues I would suggest either including a pre-downloaded sequence data file with the package or use an existing sequence dataset such as the woodmouse alignment (ape package).

There is also a vital line missing from the example script, run(wd = wd) should be added before all_clusters <- read_phylota(wd).

Some other minor issues are included below:

The full list of dependencies should be included in the tutorial, along with links to the various repositories, optionally some basic installation instructions for Windows, Mac and Linux, and additional annotation/notes should be included to make it clear where paths to executables need to be modified. Some tips on finding the executables within the file system may also be helpful for some users.

The latest version of the ips package on CRAN is v0.0.7, but the rGUIDANCE package depends on v0.0.11. Please either add devtools::install_github("heibl/ips") to the example script or include the ips github repo in the ‘Remotes’ field of the DESCRIPTION file.

The example script includes local directories that won’t apply for other users – e.g. wd <- 'Documents/PhD/proj/low_priority/phylotaR/helvella'. I would advise just using '[path-to-directory]' or something similar. Alternatively rOpenSci may have a standard for setting working directory in example scripts. @karthik may be able to advise on this?

Consider changing the number of threads in the example to something more manageable for non-server users. I would advise changing the ncore argument to 1 for the purposes of the tutorial.

str_extract needs the stringr package installed; this should be added to the example script. In this and other calls it would be safer to use the double-colon operator to specify the package, since the function called may depend on the order that the various packages were loaded. I would also specify the full method where applicable too, for example use w <- phylogram::as.dendrogram.phylo(w) instead of w <- as.dendrogram(w).

On a final note, I found the manuscript provided a really helpful and intuitive guide to using R for phylogenetic analysis. In case of interest (and at the risk of a plug) the aphid package also does multiple sequence alignment within R and doesn't require any third party programs.

I hope this helps - thanks for the opportunity to review. I'm looking forward to including this package in my own workflows.

All the best, Shaun

karthik commented 5 years ago

Thanks @shaunpwilkinson! @FranzKrah Can you please make these fixes and respond here? I opened a couple of issues on your repo.

FranzKrah commented 5 years ago

Hi Karthik,

I copy-pasted Shauns comments below and answered to each. We addressed all the comments, which improved the package! Thanks!

Cheers, Franz

This is a very helpful package that fills a critical gap in the R workflow for phylogenetic analysis. The accompanying paper is clear and well written, and an example is included demonstrating the use of the package to improve the resolution of a phylogeny by factoring in column-wise alignment uncertainty. The tutorial needs some refining and additional notes to help users through the process without encountering issues, particularly when calling the third-party executables (outlined below). The example also involves downloading sequences from NCBI, which means that a set of sequences downloaded may change from one day to the next. This caused me a few issues and required some troubleshooting. For example, the outgroup Helvella aestivalis was not included in the returned sequence set when I ran the script, so the raxml calls produced an error. Also the names of the alignment caused raxml to fail with a rather enigmatic error message (it would be worth noting in the tutorial that the sequence names need to be truncated, not include special metacharacters, and all be unique). To avoid these issues I would suggest either including a pre-downloaded sequence data file with the package or use an existing sequence dataset such as the woodmouse alignment (apepackage). >> Thanks. We added a cautionary note at the beginning of the tutorial regarding the third-party executables. There we also provide URLs to the program websites where they can be downloaded as well. Further, it is correct that the code might run into trouble given that sequences are constantly updated on GenBank. Therefore, we revised the tutorial and provided a pre-downloaded sequence data file, so that users are able run guidance without errors.

There is also a vital line missing from the example script, run(wd = wd) should be added before all_clusters <- read_phylota(wd).

>> Thanks. We added this line

Some other minor issues are included below:

The full list of dependencies should be included in the tutorial, along with links to the various repositories, optionally some basic installation instructions for Windows, Mac and Linux, and additional annotation/notes should be included to make it clear where paths to executables need to be modified. Some tips on finding the executables within the file system may also be helpful for some users. >> Thanks. We added URLs to the external programs and shortly introduced to the used R packages (the most important ones). We also added information on what executables are and how to specify them. We also included URLs to tutorials how to find executables on Windows or Mac.

The latest version of the ips package on CRAN is v0.0.7, but the rGUIDANCE package depends on v0.0.11. Please either add devtools::install_github("heibl/ips") to the example script or include the ips github repo in the ‘Remotes’ field of the DESCRIPTION file. >> We added Remotes: github::heibl/ips to the DESCRIPTION file

The example script includes local directories that won’t apply for other users – e.g. wd <- 'Documents/PhD/proj/low_priority/phylotaR/helvella'. I would advise just using '[path-to-directory]' or something similar. Alternatively rOpenSci may have a standard for setting working directory in example scripts. @karthik may be able to advise on this? >> Sorry, we changed this to “[path-to-directory]”

Consider changing the number of threads in the example to something more manageable for non-server users. I would advise changing the ncore argument to 1 for the purposes of the tutorial. >> We changed this to 1 thread.

str_extract needs the stringr package installed; this should be added to the example script. In this and other calls it would be safer to use the double-colon operator to specify the package, since the function called may depend on the order that the various packages were loaded. I would also specify the full method where applicable too, for example use w <- phylogram::as.dendrogram.phylo(w) instead of w <- as.dendrogram(w). >> Thanks. We added the package name using the double-colon operator where applicable.

On a final note, I found the manuscript provided a really helpful and intuitive guide to using R for phylogenetic analysis. In case of interest (and at the risk of a plug) the aphid package also does multiple sequence alignment within R and doesn't require any third party programs. >> We looked into this and would be happy to include this in further versions. As far as we could see, it is currently not possible to implement aphid::align because it does not allow to specify a guide tree. This is however, necessary within the GUIDANCE algorithm. Maybe you can change this and we can include this in a next version.

karthik commented 5 years ago

@whedon check references

whedon commented 5 years ago

Attempting to check references...

whedon commented 5 years ago


OK DOIs

- 10.1111/j.1461-0248.2009.01314.x is OK
- 10.1186/1471-2105-7-471 is OK
- 10.1371/journal.pone.0018093 is OK
- 10.1093/sysbio/syv033 is OK
- 10.1093/bioinformatics/btu033 is OK
- 10.1093/molbev/msq066 is OK
- 10.1093/nar/gkv318 is OK
- 10.1142/9789812776136_0003 is OK
- 10.1186/s12862-018-1229-7 is OK
- 10.1016/j.patrec.2005.10.010 is OK
- 10.1093/bioinformatics/bti623 is OK

MISSING DOIs

- None

INVALID DOIs

- None

karthik commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

karthik commented 5 years ago

@FranzKrah The captions are there now (after the conclusions).

FranzKrah commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

karthik commented 5 years ago

Everything looks good to me. @shaunpwilkinson Did you skip checking off on the installation instructions list item for a good reason? If not please check that off and I can proceed to acceptance. 🙏

shaunpwilkinson commented 5 years ago

@karthik done

karthik commented 5 years ago

@whedon accept

whedon commented 5 years ago

No archive DOI set. Exiting...

karthik commented 5 years ago

Oops. @FranzKrah Can you please archive the package on Zenodo and post a DOI here?

FranzKrah commented 5 years ago

Sorry, wasn't aware of that...

10.5281/zenodo.2653906

karthik commented 5 years ago

@whedon set 10.5281/zenodo.2653906 as archive

whedon commented 5 years ago

OK. 10.5281/zenodo.2653906 is the archive.

karthik commented 5 years ago

@FranzKrah Almost there. Can you edit the metadata on Zenodo to reflect you as the author (rather than just your GitHub handle)? And while you're there, also check other fields (esp title) to make sure they are also correct.

FranzKrah commented 5 years ago

@karthik I am not sure how to do this. I tried changing the .zenodo.json file but got an error...

FranzKrah commented 5 years ago

@karthik Ok. My name is now changed in the metadata in Zenodo. I hope is what you meant...

karthik commented 5 years ago

@FranzKrah One more change (i.e. the title). It currently has your GitHub handle in the software title too. Can you fix that as well? Current citation:

Franz Sebastian Krah, & Christoph Heibl. (2019, April 30). FranzKrah/rGUIDANCE v1.0.8 (Version v1.0.8). Zenodo. http://doi.org/10.5281/zenodo.2654272

Otherwise the citation for the software archive will also have your Github handle rather than just the name of your software. See a correct example

FranzKrah commented 5 years ago

@karthik I changed the title but I cannot get rid of "FranzKrah/rGUIDANCE: ". I tried already to include a .zenodo.json file in the root of my GitHub repo but this creates errors in Zenodo...

karthik commented 5 years ago

@FranzKrah Can you just create a new Zenodo release from scratch? And make sure all the metadata look correct. Then I can update it here and proceed with accepting? Otherwise you'll have an incorrect citation.

FranzKrah commented 5 years ago

@karthik No I do not see how this would be possible. I think it is ok as is...

karthik commented 5 years ago

@whedon set 10.5281/zenodo.2654302 as archive

whedon commented 5 years ago

OK. 10.5281/zenodo.2654302 is the archive.

karthik commented 5 years ago

@FranzKrah When you go to the above linked DOI, click edit on the top right, then edit the metadata. Remove your GitHub user/repo from the title and save. Then it will generate a new DOI. You can post that here.

FranzKrah commented 5 years ago

@karthik I changed the title and saved but DOI is the same: 10.5281/zenodo.2654302

karthik commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

openjournals / joss-reviews