[REVIEW]: dms2dfe: Comprehensive Workflow for Analysis of Deep Mutational Scanning Data

whedon commented 7 years ago

Submitting author: @rraadd88 (Rohan Dandage) Repository: https://github.com/kc-lab/dms2dfe Version: v1.0.6 Editor: @tracykteal Reviewer: @afrubin Archive: 10.5281/zenodo.1095257

Status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/f1a6726384232ad95bf3566ed95b6142"><img src="http://joss.theoj.org/papers/f1a6726384232ad95bf3566ed95b6142/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/f1a6726384232ad95bf3566ed95b6142/status.svg)](http://joss.theoj.org/papers/f1a6726384232ad95bf3566ed95b6142)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer questions

@afrubin, please carry out your review in this issue by updating the checklist below (please make sure you're logged in to GitHub). The reviewer guidelines are available here: http://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @tracykteal know.

Conflict of interest

[x] As the reviewer I confirm that there are no conflicts of interest for me to review this work (such as being a major contributor to the software).

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Version: Does the release version given match the GitHub release (v1.0.6)?
[x] Authorship: Has the submitting author (@rraadd88) made major contributions to the software?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: Have any performance claims of the software been confirmed?

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Authors: Does the paper.md file include a list of authors with their affiliations?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

whedon commented 7 years ago

Hello human, I'm @whedon. I'm here to help you with some common editorial tasks for JOSS. @afrubin it looks like you're currently assigned as the reviewer for this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As as reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all JOSS reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

arfon commented 7 years ago

@afrubin - please carry out your review in this issue by updating the checklist above and giving feedback in this issue. The reviewer guidelines are available here: http://joss.theoj.org/about#reviewer_guidelines

Any questions/concerns please let @tracykteal know.

afrubin commented 7 years ago

@rraadd88 The issue tracker in the https://github.com/kc-lab/dms2dfe repository is not enabled, so it's not currently possible for me to add my comments. Please let me know once it's ready.

rraadd88 commented 7 years ago

@afrubin fixed it. The issue tracker wasn't enabled by default because the submitted repository is actually a fork of my personal repository (https://github.com/rraadd88/dms2dfe).

afrubin commented 7 years ago

@rraadd88

I’ve installed the program into a new Debian 9 VM and detailed my feeback here: https://github.com/kc-lab/dms2dfe/issues/3

Unfortunatly, I was unable to run any of the three example datasets, due to a missing file, and raised an issue in the example dataset repository (https://github.com/rraadd88/ms_datasets/issues/1). Because I was unable to run the examples, my feedback here is going to focus on the manuscript and documentation.

The manuscript as currently written needs some additional context. There are a lot of different deep mutational scanning experimental designs, only some of which can be analyzed using this program. Right now, these details must be extracted from the I/O page in the documentation but a higher-level summary would be very useful. Additionally, the manuscript should mention and cite related tools (Enrich, EMPIRIC, dmstools, Enrich2, etc.) and provide a bit of information about how dms2dfe differs and the situations in which a user would want to choose dms2dfe.

Many citations are absent from the manuscript. The example datasets are referenced in the documentation, but they should also be cited in the manuscript so proper credit is given. Of the dependencies used by dms2dfe only pandas is cited in the manuscript. The dependencies should all be cited, especially those that have been published in indexed journals. The documentation pages for the various packages and programs should have instructions for citing them.

The documentation would benefit tremendously from a workflow-type diagram showing the different steps of the program, when each of the dependencies are used, which steps are required or optional, etc. Right now this information must be pieced together from other documentation pages and is difficult to follow.

I ran the tests using pytest test/test_configure.py and the single test passed, although I think that all it does is create an empty project directory? This is not much of a test in itself, and the lack of module-level tests will limit future development.

Please let me know when the issue with the example datasets is resolved and I’ll run the program again and provide additional feedback as appropriate.

I’ve raised two other small issues but these should all be quick to fix:

rraadd88 commented 7 years ago

@afrubin,

Deploying the package onto Pypi has now simplified the installation and fixed the related issues kc-lab/dms2dfe#1,2 and 3. I have addressed the issue with datasets (rraadd88/ms_datasets#1) and successfully tested all three datasets.

Regarding the manuscript, I have now added the required information about compatible experimental design (parwise analysis), citations to other analysis tools, datasets and major dependencies (link).
Regarding the documentation, as suggested, I have added the much needed workflow-type diagram (link). The diagram shows the dependencies used at each step and depending on input data format, which steps would be executed. Also I have added a list of citations of all the dependencies used by the tool (link).
Regarding the testing, test/test_configure.py serves as a quick test of the build and general installation of dependencies on travis CI. I have now created new script test/test_datasets.py that would run a representative dataset (Olson_et_al_2014). I have also added all the steps needed in the overall testing (installation and analysis of datasets) in the documentation itself (link).

Thank you for all the suggestions. happy reviewing.

afrubin commented 7 years ago

@rraadd88

The installation is much simpler now, but I encountered an error in Biopython when following the instructions. I'm not sure why this works under Travis but not on my installation. Hopefully it is easy for you to figure out what's going on. I'll run the examples once this is resolved. See here: https://github.com/kc-lab/dms2dfe/issues/3

The manuscript looks much better now. It now includes most required references, although the manuscript should cite everything in the documentation's list of citations. Also, I suggest adding a sentence about PDB visualization and citing UCSF Chimera in both the manuscript and the documentation.

There are still some typos/spelling errors (e.g. "Documentarion" instead of "Documentation" at the start of the last sentence), so I recommend running it through some a spelling/grammar checker and possibly getting a third party to proofread it for you.

rraadd88 commented 7 years ago

@afrubin , the Biopython error is quite strange for me and technically it should not occur at all. I have detailed the reasons and best possible solutions in kc-lab/dms2dfe#3. Also, I have updated the manuscript with new citations and corrected the spelling errors (link). Please have a look.

Also, I just noticed that multiple items such as Installation, Repository etc. can be ticked the checklist (at the top of the page). (@tracykteal).

afrubin commented 7 years ago

@rraadd88 I was able to work around the Biopython installation issue by using Anaconda instead of pip (see https://github.com/kc-lab/dms2dfe/issues/3).

I'm still trying to get the example to work. I've run into a new issue with the dependency programs, which may not be getting built properly (see https://github.com/kc-lab/dms2dfe/issues/4).

The PDF looks complete now that the references have been added. I've updated the checklist.

rraadd88 commented 7 years ago

@afrubin, unlike Ubuntu, apparantly, Debian OS does not provide basic dependencies out of the box. So, I have recommended users to install required stuff prior to installation of dms2dfe. After testing on Debian OS, hopefully, now, the issue (kc-lab/dms2dfe#4) would be resolved.

The updated version (1.0.6.4) can be installed by pip install dms2dfe --upgrade. I hope that the datasets can now be analysed successfully (link: testing steps).

Thank you for the updation of the checklist!

afrubin commented 7 years ago

I have successfully re-run the examples from a fresh VM again to test the installation process, and it runs much more smoothly now. The conda environment in particular is helpful. I have closed the previous issues about installation, as the installation now works as described.

I have opened a few small issues: https://github.com/kc-lab/dms2dfe/issues/5 https://github.com/kc-lab/dms2dfe/issues/6 https://github.com/kc-lab/dms2dfe/issues/7 https://github.com/rraadd88/ms_datasets/issues/2

I am still a bit unclear about what the CI testing does, since it doesn't appear the repo has unit tests or consistency tests.

Since the testing script embedded in the documentation only runs the example datasets, it will only reveal errors that crash the program. The documentation mentions that the program output can be compared to the pre-generated output files in 'ms_datasets/outputs', but this comparison should be done automatically on at least one dataset (or partial dataset) as part of a testing script included in the repo.

Many of the functions in the API section of the documentation do not have a docstring or parameter descriptions, so some more work is needed here.

I didn't find community guidelines for contributors. It is fine that users are directed towards the issue tracker to get support or report problems.

A little bit more motivation in the paper for when someone would want to use dms2dfe instead of the other tools would be useful but not required.

I have updated the checklist now that I can see the program is functional using the example datasets. Since no specific performance claims were made, I think having a program that works as described is sufficient.

rraadd88 commented 6 years ago

Hi @afrubin, many thanks for the conda environment suggestions earlier. This has really solved the earlier issues regarding the installation.

I have fixed the new small issues. Please have a look: https://github.com/kc-lab/dms2dfe/issues/5 , https://github.com/kc-lab/dms2dfe/issues/6 , https://github.com/kc-lab/dms2dfe/issues/7 and https://github.com/rraadd88/ms_datasets/issues/2 . Thanks a lot for pointing them out.

Regarding CI, automatically comparing output files would be indeed really helpful in auto-testing the functionality of the software. So, I have implemented that. Here's a link to the log of CI showing testing of a representative dataset (Olson_et_al_2014) and comparison of the analysed files (located in ms_datasets/analysis) with pre-analysed files (located in ms_datasets/outputs) : https://travis-ci.org/rraadd88/dms2dfe#L1862.

Also, I have now added docstrings to all the functions and files in the package. Link to API page: http://kc-lab.github.io/dms2dfe/latest/html/5api.html

I have also added a page with comprehensive "Community guidelines" for contributors: http://kc-lab.github.io/dms2dfe/latest/html/6contributors.html

Thanks for all the suggestions and updates in the checklist!

afrubin commented 6 years ago

@rraadd88 Looks good to me. I have closed these most recent issues.

@tracykteal @arfon Based on my assessment, the project now meets minimum requirements to be accepted.

arfon commented 6 years ago

@tracykteal @arfon Based on my assessment, the project now meets minimum requirements to be accepted.

Great, thanks!

@rraadd88 - At this point could you make an archive of the reviewed software in Zenodo/figshare/other service and update this thread with the DOI of the archive? I can then move forward with accepting the submission.

rraadd88 commented 6 years ago

@arfon, I have archived it using zenodo.

DOI: 10.5281/zenodo.1095257

arfon commented 6 years ago

@whedon set 10.5281/zenodo.1095257 as archive

whedon commented 6 years ago

OK. 10.5281/zenodo.1095257 is the archive.

arfon commented 6 years ago

@afrubin - many thanks for your review here and to @tracykteal for editing this submission.

@rraadd88 - your paper is now accepted into JOSS and your DOI is https://doi.org/10.21105/joss.00362 ⚡️ 🚀 💥

whedon commented 6 years ago

:tada::tada::tada: Congratulations on your paper acceptance! :tada::tada::tada:

If you would like to include a link to your paper from your README use the following code snippet:

[![DOI](http://joss.theoj.org/papers/10.21105/joss.00362/status.svg)](https://doi.org/10.21105/joss.00362)

This is how it will look in your documentation:

We need your help!

Journal of Open Source Software is a community-run journal and relies upon volunteer effort. If you'd like to support us please consider volunteering to review for us sometime in the future. You can add your name to the reviewer list here: http://joss.theoj.org/reviewer-signup.html

rraadd88 commented 6 years ago

@afrubin - thank you very much for the careful review and nice suggestions. @tracykteal - thank you for editing the submission and @arfon (@whedon) - thanks for the streamlined publishing.

openjournals / joss-reviews