[REVIEW]: ASCENDS: Advanced data SCiENce toolkit for Non-Data Scientists

whedon commented 5 years ago

Submitting author: @ornlpmcp (Sangkeun Lee) Repository: https://github.com/ornlpmcp/ASCENDS Version: 0.4.1 Editor: @terrytangyuan Reviewer: @zhampel, @jrbourbeau Archive: 10.5281/zenodo.3635782

Status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/f3d4ec2c49f2c40b5fe1f24c61556ba0"><img src="https://joss.theoj.org/papers/f3d4ec2c49f2c40b5fe1f24c61556ba0/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/f3d4ec2c49f2c40b5fe1f24c61556ba0/status.svg)](https://joss.theoj.org/papers/f3d4ec2c49f2c40b5fe1f24c61556ba0)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@zhampel & @jrbourbeau, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

Make sure you're logged in to your GitHub account
Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @terrytangyuan know.

✨ Please try and complete your review in the next two weeks ✨

Review checklist for @zhampel

Conflict of interest

[x] As the reviewer I confirm that I have read the JOSS conflict of interest policy and that there are no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Version: 0.4.1
[x] Authorship: Has the submitting author (@ornlpmcp) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Authors: Does the paper.md file include a list of authors with their affiliations?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

Review checklist for @jrbourbeau

Conflict of interest

[x] As the reviewer I confirm that I have read the JOSS conflict of interest policy and that there are no conflicts of interest for me to review this work.

Code of Conduct

[x] I confirm that I read and will adhere to the JOSS code of conduct.

General checks

[x] Repository: Is the source code for this software available at the repository url?
[x] License: Does the repository contain a plain-text LICENSE file with the contents of an OSI approved software license?
[x] Version: 0.4.1
[x] Authorship: Has the submitting author (@ornlpmcp) made major contributions to the software? Does the full list of paper authors seem appropriate and complete?

Functionality

[x] Installation: Does installation proceed as outlined in the documentation?
[x] Functionality: Have the functional claims of the software been confirmed?
[x] Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

Documentation

[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.
[x] Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems).
[x] Functionality documentation: Is the core functionality of the software documented to a satisfactory level (e.g., API method documentation)?
[x] Automated tests: Are there automated tests or manual steps described so that the function of the software can be verified?
[x] Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

Software paper

[x] Authors: Does the paper.md file include a list of authors with their affiliations?
[x] A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?
[x] References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

whedon commented 5 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @zhampel, @jrbourbeau it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

PDF failed to compile for issue #1656 with the following error:

ORCID looks to be the wrong length /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-80a4e9edb5a6/lib/whedon.rb:143:in block in check_orcids': Problem with ORCID (orcid.org/0000-0002-1317-5112) for Sangkeun Lee (RuntimeError) from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-80a4e9edb5a6/lib/whedon.rb:141:ineach' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-80a4e9edb5a6/lib/whedon.rb:141:in check_orcids' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-80a4e9edb5a6/lib/whedon.rb:88:ininitialize' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-80a4e9edb5a6/lib/whedon/processor.rb:36:in new' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-80a4e9edb5a6/lib/whedon/processor.rb:36:inset_paper' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-80a4e9edb5a6/bin/whedon:55:in prepare' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor/command.rb:27:inrun' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor/invocation.rb:126:in invoke_command' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor.rb:387:indispatch' from /app/vendor/bundle/ruby/2.4.0/gems/thor-0.20.3/lib/thor/base.rb:466:in start' from /app/vendor/bundle/ruby/2.4.0/bundler/gems/whedon-80a4e9edb5a6/bin/whedon:116:in<top (required)>' from /app/vendor/bundle/ruby/2.4.0/bin/whedon:23:in load' from /app/vendor/bundle/ruby/2.4.0/bin/whedon:23:in

'

terrytangyuan commented 5 years ago

@ornlpmcp Could you fix the compilation failure of the paper?

zhampel commented 5 years ago

@ornlpmcp Here is my initial review. I will await your reply to my comments to continue my review.

Paper

I think the paper needs further motivation to support your submission. Perhaps by describing why your package facilitates analysis either over existing packages or by highlighting its strengths. This would also help clarify the intended use case and audience. In addition to corrections to the text, I have made a few specific comments regarding this in the points below.

These success stories have been motivating scientists, who study physics, chemistry, materials, medicine and many more, —> These success stories have been motivating scientists who study physics, chemistry, materials, medicine and many other subjects,
Remove ‘However, there are barriers to overcome.’ and put ‘However’ at start of next sentence.
I think because you wrote ASCENDS as an acronym, all letters should be capitalized.
The phrases below do not seem to add to your motivation, as you have just argued that the difficulty in using big data tools requires no programming experience, and then you say that ASCENDS is a simple but power tool via CLI/GUI that requires no programming experience. You should strengthen your motivating arguments and make this more coherent. Most existing big data tools, systems, and methodologies have been developed for scientists (or any users) who have no… & Ascends is a toolkit that is developed to assist scientists (or any persons)…
Earlier versions of Ascends have been used for scientific researches —> Earlier versions of Ascends have been used for scientific research
For the bibliography file, there are no DOIs. Can you please include for the sources that have them. For example, the source kinney2014equitability has DOI: https://doi.org/10.1073/pnas.1309933111
The repository is located in the ORNLPMCP group’s account, and it is the sole contributor to the codebase. Thus, I can not confirm whether the listed authors can claim equal authorship to the submission.

Documentation

Unless I am mistaken, I see only a short markdown file that serves as an API reference. This file shows example usage, but it does not provide a proper reference to functionality per se. I recommend providing a more thorough, detailed reference for ASCEND functions.
I think with proper reference to ASCEND functions, it would become more clear whether a testing suite is necessary. The authors describe the codebase as a wrapper around Keras, TF, and scikit-learn, so depending on how they are ported also may influence any requisite testing functionality.

Installation

I followed the instructions using conda. Had to indeed install TF manually, as indicated. However, during pip install ascends-toolkit, I receive the following output, which so far hasn’t affected running of train.py.

ERROR: Complete output from command /home/zhampel/anaconda3/envs/ascends/bin/python -u -c 'import setuptools, tokenize;file='"'"'/tmp/pip-install-t40_0433/minepy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(file);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, file, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-t0o9dv8e --python-tag cp37: ERROR: running bdist_wheel running build running build_py creating build creating build/lib.linux-x86_64-3.7 creating build/lib.linux-x86_64-3.7/minepy copying minepy/init.py -> build/lib.linux-x86_64-3.7/minepy Fixing build/lib.linux-x86_64-3.7/minepy/init.py Skipping optional fixer: buffer Skipping optional fixer: idioms Skipping optional fixer: set_literal Skipping optional fixer: ws_comma Fixing build/lib.linux-x86_64-3.7/minepy/init.py Skipping optional fixer: buffer Skipping optional fixer: idioms Skipping optional fixer: set_literal Skipping optional fixer: ws_comma running build_ext Traceback (most recent call last): File "", line 1, in File "/tmp/pip-install-t40_0433/minepy/setup.py", line 55, in cmdclass = {'build_ext': build_ext_custom} File "/home/zhampel/anaconda3/envs/ascends/lib/python3.7/site-packages/setuptools/init.py", line 145, in setup return distutils.core.setup(**attrs) File "/home/zhampel/anaconda3/envs/ascends/lib/python3.7/distutils/core.py", line 148, in setup dist.run_commands() File "/home/zhampel/anaconda3/envs/ascends/lib/python3.7/distutils/dist.py", line 966, in run_commands self.run_command(cmd) File "/home/zhampel/anaconda3/envs/ascends/lib/python3.7/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/home/zhampel/anaconda3/envs/ascends/lib/python3.7/site-packages/wheel/bdist_wheel.py", line 192, in run self.run_command('build') File "/home/zhampel/anaconda3/envs/ascends/lib/python3.7/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/home/zhampel/anaconda3/envs/ascends/lib/python3.7/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/home/zhampel/anaconda3/envs/ascends/lib/python3.7/distutils/command/build.py", line 135, in run self.run_command(cmd_name) File "/home/zhampel/anaconda3/envs/ascends/lib/python3.7/distutils/cmd.py", line 313, in run_command self.distribution.run_command(command) File "/home/zhampel/anaconda3/envs/ascends/lib/python3.7/distutils/dist.py", line 985, in run_command cmd_obj.run() File "/tmp/pip-install-t40_0433/minepy/setup.py", line 9, in run import numpy ModuleNotFoundError: No module named 'numpy'

ERROR: Failed building wheel for minepy

ornlpmcp commented 5 years ago

Dear @zhampel

I addressed most of your comments. The original text was misleading that there are many tools for scientists who are not programming experts. I edited the text as follows:

"However, most existing big data tools, systems, and methodologies have been developed for programming experts but not for scientists (or any users) who have no or little knowledge of programming. "

The main purpose of this tool is to provide a user-friendly interface (CLI/GUI) so that users can do machine learning without programming efforts. I addressed the specific comments as you directed.

Regarding the API document, the providing APIs is not the main purpose of this software; and we want to focus on providing CLI/GUI. Would it be still necessary to get the API document/testing ready? Please advise.

However, please guide us on how we can address the following comment: "the repository is located in the ORNLPMCP group’s account, and it is the sole contributor to the codebase. Thus, I can not confirm whether the listed authors can claim equal authorship to the submission." - We co-own the repository accounts as a team.

In terms of installation error, indeed we encountered the minepy error; but it seems that it's minepy packages problem. (pip install minepy has the same error) We added a note that this error can be ignored.

All the edits have been pushed to the repository. Please advise us. Thanks.

ornlpmcp commented 5 years ago

Dear @terrytangyuan @zhampel @whedon please advise us to make this move forward. Thanks!

terrytangyuan commented 5 years ago

@openjournals/joss-eics Any suggestions on this question below?

Please guide us on how we can address the following comment: "the repository is located in the ORNLPMCP group’s account, and it is the sole contributor to the codebase. Thus, I can not confirm whether the listed authors can claim equal authorship to the submission." - We co-own the repository accounts as a team.

labarba commented 5 years ago

Well, nothing can be done now to reconstruct the lost history in this project. But I want to strongly encourage the authors to adopt a traditional GitHub workflow: instead of using a @ornlpmcp group account (you are sharing a password?? ugh!), you would create a GitHub organization, to which you add the various individual researchers' GitHub usernames as collaborators. It is highly unusual for more than one person to share an account like this, and it breaks many of the GitHub features: you can't git blame to find who in your team made a particular change, you can't assign people to issues, or even know who posted on the issue tracker.

terrytangyuan commented 5 years ago

Thanks @labarba!

@jrbourbeau and @zhampel, could you take a look at @ornlpmcp's comments and continue reviewing this submission?

zhampel commented 5 years ago

@terrytangyuan Yes, I will get to it and respond by the end of the week.

zhampel commented 5 years ago

@ornlpmcp

Paper I still think there needs to be work done on the paper. I think some statements are too general

... so many success stories in a wide range of areas, especially in industry.

What kind of success/metric for success are you describing, and why is that relevant to your code? Are you purporting to offer the same success?

And some need better wording, for example,

Users still need to understand what they want to do but not necessarily how it works.

I think this kind of statement should be left out, especially given that the next sentence describes what the intention of ASCENDS is.

Usage I have been able to reproduce the training examples outlined in the README. However, I think that a complete classification example should be outlined in your api reference. That is, unless the calls are nearly identical to what has been shown in the regression example. Otherwise, I consider this an incomplete documentation of usage.

GUI I have not been able to reproduce the GUI as shown in the README. After running the ascends_server.py script, I merely get the following output:

$ python ascends_server.py
Using TensorFlow backend.

 * ASCENDS: Advanced data SCiENce toolkit for Non-Data Scientists
 * Web Server ver 0.1

 programmed by Matt Sangkeun Lee (lees4@ornl.gov)
 please go to : http://localhost:7777/

Yet, when I try to go to the localhost page, I simply get a This site can't be reached error message (ERR_CONNECTION_REFUSED). Is there anyway to debug this to test the GUI?

jrbourbeau commented 5 years ago

Apologies for the silence on my end, I'll submit review comments this weekend

Kevin-Mattheus-Moerman commented 5 years ago

@terrytangyuan can you follow up with the authors/reviewers to see where this stands? Thanks

terrytangyuan commented 5 years ago

@ornlpmcp Could you take a look at @zhampel's feedback above? In the meantime, 👋 @jrbourbeau We look forward to your review.

ornlpmcp commented 5 years ago

@zhampel

Thank you very much for your reviews. They are indeed very helpful.

[Paper] I modified the paper.md addressing your comments. I try to remove sentences that are too general and added some more examples.

[Usage]

The main purpose/contribution of this tool is to provide a user-friendly interface (CLI/GUI) so that users can do machine learning without programming efforts. So, it is not our main intention that users write codes using the APIs. In fact, it will be more efficient to use scikit-learn and keras directly. However, as you addressed, I added the API use case reference for both regression and classification so that users can use them in case they need. I checked in the api_reference.md and notebook/reference.ipynb for user reference.

[GUI] Unfortunately, I could not replicate the error. Would you please kindly test if the port 7777 is open or not conflicted. Do you see any error messages in the console?

Please advise. I am happy to move forward.

Dear. @jrbourbeau It would be great if you could review this work. I will respond as soon as I can. Big thanks, Thank you for helping us too @terrytangyuan

labarba commented 5 years ago

@terrytangyuan — it looks like this review needs some shepherding from you at this point

ornlpmcp commented 5 years ago

@terrytangyuan please help us to move forward. I am willing to revise the manuscript/code as advised.

kthyng commented 5 years ago

Hi @zhampel Please see comments above from @ornlpmcp in response to your review.

terrytangyuan commented 5 years ago

Just emailed the reviewers. Pinging @zhampel and @jrbourbeau here as well.

zhampel commented 5 years ago

@terrytangyuan I can review and update based on the authors' changes by the end of the week.

zhampel commented 5 years ago

@ornlpmcp

README

Can you please remove unnecessary dollar signs $ for examples of single lines? They useful when displaying full outputs based on runs, however, for single lines, they are unnecessary and a bit of a nuisance for copy/paste purposes ;)

There is a erroneous slash in the following line: Copy the following text into a new file and save it to data\iris_test_input.csv. Please change to Copy the following text into a new file and save it to data/iris_test_input.csv.

Web-Based GUI I no longer have the port issues I mentioned before. Thanks for addressing those.

I had to do manually pip install various packages for running the GUI: minepy, matplotlib, np_utils, tornado Should these not be simply installed with a reqs file or similar? This would preferred over getting No module named 'xx' a few times when running python ascends_server.py, and then pip install 'xx'

jrbourbeau commented 5 years ago

Apologies to all for dropping the ball on my review. I will work through the reviewer checklist and open issues in the ASCENDS repository with any recommended changes.

At present, the primary missing piece I see is the lack of tests (please correct me if I'm missing something @ornlpmcp). There should be some set of tests that we can run to verify that the software is running as expected. Ideally, but not required, it'd be nice if those tests were automatically run as part of a continuous integration suite.

ornlpmcp commented 5 years ago

@zhampel Thanks for your review. I updated the README file accordingly. Also, if you activated the ascends anaconda environment and if you have done installing via 'pip install ascends-toolkit' all of those packages should have been installed.

You need to install those packages separately if you run the server without activating the Anaconda environment. I added this explanation in the README file.

@jrbourbeau There is no 'testing' module; however I believe the Jupyter notebook file under /nootbook/reference.ipynb can be used for testing if the software is properly installed and run.

Thank you very much all. I would really appreciate it if you could help us move forward. @terrytangyuan

terrytangyuan commented 5 years ago

@ornlpmcp I think @jrbourbeau's point is that the software needs to be automatically and continuously tested in services like Travis CI or CircleCI whenever a new code change is introduced. A notebook would not be sufficient.

kthyng commented 5 years ago

@terrytangyuan Automatic/continuous testing is not a requirement — recommended but not required. Some way to verify that the code is running correctly is required. If a notebook is provided and the way to verify that the notebook results are correct is clearly included, then that should be sufficient.

terrytangyuan commented 5 years ago

@kthyng Thanks for the clarification. Sorry I only meant to clarify the reviewer's feedback but not to make this an requirement. In the meantime, @ornlpmcp it would be good to document the manual testing instructions.

ornlpmcp commented 5 years ago

@terrytangyuan The readme file includes the tutorial that users can follow step by step to verify if everything works correctly, I will also mention the jupyter notebook in the readme file so that users can use it for testing

jrbourbeau commented 4 years ago

Thanks @ornlpmcp, I've opened up a few issues in the ASCENDS issue tracker with my review comments and have provided a brief outline here. Please let me know if anything in my comments is unclear or if you have any additional questions

Software version

[ ] Publish a v0.4.0 release on GitHub (https://github.com/ornlpmcp/ASCENDS/issues/3)

Installation

[ ] I am getting a gcc compilation error when trying to build an extension for minepy (I think similar to what @zhampel ran into in https://github.com/openjournals/joss-reviews/issues/1656#issuecomment-522658224). In the installation instructions you say "Note: you may encounter 'failed building wheel for minepy' error, but you can ignore as long as you see the 'Successfully installed minepy' message", however I was not able to successfully install minepy with pip and had to instead install it using conda. Given the installation errors @zhampel and I have encountered, I'd recommend providing a conda environment yaml file to allow for installing the necessary dependencies using conda instead of pip (https://github.com/ornlpmcp/ASCENDS/issues/7)
[ ] Explicitly list package dependencies (https://github.com/ornlpmcp/ASCENDS/issues/4)