openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[REVIEW]: SmartEDA: An R Package for Automated Exploratory Data Analysis #1509

Closed: whedon closed this issue 5 years ago

whedon commented 5 years ago

Submitting author: @sayanddude (Sayan Putatunda)
Repository: https://github.com/daya6489/SmartEDA
Version: 0.3.2
Editor: @mgymrek
Reviewer: @nhejazi, @terrytangyuan
Archive: 10.5281/zenodo.3383824

Status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/e56dad3d192cfeddd10fcc1550505ceb"><img src="http://joss.theoj.org/papers/e56dad3d192cfeddd10fcc1550505ceb/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/e56dad3d192cfeddd10fcc1550505ceb/status.svg)](http://joss.theoj.org/papers/e56dad3d192cfeddd10fcc1550505ceb)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@nhejazi & @terrytangyuan, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist, please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @mgymrek know.

Please try to complete your review within the next two weeks.

Review checklist for @nhejazi

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @terrytangyuan

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 5 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @nhejazi, @terrytangyuan it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' at https://github.com/openjournals/joss-reviews

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

terrytangyuan commented 5 years ago

I saw that you already submitted this paper to JSS for review: https://arxiv.org/pdf/1903.04754.pdf. Is this submission to JOSS still necessary?

sayanddude commented 5 years ago

@terrytangyuan Hi! We initially submitted the paper to JSS a couple of months back (given that it's a CRAN package), but it didn't work out there. We assure you that the paper is not currently under consideration at any other journal or conference. We hope that this paper passes the rigorous review and gets published in JOSS!

terrytangyuan commented 5 years ago

@sayanddude I see. There are many other tools for (automated) exploratory data analysis in R. What makes this package useful/different?

sayanddude commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

sayanddude commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

sayanddude commented 5 years ago

@terrytangyuan Hi! We have added a new section, “Comparison with other R Packages”, to the updated version of the paper PDF generated above. In this section we give a snapshot of the capabilities of SmartEDA versus some of the competing R packages (such as dlookr, explorer, DataExplorer, etc.) and highlight its advantages. Figure 9 in the paper summarizes the comparison and shows how SmartEDA is better than most of the available R packages for automated exploratory data analysis (the figure is attached below).

[Figure 9: comparison of SmartEDA with other R packages for automated exploratory data analysis]

To summarize, some of the key benefits of SmartEDA are:

  • No need to remember different R package names, as SmartEDA bundles most of the exploratory functions and their dependencies
  • No need to write lengthy R scripts: SmartEDA performs the exploratory analysis in a one-line R script
  • It cuts down the time needed for exploratory data analysis
  • SmartEDA extends data.table to build customized summary statistics and cross tables
  • SmartEDA functions can generate hundreds of ggplot graphics (such as scatter, bar, stacked bar, boxplot, density, QQ, and coordinate plots) at a time, with customized themes using ggthemes package options

Also, SmartEDA is mentioned in the study by Staniak and Biecek (2019), who reviewed the landscape of R packages for automated exploratory data analysis. Some of the distinguishing features of SmartEDA pointed out by the authors when comparing it with other R packages are:

  • The SmartEDA package reports skewness and displays QQ plots against the normal distribution
  • The SmartEDA package provides a method of visualizing multivariate relationships: the parallel coordinate plot
  • SmartEDA gives reasonable insight into variable distributions and simple relationships
  • The parallel coordinate plot (PCP) in SmartEDA is unique and very well done

The paper by Staniak and Biecek (2019) is available on arXiv: https://arxiv.org/pdf/1904.02101.pdf

Reference: Mateusz Staniak and Przemyslaw Biecek (2019), “The Landscape of R Packages for Automated Exploratory Data Analysis”, arXiv:1904.02101 [stat.CO]

sayanddude commented 5 years ago

👉 Check article proof 📄 👈

@nhejazi @terrytangyuan Requesting the reviewers to kindly consider the latest version of the pdf generated above (10.21105.joss.01509.pdf). We have added a section on “Comparison with other R Packages” and corrected some formatting issues that were there in the earlier version.

terrytangyuan commented 5 years ago

@sayanddude Thanks. The table looks great. Some feedback:

sayanddude commented 5 years ago

@terrytangyuan Thanks for the feedback! I will work on your comments and will get back to you with the updated paper and the required code changes as soon as possible.

labarba commented 5 years ago

@sayanddude, can you give us a status update? If you will need considerably more time, it would help if you let us know and we can add a "paused" label here.

sayanddude commented 5 years ago

@labarba - Hi! We are almost done addressing all the reviewer's comments. We are now in the final stages of creating the unit tests for the package. Please give us a couple more days; we will update the code repository along with the updated paper by end of day Tuesday (6th Aug, 2019). Thanks!

sayanddude commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

sayanddude commented 5 years ago

@terrytangyuan We have worked on addressing all of your comments. Please find below our response to each action item:

Comment: I see lots of long functions with >150 lines of code and the code style is not consistent (I suggest running a lintr check). Similarly, in the roxygen docs, the style is not consistent. I see both ##' and #'. The indentation levels and the roxygen syntax are sometimes incorrect. Please double check.

Response: Thanks for your comment! We have updated the GitHub repository and ensured that almost all functions are below 150 lines of code. We have also run a lintr check and corrected all the issues related to style inconsistencies, indentation, and incorrect syntax.

Comment: There isn't any unit test for the package.

Response: Thanks a lot for the comment! We have now implemented unit tests for the package (available at https://github.com/daya6489/SmartEDA).

Comment: Have you tried running this package on large datasets?

Response: Yes, the package works well on large datasets. Recently, we applied the SmartEDA package to the Microsoft malware prediction data on Kaggle (available at https://www.kaggle.com/ajithvallabai/microsoft-malware-prediction/data). This dataset is considerably large: it has 8,900,000 rows and 82 columns. The SmartEDA package worked seamlessly on this dataset.

Comment: Please make the paper more concise - I see many pages where each page only has one giant picture.

Response: Thanks for the comment! We have considerably reduced the length of the paper (please refer to the updated version, i.e. 10.21105.joss.01509.pdf) by removing some verbose content and by consolidating most of the images into a single figure, i.e. Figure 2.

Thanks and Regards, Sayan
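
For readers unfamiliar with R package testing: unit tests of the kind mentioned in the response above are commonly written with the testthat package. The following is only a minimal sketch under stated assumptions. `count_missing` is a hypothetical helper invented for this illustration; it is not a SmartEDA function, and the tests in the SmartEDA repository may be organized quite differently.

```r
library(testthat)

# Hypothetical helper (NOT part of SmartEDA): counts missing values per column.
count_missing <- function(df) {
  stopifnot(is.data.frame(df))
  vapply(df, function(col) sum(is.na(col)), integer(1))
}

test_that("count_missing reports the number of NAs in each column", {
  df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
  res <- count_missing(df)

  # One missing value in each of the two columns
  expect_equal(unname(res), c(1L, 1L))
  expect_named(res, c("a", "b"))
})

test_that("count_missing rejects input that is not a data frame", {
  expect_error(count_missing(1:3))
})
```

In a package, files like this usually live under tests/testthat/ and are run with `devtools::test()` or as part of `R CMD check`.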

sayanddude commented 5 years ago

@labarba Hi! As discussed, we have finished addressing the comments and have updated the GitHub code repository along with the paper.

labarba commented 5 years ago

The handling editor, @mgymrek, will take it from here.

mgymrek commented 5 years ago

@sayanddude thanks for making these changes.

@nhejazi, @terrytangyuan can you now go over the revision? If your comments have been sufficiently addressed, please finish filling out the checklist.

nhejazi commented 5 years ago

Weighing in with a few points that I think ought to be addressed as we continue the review for the SmartEDA package.

sayanddude commented 5 years ago

@nhejazi Thanks for the feedback! I will work on your comments and will soon get back to you with the updated paper and other required changes (hopefully, by end of next week!).

sayanddude commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

PDF failed to compile for issue #1509 with the following error:

Error producing PDF.
! Undefined control sequence.
l.371 environment such as R (\proglang

Looks like we failed to compile the PDF

sayanddude commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

sayanddude commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

sayanddude commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

sayanddude commented 5 years ago

@nhejazi We have worked on addressing all of your comments. Please find below our response to each action item:

Major (cannot be checked off yet)

Minor (things that will not hold up the review):

sayanddude commented 5 years ago

@mgymrek @terrytangyuan @nhejazi Hi! We have addressed all the comments mentioned above and have also made the required changes to the code repository. Kindly let us know if there's any pending action item from our end.

Thanks and Regards, Sayan

nhejazi commented 5 years ago

@sayanddude Thank you for quickly addressing our concerns, including the addition of the DOIs to all references (where possible). I've gone ahead and completed the reviewer checklist available to me and am ready to recommend the software paper for acceptance into JOSS. There may yet be other concerns to address but I do think the paper and R package are close.

sayanddude commented 5 years ago

@nhejazi Thank you so much!

mgymrek commented 5 years ago

Thanks!

@sayanddude see next steps below.

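Some minor comments on the paper:

  • "EDA can be categorized into Descriptive statistical techniques and graphical techniques": uncapitalize descriptive
  • You can reference just Figure 2 rather than spelling out (a) through (f)
  • There is a reference to a Figure 9 that I believe should be to Figure 3 instead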

After fixing those typos, can you please make a zenodo archive, being sure the title and author list match those on the paper, and report the DOI here?

terrytangyuan commented 5 years ago

@sayanddude Thanks for addressing the comments. The paper looks good to me now and I recommend it for publication.

sayanddude commented 5 years ago

@terrytangyuan Thank you so much!

sayanddude commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

sayanddude commented 5 years ago

Thanks!

@sayanddude see next steps below.

Some minor comments on the paper:

  • "EDA can be categorized into Descriptive statistical techniques and graphical techniques": uncapitalize descriptive
  • You can reference just Figure 2 rather than spelling out (a) through (f)
  • There is a reference to a Figure 9 that I believe should be to Figure 3 instead

After fixing those typos, can you please make a zenodo archive, being sure the title and author list match those on the paper, and report the DOI here?

@mgymrek Thanks for the feedback! I have corrected all the typos in the updated version of the PDF. I have also created the Zenodo archive for the SmartEDA package (version 0.3.2) with DOI 10.5281/zenodo.3383824; the URL is https://doi.org/10.5281/zenodo.3383824.

I have ensured that the title and the author names are the same as the ones in the paper. Please let me know if I have missed anything. Thank you so much!