Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @nhejazi, @terrytangyuan it looks like you're currently assigned to review this paper :tada:.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
Attempting PDF compilation. Reticulating splines etc...
I saw that you submitted for review to JSS already https://arxiv.org/pdf/1903.04754.pdf. Is this submission to JOSS still necessary?
@terrytangyuan Hi! We initially submitted the paper (given that it's a CRAN package) to JSS a couple of months back but it didn't work out there. We assure you that currently the paper is not under consideration for submission at any other journal or conference. We are hoping that this paper goes through the rigorous review and gets published at JOSS!
@sayanddude I see. There are many other tools for (automated) exploratory data analysis in R. What makes this package useful/different?
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@terrytangyuan Hi! We have added a new section “Comparison with other R Packages” in the updated version of the paper pdf generated above. In this section we have given a snapshot of various capabilities of SmartEDA vs. some of the competing R packages (such as dlookr, explorer, DataExplorer, etc.) and have highlighted its advantages. Figure 9 in the paper gives a snapshot of the comparison and shows how it’s better than most of the available R packages for automated exploratory data analysis (Please find attached the figure below).
To summarize, some of the key benefits of SmartEDA are:
• No need to remember different R package names, as SmartEDA bundles most of the exploratory functions and their dependencies
• No need to write lengthy R scripts; SmartEDA performs the exploratory analysis in a one-line R script
• It cuts down the time needed for exploratory data analysis
• SmartEDA extends data.table to build customized summary statistics and cross tables
• SmartEDA functions can generate hundreds of ggplot figures (such as scatter, bar, stacked bar, boxplot, density, QQ plot, and co-ordinate plots) at a time, with customized themes using ggthemes package options
Also, SmartEDA is mentioned in the study by Staniak and Biecek (2019), who reviewed the landscape of R packages for automated exploratory data analysis. Some of the distinguishing features of SmartEDA pointed out by the authors when comparing it with other R packages are:
• The SmartEDA package reports skewness and displays QQ plots against the normal distribution.
• The SmartEDA package provides a method of visualizing multivariate relationships: the parallel coordinate plot.
• SmartEDA gives reasonable insight into variable distributions and simple relationships.
• The Parallel Co-ordinates Plot (PCP) in SmartEDA is unique and very well done.
The paper by Staniak and Biecek (2019) is available on arXiv: https://arxiv.org/pdf/1904.02101.pdf
Reference: Mateusz Staniak and Przemyslaw Biecek (2019), “The Landscape of R Packages for Automated Exploratory Data Analysis”, arXiv:1904.02101 [stat.CO]
👉 Check article proof 📄 👈
@nhejazi @terrytangyuan Requesting the reviewers to kindly consider the latest version of the pdf generated above (10.21105.joss.01509.pdf). We have added a section on “Comparison with other R Packages” and corrected some formatting issues that were there in the earlier version.
@sayanddude Thanks. The table looks great. Some feedback:
• I see lots of long functions with >150 lines of code and the code style is not consistent (I suggest running a lintr check). Similarly, in the roxygen docs, the style is not consistent. I see both ##' and #'. The indentation levels and the roxygen syntax are sometimes incorrect. Please double check.
@terrytangyuan Thanks for the feedback! I will work on your comments and will get back to you with the updated paper and the required code changes as soon as possible.
@sayanddude, can you give us a status update? If you will need considerably more time, it would help if you let us know and we can add a "paused" label here.
@labarba - Hi! We are almost done addressing all the comments of the reviewer. We are now at the final stages of creating the unit tests for the package. Please give us a couple more days; we will update the code repository along with the updated paper by end of day Tuesday (6th Aug, 2019). Thanks!
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@terrytangyuan We have worked on addressing all of your comments. Please find below our response to each action item:
• I see lots of long functions with >150 lines of code and the code style is not consistent (I suggest running a lintr check). Similarly, in the roxygen docs, the style is not consistent. I see both ##' and #'. The indentation levels and the roxygen syntax are sometimes incorrect. Please double check.
Thanks for your comment! We have updated the GitHub repository and have ensured that almost all functions are now below 150 lines of code. We have also run a lintr check and have corrected all the issues related to style inconsistencies, indentation, and incorrect syntax.
• There isn't any unit test for the package.
Thanks a lot for the comment! We have now implemented unit tests for the package (available at https://github.com/daya6489/SmartEDA).
• Have you tried running this package on large datasets?
Yes, the package works well on large datasets. Recently, we applied the SmartEDA package to the Microsoft malware prediction data on Kaggle (available at https://www.kaggle.com/ajithvallabai/microsoft-malware-prediction/data ). This dataset is considerably large: it has 8900000 rows and 82 columns. The SmartEDA package worked seamlessly on this dataset.
• Please make the paper more concise - I see many pages where each page only has one giant picture.
Thanks for the comment! We have considerably reduced the size of the paper (please refer to the updated version of the paper, i.e. 10.21105.joss.01509.pdf ) by removing some verbose content and by consolidating most of the images into a single figure, i.e. Figure 2.
Thanks and Regards, Sayan
@labarba Hi! As discussed, we have completed working on the comments and have updated the Github code repository along with the paper.
The handling editor, @mgymrek, will take it from here.
@sayanddude thanks for making these changes
@nhejazi, @terrytangyuan can you now go over the revision? If your comments have been sufficiently addressed please finish filling out the checklist
Weighing in with a few points that I think ought to be addressed as we continue the review for the SmartEDA package.
Major (cannot be checked off yet)
Community Guidelines: While the package repository contains a CODE_OF_CONDUCT.md file, a "Contributions" section ought to be added to the package README.md (and would translate to the pkgdown site). It can be something simple that points to the existing CODE_OF_CONDUCT.md document (e.g., from a package I wrote https://github.com/nhejazi/biotmle#contributions). Similarly, a statement in the README.md about issues would be a welcome addition (e.g., https://github.com/nhejazi/biotmle#issues).
Minor (things that will not hold up the review):
Unit tests: it looks like ... ggplot object. While visualizations can be particularly difficult to test, there are also some tests that check only that a returned summary statistic matches the values produced by a call to the corresponding base function (e.g., a call to mean). Over time, and as your package attracts more users, I would encourage you to write more tests.
@nhejazi Thanks for the feedback! I will work on your comments and will soon get back to you with the updated paper and other required changes (hopefully, by end of next week!).
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
PDF failed to compile for issue #1509 with the following error:
Error producing PDF. ! Undefined control sequence. l.371 environment such as R (\proglang
Looks like we failed to compile the PDF
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
@nhejazi We have worked on addressing all of your comments. Please find below our response to each action item:
Major (cannot be checked off yet)
References: I believe some of the references are missing DOIs. As ........... Thanks for your comment! We have added the DOI for almost all the articles for which DOIs are available in the updated version of the paper. Only for the cited R packages and for the Exploratory Data Analysis book by Tukey (1977), there's no DOI available.
Community Guidelines: While the package repository contains a CODE_OF_CONDUCT.md file, a "Contributions" section .......................... Thanks for pointing it out! We have added an "Issues" section and a "Contributions" section in the README.md file of the package (please refer to https://github.com/daya6489/SmartEDA). We have also added "Contribution Guidelines" as suggested by you.
Minor (things that will not hold up the review):
Unit tests: it looks like ............. Thanks for your valuable comment! Yes, we agree that more tests can be written for testing the summary statistics. And yes, as more users start using our package we will keep writing more unit tests. In fact, we plan to write some more unit tests in the next release of the package.
JOSS paper: From a quick read, there are some ................. Thanks for your comment! We have corrected the required markdown syntax (i.e. @eda:1). We have also cited the R core development team for "R" and now refer to R as a "programming environment" in the text (pg. 1, paragraph 3, Introduction section) instead of "a statistical computing package", as in the earlier version of the paper. We have also corrected some of the grammatical mistakes in the paper and have used tools such as "Grammarly" to check the entire manuscript for grammatical errors.
@mgymrek @terrytangyuan @nhejazi Hi! We have addressed all the comments mentioned above and have also made the required changes to the code repository. Kindly let us know if there's any pending action item from our end.
Thanks and Regards, Sayan
@sayanddude Thank you for quickly addressing our concerns, including the addition of the DOIs to all references (where possible). I've gone ahead and completed the reviewer checklist available to me and am ready to recommend the software paper for acceptance into JOSS. There may yet be other concerns to address but I do think the paper and R package are close.
@nhejazi Thank you so much!
@sayanddude Thanks for addressing the comments. The paper looks good to me now and I recommend it for publication.
@terrytangyuan Thank you so much!
@whedon generate pdf
Attempting PDF compilation. Reticulating splines etc...
Thanks!
@sayanddude see next steps below.
Some minor comments on the paper:
- "EDA can be categorized into Descriptive statistical techniques and graphical techniques": uncapitalize descriptive
- You can reference just Figure 2 rather than spelling out (a) through (f)
- There is a reference to a Figure 9 that I believe should be to Figure 3 instead
After fixing those typos, can you please make a zenodo archive, being sure the title and author list match those on the paper, and report the DOI here?
@mgymrek Thanks for the feedback! I have corrected all the required typos in the updated version of the pdf. Also, I have created the zenodo archive for the SmartEDA package (version 0.3.2) with DOI: 10.5281/zenodo.3383824 and the URL is: https://doi.org/10.5281/zenodo.3383824.
I have ensured that the title and the author names are the same as the ones mentioned in the paper. Please let me know if I have missed out on anything. Thank you so much!
Submitting author: @sayanddude (Sayan Putatunda) Repository: https://github.com/daya6489/SmartEDA Version: 0.3.2 Editor: @mgymrek Reviewer: @nhejazi, @terrytangyuan Archive: 10.5281/zenodo.3383824
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@nhejazi & @terrytangyuan , please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @mgymrek know.
✨ Please try and complete your review in the next two weeks ✨
Review checklist for @nhejazi
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Does the paper.md file include a list of authors with their affiliations?
Review checklist for @terrytangyuan
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Does the paper.md file include a list of authors with their affiliations?