openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: Sentiment Analysis of Twitter Data (SAoTD) #764

Closed whedon closed 5 years ago

whedon commented 6 years ago

Submitting author: @evan-l-munson (Evan Munson) Repository: https://github.com/evan-l-munson/SAoTD Version: 0.2.0 Editor: @arfon Reviewer: @kbenoit Archive: 10.5281/zenodo.2578973

Status

Status badge code:

HTML: <a href="http://joss.theoj.org/papers/e6002792b44f50039afc22dbe3d4a086"><img src="http://joss.theoj.org/papers/e6002792b44f50039afc22dbe3d4a086/status.svg"></a>
Markdown: [![status](http://joss.theoj.org/papers/e6002792b44f50039afc22dbe3d4a086/status.svg)](http://joss.theoj.org/papers/e6002792b44f50039afc22dbe3d4a086)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@kbenoit, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.theoj.org/about#reviewer_guidelines. Any questions/concerns please let @leeper know.

Review checklist for @kbenoit

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 6 years ago

Hello human, I'm @whedon. I'm here to help you with some common editorial tasks. @kbenoit it looks like you're currently assigned as the reviewer for this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

For a list of things I can do to help you, just type:

@whedon commands

whedon commented 6 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 6 years ago

--> Check article proof :page_facing_up: <--

kbenoit commented 6 years ago

Review

[Checklist moved above]

Comments

On the paper

This paper describes an R package that provides a workflow for analyzing sentiment and topics in Twitter text, wrapping around packages such as twitteR, tidytext, and topicmodels. The package contains a number of useful analytic functions for looking at Twitter data, and these are clearly demonstrated in the vignette. Some of these could be useful for other forms of text, but the package is specifically designed to work with Twitter data, covering not just the import of this data but also Twitter-specific elements such as hashtags and usernames.

This paper makes a nice, short article that should be published, but could be improved by addressing a few relatively minor issues:

On the package

The package works, and I have seen much worse source code in widely-used R packages published on CRAN. These are some suggestions for improving the code and the package, not necessarily linked to the paper and whether it should be published. (I leave that to the editor to decide.)

Naming. This is a matter of preference, although there are some emerging guidelines designed to reduce the chaos in the R world. This package combines capitalized object names with lower-cased object names, and uses function names containing . (e.g. PosNeg.Words()), which is generally discouraged because of the ambiguity with the S3 dispatch system. Do the functions really need to be capitalized? The naming is also inconsistent: Word.Corr() is "object.verb" but Number.Topics() is "verb.object".

The package name itself runs contrary to this advice from Hadley Wickham:

Avoid using both upper and lower case letters: doing so makes the package name hard to type and even harder to remember.
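
To illustrate the S3 ambiguity mentioned under Naming, here is a minimal base R sketch. The names summary.tweets() and summarise_tweets() are hypothetical and not taken from the package; the point is only that a function named "generic.class" can be picked up by S3 dispatch even when it was written as an ordinary helper.

```r
# A plain helper whose name happens to look like "generic.class".
# (Hypothetical function, not taken from the package.)
summary.tweets <- function(object, ...) {
  "helper called via S3 dispatch"
}

# Any object that carries the class "tweets" now triggers the helper
# whenever the summary() generic is called on it.
x <- structure(list(text = "hello"), class = "tweets")
dispatched <- summary(x)

# A name with an underscore cannot be mistaken for an S3 method.
summarise_tweets <- function(object) {
  length(object$text)
}
n_texts <- summarise_tweets(x)
```

Whether the dispatch in the first half is wanted or accidental cannot be told from the name alone, which is why dotted function names are usually reserved for actual methods.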

Unnecessary C++ code. Why is there a function rcpp_hello_world()? This looks like demonstration code that should be removed.

Data copyright issues. Can you distribute the data in raw_tweets? That material is the copyright of the authors of the tweets. At the least, it may require some attribution. It would be worth reviewing the Twitter terms of service about this.

Data object loading. Set LazyData: true in DESCRIPTION; otherwise code like the (not run) examples for Scores() will not work.

(non-)Object orientation. None of the functions use generics and method dispatch, but rather check the class of the input objects using conditionals within each function (e.g. here). This makes extending the package harder, in addition to being more error-prone. Furthermore, the function names are very generic, such as BoxPlot() or Tidy(). Other packages have functions named tidy(), but they are defined for specific object classes. I suggest using more distinctive names to differentiate this package's functions from those found in other packages, and/or method dispatch for specific object classes. Simply capitalizing the function names is likely to confuse some users.
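
As a rough sketch of the method-dispatch alternative, assuming a hypothetical generic called tidy_tweets() and an input data frame with a text column (neither is taken from the package), the class checks inside each function could be replaced by one method per supported class:

```r
# Generic: the only job of this function is to dispatch on class(x).
tidy_tweets <- function(x, ...) {
  UseMethod("tidy_tweets")
}

# Method for data frames with a `text` column (assumed layout).
tidy_tweets.data.frame <- function(x, ...) {
  tolower(x$text)
}

# Method for plain character vectors.
tidy_tweets.character <- function(x, ...) {
  tolower(x)
}

# Fallback that fails early with a clear message for unsupported input.
tidy_tweets.default <- function(x, ...) {
  stop("tidy_tweets() does not support objects of class ", class(x)[1])
}

from_df  <- tidy_tweets(data.frame(text = c("Hello", "WORLD"),
                                   stringsAsFactors = FALSE))
from_chr <- tidy_tweets(c("Hello", "WORLD"))
```

Supporting a new input type then means adding one more method rather than editing a conditional inside an existing function, and the distinctive generic name avoids clashing with tidy() from other packages.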

Code organization. Nearly all functions are in a single long .R file called Function.R. Splitting this into smaller files would make the code organization clearer.

Examples. Most examples are not run, due to the difficulties of connecting to the Twitter API using authentication. But this is not true for any functions that only use raw_tweets, such as Bigram(), Bigram.Network(), BoxPlot(), etc. Furthermore, this code does not run as written, because raw_tweets is not lazy loaded. In addition, there is no need to load the package in the examples (using library(SAoTD)) because the help pages should only be accessible if the package is already loaded.

Tests. The file tests/testthat/test_Acquire.R contains Twitter authentication keys. These should be removed (and changed, since they will remain visible in the git history).

leeper commented 6 years ago

Excellent review, @kbenoit! Thanks so much!

@evan-l-munson Can you address the issues raised in the review - particularly the missing checked items from the review checklist and the other useful suggestions raised in the review?

evan-l-munson commented 6 years ago

Thank you for the review.

I will work on those notes/corrections as soon as I get a chance (just moved my family across the United States and started a new job).

Evan Munson

leeper commented 6 years ago

@evan-l-munson Just a nudge on this.

leeper commented 6 years ago

@evan-l-munson Just another nudge on this.

evan-l-munson commented 6 years ago

Thank you for the reminder. I have not forgotten.

arfon commented 6 years ago

Hi @evan-l-munson - please try and get to these updates when you get a chance.

evan-l-munson commented 6 years ago

I have made about half the corrections suggested above. In the next week, I hope to rename my functions to better fit with standard naming conventions. I am also looking at the test dataset for the copyright issues, and at a couple of the other issues.

evan-l-munson commented 5 years ago

Good evening @arfon, I think I made all the requested corrections. Please let me know if you need anything else corrected/adjusted. Thanks for the patience and assistance in this process.

arfon commented 5 years ago

:wave: @kbenoit - please come and take another look at this submission when you get a chance, the author has made some updates based on your feedback.

arfon commented 5 years ago

@whedon assign @arfon as editor

kbenoit commented 5 years ago

@arfon Happy to do so.

arfon commented 5 years ago

Hi @kbenoit - have you had a chance to take another look at this submission?

kbenoit commented 5 years ago

I've had a chance to look at the package again, and I am pleased to report that it and the paper are much improved. @evan-l-munson has done a very good job of addressing my concerns above, if not as good a job of summarizing in a PR or memo what these changes were 😉 .

Package. The code organization is much better (and will be easier to maintain or for potential contributors to absorb). I also like the function naming much better than before - the function index looks more tidy and sensible now.

There is still unnecessary C++ code in /src, which creates the function rcpp_hello_world() in the package index. This should simply be removed.

Paper. The paper does a much better job now of explaining the package and its purpose, and the vignette rounds this out nicely.

Subject to the Hello World change, :+1:.

arfon commented 5 years ago

Thanks @kbenoit. @evan-l-munson - please make these final changes to your package and we can move forward accepting this submission.

evan-l-munson commented 5 years ago

Gentlemen, I missed your email by accident. I appreciate these comments and will try to get them corrected/accounted for this weekend or sometime during the holidays. Thank you!

evan-l-munson commented 5 years ago

Gentlemen, good morning. I have corrected the rcpp_hello_world() function. I was running through everything to make sure it was working properly before I gave you the final word. Everything was looking good until I tried to view the vignette. For some reason, the vignette is not found when using utils::vignette("saotd"). I will troubleshoot that and hopefully get that working. Thanks!

arfon commented 5 years ago

Thanks for the update @evan-l-munson

labarba commented 5 years ago

👋 @evan-l-munson — How is it going? Have you been able to troubleshoot the issue? Give us an update when you can. Thanks!

evan-l-munson commented 5 years ago

Good morning, I appreciate your patience with me. Finishing up this package is turning out to be more challenging and time-consuming than I anticipated. I was working on my vignette issue yesterday and am struggling to fix what I am seeing. Everything seems to be built properly; however, after I re-download and load the package from my Git page, the vignettes are not found. I have used both utils::vignette('saotd') and utils::browseVignettes('saotd') but receive the same error message: No vignettes found by utils::browseVignettes("saotd"). I have also sent the Git repository to a friend, who experienced the same issue. As I look at the R package structure I think I have everything correct (and I have been looking at it for months, trying to correct it), but obviously I have something incorrect since they are not being found. If you have some insight as to what might be happening, I would appreciate any thoughts you might have so I can complete this. Thanks!

arfon commented 5 years ago

If you have some insight as to what might be happening, I would appreciate any thoughts you might have so I can complete this. Thanks!

I'm not sure, sorry. Perhaps @kbenoit has some thoughts on this?

evan-l-munson commented 5 years ago

@arfon I have another colleague looking into the issue for me. I am hoping they will get back to me this week. I have run out of ideas on my end and am not sure why I cannot view the vignettes after I re-download the package from GitHub. If the vignette issue isn't a big one I would say the package is ready for submission, but if it is a big issue, I will continue to work on it. Thanks!

kbenoit commented 5 years ago

@evan-l-munson I just tried building your package and it fails with

> devtools::build_vignettes()
Building saotd vignettes
... 
Error : processing vignette 'SAoTD.Rmd' failed with diagnostics:
Failed to locate the ‘weave’ output file (by engine ‘knitr::rmarkdown’) for vignette with name ‘SAoTD’. The following files exist in directory ‘.’: ‘saotd.html’, ‘saotd.R’, ‘SAoTD.Rmd’
Error: processing vignette 'SAoTD.Rmd' failed with diagnostics:
Failed to locate the ‘weave’ output file (by engine ‘knitr::rmarkdown’) for vignette with name ‘SAoTD’. The following files exist in directory ‘.’: ‘saotd.html’, ‘saotd.R’, ‘SAoTD.Rmd’

but when I changed the vignette name to test.Rmd (all lowercase) it succeeded. I suggest that you change the vignette filename to lowercase (the title can still be anything you want) and hopefully the problem will be solved. It seems related to https://stackoverflow.com/questions/27338970/vignette-creation-on-package-build-fails-with-the-error-failed-to-locate-the-w.
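
A quick way to spot file names like this is sketched below. It assumes vignette sources are .Rmd files in a single directory; mixed_case_vignettes() is a hypothetical helper written for this illustration, not a function from devtools or knitr, and the demonstration uses a temporary directory rather than the real package.

```r
# Hypothetical helper: list vignette sources whose file names are not
# entirely lowercase.
mixed_case_vignettes <- function(dir = "vignettes") {
  files <- list.files(dir, pattern = "\\.Rmd$", ignore.case = TRUE)
  stems <- tools::file_path_sans_ext(files)
  files[stems != tolower(stems)]
}

# Demonstration in a temporary directory rather than a real package.
demo_dir <- file.path(tempdir(), "vignette-demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("SAoTD.Rmd", "intro.Rmd")))
flagged <- mixed_case_vignettes(demo_dir)
```

Any file the helper reports would be a candidate for renaming to lowercase before rebuilding the vignettes.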

Also I don't think you need Rcpp at all. Suggest you delete /src, R/RcppExports.R, remove Rcpp from the Imports section of DESCRIPTION, remove the line LinkingTo: Rcpp, and delete the imports in R/SAoTD.R. They are doing nothing for you - I suspect they came from a package starter boilerplate tool.

evan-l-munson commented 5 years ago

@kbenoit I think I have renamed the vignette but still need to run through some checks. I'm not sure if the SAoTD.Rmd is a holdover from when I changed the name of the package to saotd.

I haven't had a chance to work on removing the rcpp items but hope to get to that later this weekend.

Again thanks for the help!

evan-l-munson commented 5 years ago

@kbenoit I fixed the vignette issue finally, which of course took longer than I expected. I went through and removed all Rcpp items in the package and ended up breaking everything. This weekend, I added all the Rcpp items back into the package, and now the package is passing all tests again and working properly.

kbenoit commented 5 years ago

There is no C++ in the package, so the "breaking" parts were probably due to incomplete removal or failure to rebuild the parts fully when you roxygenized.

kbenoit commented 5 years ago

Actually I just fixed it for you, https://github.com/evan-l-munson/saotd/pull/8.

evan-l-munson commented 5 years ago

@kbenoit Again thank you for the help with the rcpp items. I have merged your correction into the master repository. I retested everything and your corrections have passed all checks. I think this should be good to go. Thank you again for your help!

evan-l-munson commented 5 years ago

@arfon with a couple critical pointers from @kbenoit I think that I have finally corrected the last couple of items of concern he had for my package. Is there anything else that you need from me with this package submission? Thanks!

arfon commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

arfon commented 5 years ago

Hi @evan-l-munson. I've made some slight tweaks to your paper in https://github.com/evan-l-munson/saotd/pull/9 - let me know what you think.

In addition, please make sure you:

evan-l-munson commented 5 years ago

@arfon, Good evening. I have completed your suggested edits/additions and pushed to GitHub. If there is anything else, let me know and I will get working on it asap. Thanks for the help!

arfon commented 5 years ago

@whedon generate pdf

whedon commented 5 years ago

Attempting PDF compilation. Reticulating splines etc...

whedon commented 5 years ago

:point_right: Check article proof :page_facing_up: :point_left:

arfon commented 5 years ago

@arfon, Good evening. I have completed your suggested edits/additions and pushed to GitHub. If there is anything else, let me know and I will get working on it asap. Thanks for the help!

@evan-l-munson thanks. Please could you clarify what version the package is now at? Given the modifications during the review it would seem appropriate to make a new release and archive in Zenodo.

Once you've done this I'm happy to proceed with accepting this submission.

evan-l-munson commented 5 years ago

@arfon, I bumped the package to version 0.2.0. I will work to resubmit to Zenodo within the next day or two. Will let you know when I get that done. Thanks!

evan-l-munson commented 5 years ago

@arfon, I released the 0.2.0 version to zenodo this morning and have updated the DOI badge. Is there anything else you need? Thanks!

arfon commented 5 years ago

@whedon set 10.5281/zenodo.2578973 as archive

whedon commented 5 years ago

OK. 10.5281/zenodo.2578973 is the archive.

arfon commented 5 years ago

@whedon set 0.2.0 as version

whedon commented 5 years ago

OK. 0.2.0 is the version.

arfon commented 5 years ago

@whedon accept

whedon commented 5 years ago

Attempting dry run of processing paper acceptance...

whedon commented 5 years ago

OK DOIs

- http://doi.org/10.18637/jss.v040.i13 is OK

MISSING DOIs

- None

INVALID DOIs

- None