First up, evaluate `syuzhet` per the walkthrough by Michael Kearney: https://mkearney.github.io/blog/2017/06/01/intro-to-rtweet/

Main takeaway - there are some useful elements; however, the walkthrough no longer holds up, as the `tokenize` argument no longer works. A suitable alternative to continue with the walkthrough was not found. Modified the example to use #rstats-related tweets.
Learned:

- `plain_tweets()` function, which will be useful in extracting text from the specified text fields.
- The `tokenize` argument within `plain_tweets()` does not work.
- `tokenizers::tokenize_tweets()` may work instead (see the sketch below).
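A minimal sketch of the adjusted flow, assuming an `rtweet` token is already configured; `tokenizers::tokenize_tweets()` stands in for the broken `tokenize` argument, and the #rstats search mirrors the modified example:

```r
library(rtweet)      # pull tweets
library(tokenizers)  # tokenize_tweets() as a stand-in for the broken argument
library(syuzhet)     # sentiment scoring per the walkthrough

# Pull recent #rstats tweets (assumes rtweet authentication is already set up)
rt <- search_tweets("#rstats", n = 500, include_rts = FALSE)

# plain_tweets() cleans the text field into plain text
clean_text <- plain_tweets(rt$text)

# Tokenize with the tokenizers package instead of plain_tweets(tokenize = ...)
tokens <- tokenize_tweets(clean_text)

# Score each cleaned tweet with syuzhet's default lexicon
rt$sentiment <- get_sentiment(clean_text, method = "syuzhet")
```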
Next: evaluate `tidytext`, as outlined in Text Mining with R by Julia Silge and David Robinson (https://www.tidytextmining.com/), specifically the section on Twitter at https://www.tidytextmining.com/twitter.html
Main takeaway - this is a rich resource that is highly usable and will be a wonderful reference moving forward in this project. However, this Twitter analysis does not cover sentiment. The rest of the text must be explored.
Learned:

- `unnest_tokens()` to do just that - tokenize, remove stop words, and unnest the tokens. This utilizes the `tokenizers` package.
- `nest()` to nest dataframes within dataframes.
- `map()` to run functions across elements within a nested dataframe (see the sketch below).
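A minimal sketch of that flow, assuming `rt` is a tweet data frame from `rtweet` with `status_id` and `text` columns:

```r
library(dplyr)
library(tidytext)
library(tidyr)
library(purrr)

# One row per word, with stop words removed; the "tweets" tokenizer comes
# from the tokenizers package under the hood
tidy_tweets <- rt %>%
  select(status_id, text) %>%
  unnest_tokens(word, text, token = "tweets") %>%
  anti_join(stop_words, by = "word")

# nest() + map() pattern: work on each tweet's tokens as a nested data frame
nested <- tidy_tweets %>%
  group_by(status_id) %>%
  nest() %>%
  mutate(n_words = map_int(data, nrow))
```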
Next: evaluate `SentimentAnalysis`
https://cran.r-project.org/web/packages/SentimentAnalysis/vignettes/SentimentAnalysis.html
https://github.com/sfeuerriegel/SentimentAnalysis
Main takeaway - overall this is a strong package and has many great features for sentiment analysis. However, there are some possible limitations due to the construction and content of tweets.
It is possible to create your own dictionary, which could be useful for building a tweet-specific lexicon. While interesting, that may fall outside the scope of this analysis.
Learned:

- A lasso regularization option to extract significant text based on a response driver (sketched below).
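A hedged sketch of both pieces, reusing the cleaned text from above; using `favorite_count` as the response is only an illustrative choice, not something from the vignette:

```r
library(SentimentAnalysis)

# Rule-based scoring across the built-in dictionaries (GI, QDAP, LM, HE)
scores <- analyzeSentiment(clean_text)
summary(scores$SentimentQDAP)

# LASSO-regularized dictionary generation tied to a numeric response
# (here favorite_count as an illustrative "response driver")
custom_dict <- generateDictionary(clean_text, rt$favorite_count)
summary(custom_dict)
```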
Next: evaluate `sentimentr`
https://cran.r-project.org/web/packages/sentimentr/readme/README.html
Main takeaway - this looks like a fantastic package with many great ways to visualize sentiment. I'm curious whether these methods are available outside of the package. It is unclear if the package has adequate support for Twitter without creating a custom dictionary. This may be more complicated given that I do not have standard valences for Twitter words, though sentiment valences may be available in other packages. This route seems more complicated than it is worth if other packages already have good lexicon values.
Learned:

- Related to `qdap` and `syuzhet` - focusing on a balance of accuracy vs. speed. Attempts to take valence shifters into account (negations, amplifiers, de-amplifiers, adversative conjunctions).
- Core functions: a. `sentiment()` b. `sentiment_by()` (see the sketch below).
- The README compares `sentimentr`, `syuzhet`, `meanr`, and Stanford.
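A quick sketch of the two core calls on the same cleaned text:

```r
library(sentimentr)

# Sentence-level polarity, with valence shifters (negators, amplifiers,
# de-amplifiers, adversative conjunctions) taken into account
sent <- sentiment(get_sentences(clean_text))

# Aggregated polarity per tweet (element_id maps back to the input vector)
sent_by <- sentiment_by(get_sentences(clean_text))
head(sent_by)
```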
Next: evaluate `saotd`
https://cran.r-project.org/web/packages/saotd/vignettes/saotd.html
Main takeaway - this is an excellent package geared toward Twitter data, which also draws some elements from `TidyText` to shape the data frame. There is high utility here for data manipulation, analysis, and visualization.
Learned:

- Uses `rtweet` directly to pull data. Can use the `saotd` function `tweet_acquire()`. It seems possible to still use `rtweet` if preferred, to gather more information, but the data may need more cleaning.
- Builds on `tidytext`, which was evaluated above. Using `tweet_tidy()`, `tidytext` creates tokens from each tweet, creating a new row for every word in the tweet and appending the word to the end of each record. This takes the single tweet record and creates copies of all fields, appending each word in sequence to each subsequent row. "The cleaning process removes: “@”, “#” and “RT” symbols, Weblinks, Punctuation, Emojis, and Stop Words like (“the”, “of”, etc.)."
- `unigrams`, `bigrams`, `trigrams` - iterations of the n-gram, which is a contiguous sequence of n items from the given text (here, tweets).
- Review n-grams to see if words should be combined into single entities (such as with `merge_terms()`) or are misspellings.
- `saotd` provides `bigram_network()` and `word_corr_network()` to show the bigram network and word correlations.
- `posneg_words()` to get positive and negative sentiment by word. Words can easily be filtered out if they drag the sentiment up or down. (A rough sketch of this flow follows the list.)
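A rough sketch of how I expect to start with `saotd`, again using the `rt` data frame from `rtweet`; the argument names are from the vignette as I recall it and should be verified against the package documentation:

```r
library(saotd)

# Tidy/clean the raw tweets: drops "@", "#", "RT", links, punctuation,
# emojis, and stop words, producing one row per word
tidy_tweets <- tweet_tidy(DataFrame = rt)

# Merge multi-word terms into single entities before n-gram work
# (the term/replacement values here are only examples)
merged <- merge_terms(DataFrame = rt, term = "data science",
                      term_replacement = "datascience")

# Positive and negative words driving sentiment up or down
posneg_words(DataFrameTidy = tidy_tweets, num_words = 10)
```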
After reviewing each of the packages (`syuzhet`, `tidytext`, `SentimentAnalysis`, `sentimentr`, `saotd`), I will begin this analysis with `saotd`, as it was created specifically for Twitter data and it utilizes some elements from `TidyText`. As I continue the analysis, I may branch out into the `TidyText` package or other packages as suitable.
Reconsider the package choice given the limitations of `saotd`.
https://github.com/mjhendrickson/rtweet-sentiment-analysis/issues/4
Determine the best library, or libraries, to use for sentiment analysis.
- `syuzhet` - utilized in the walkthrough by Michael Kearney (`rtweet` creator): https://mkearney.github.io/blog/2017/06/01/intro-to-rtweet/
- `tidytext` - as outlined in Text Mining with R by Julia Silge and David Robinson: https://www.tidytextmining.com/
- `SentimentAnalysis` - https://cran.r-project.org/web/packages/SentimentAnalysis/vignettes/SentimentAnalysis.html
- `sentimentr` - https://cran.r-project.org/web/packages/sentimentr/readme/README.html
- `saotd` - https://cran.r-project.org/web/packages/saotd/vignettes/saotd.html