Additionally, I have a data.frame which contains the following cols: doc_id, paragraph_id, sentence_id, sentence.

When I run:

sentiment(text.var = unique_sentences$sentence, polarity_dt = my_sentiment_hash)

it returns with its own col names such as element_id, sentence_id, word_count etc. I need to "join" the sentiment scores with the original data.frame. How do I do it?

Finally, is there a way to stop sentimentr from splitting up the sentences when you have already done that using AI and other methods?
I simply want to extend the valence_shifters_dt but can't seem to do it, e.g. add terms like "lack of".
Try this piece of code to extend the valence_shifters_dt.
library(sentimentr); library(lexicon)  # the hash_* keys live in the lexicon package

# start from the valence shifter key that ships with lexicon
key <- data.frame(x = hash_valence_shifters$x, y = hash_valence_shifters$y,
                  stringsAsFactors = FALSE)
# new rows; y codes: 1 = negator, 2 = amplifier, 3 = de-amplifier, 4 = adversative conjunction
g <- data.frame(x = c('lack of', 'example_negator'), y = c(3, 1),
                stringsAsFactors = FALSE)
key <- rbind(key, g)
key$y <- as.numeric(key$y)  # as_key() expects a numeric y column
# validate against the sentiment key: the two tables must not share terms
key <- as_key(key, comparison = hash_sentiment_jockers_rinker, sentiment = FALSE)
Also, make sure that the last line runs without error, since the sentiment and valence shifter dictionaries need to be mutually exclusive (a term can't appear in both).
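As a quick sanity check (a sketch; the test sentence, and the assumption that "support" is a polarized term in the default dictionary, are mine), pass the extended key to sentiment() and compare against a run with the default shifters:

library(sentimentr)
# 'lack of' was added as a de-amplifier (code 3), so it should damp the
# polarity contributed by 'support' relative to the default key
sentiment("there is a lack of support", valence_shifters_dt = key)
sentiment("there is a lack of support")  # default key for comparison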
@fahadshery, could you share a snippet of what you're referring to? From my understanding, you have a textual dataset and you wish to "join" the output you get from the sentiment function back onto it.
One way is using dplyr joins. Say you have a dataset that looks like the your_df built below (without the sentiment column, of course).
The sentiment scores are calculated separately for each sentence, even when several sentences sit in a single row. So, if your text in row 1 is "I am happy. I am sad.", the element_id stays the same for both sentences, but you might want to average over the sentiment scores you get, grouping by element_id.
Check out this piece of code to help you out.
library(sentimentr); library(dplyr)

# toy data: ten weather tweets with an explicit element_id per row
your_text <- crowdflower_weather$text[1:10]
element_id <- seq_along(your_text)
your_df <- data.frame(element_id, your_text)
your_df$your_text <- as.character(your_df$your_text)
# score every sentence, then average back to one score per element_id
sentiment_df <- as.data.frame(sentiment(your_df$your_text) %>%
  group_by(element_id) %>%
  summarise(sentiment = mean(sentiment)))
# join the aggregated scores back onto the original rows
sentiment_df <- inner_join(sentiment_df, your_df, by = 'element_id')
The joined data.frame is your final output. Your result may vary based on the type of aggregation you use for the sentiment column derived from the output. Also, if your dataset contains a lot of text, try running it through get_sentences() first.
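For instance, a minimal sketch of that tip: split once up front and reuse the result, so sentiment() doesn't have to redo the sentence splitting on each call.

sentences <- get_sentences(your_df$your_text)  # split once
sentiment(sentences)                           # reuses the stored sentence boundaries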
Hoping this helps! :)
The left join is the best bet if aggregations were done inside of sentiment_by, but if you just want to join back up to the original data.frame then I'd suggest a more efficient method. Use sentiment_by, which will perform an aggregation at the element level (each string you passed in gets an element_id that retains the original order of the strings). Because that order is preserved you can just do a dplyr::bind_cols:
library(sentimentr); library(dplyr)

# data_frame() is superseded by tibble() in newer dplyr versions
your_df <- dplyr::data_frame(your_text = crowdflower_weather$text[1:10])
# one aggregated score per input string, in the original row order
sentiment_df <- sentiment_by(your_df$your_text)
# order is preserved, so the score columns can be bound straight on
sentiment_df <- bind_cols(sentiment_df, your_df)
@trinker this is exactly what I am currently doing with the sentiment_by method. But I do think there should be an option to not split into sentences (if that has already been done prior to calling sentiment()). The reason I say this is that I have already split the text into sentences and have my own element_id. When I call sentiment(), it sometimes splits the df further and hence returns a different element_id which can't be joined/bind_cols back to the original df.
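One workaround, stitching together the suggestions above (a sketch; it assumes unique_sentences holds one pre-split sentence per row, and my_sentiment_hash is the custom polarity key from the top of the thread): because element_id indexes the rows of the input vector in order, any extra splits sentiment() makes can be averaged away and the scores joined back by row number.

library(sentimentr); library(dplyr)

scores <- sentiment(unique_sentences$sentence, polarity_dt = my_sentiment_hash) %>%
  group_by(element_id) %>%
  summarise(sentiment = mean(sentiment))  # collapse any extra sentence splits

result <- unique_sentences %>%
  mutate(element_id = row_number()) %>%   # element_id == input row order
  left_join(scores, by = "element_id")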
Thanks @Shantanu1497, I will give it a try (it's Xmas eve!!!)
The final issue I have is how to deal with shorter pieces of text. E.g. if I have this text: "no broadband, no phone line", it correctly gives a negative sentiment score. But the same text without the comma, i.e. "no broadband no phone", gives a positive score??? Why is that and how do I deal with it?
There is a way, but you'll have to add the class to the data.frame or list that contains the split sentences manually. This is difficult to do by design, so the user is taking responsibility for the decision to not use sentimentr's tooling.
Is this the solution to the comma/no comma issue I mentioned above? Because this is currently holding me back from bringing this into my workflow. If yes, do you have a code snippet? @trinker sorry for being a pain.
No, that was in response to your issue with using sentences that were already split. I will not give a snippet, at least not here, because I don't want to make it easy to do this, though I can see the need for it.
@Shantanu1497 I executed the following to bring in my own dictionary and valence_shifters_dt, in case someone else is looking for the same:
library(sentimentr); library(readr)

# custom polarity dictionary: a CSV with columns x (term) and y (numeric score)
updated_sentiment_hash <- read_csv("~/PATH/TO/local_sentiment.csv")
local_sentiment_hash <- as_key(updated_sentiment_hash)

# custom valence shifters: a CSV with columns x (term) and y (shifter code)
valence_shifters_dt_local <- read_csv("~/PATH/TO/local_valence_shifters_dt.csv")
valence_shifters_dt_local$y <- as.numeric(valence_shifters_dt_local$y)
valence_shifters_dt_local <- as_key(valence_shifters_dt_local,
                                    comparison = local_sentiment_hash,
                                    sentiment = FALSE)

sentiment_by("nothing works", polarity_dt = local_sentiment_hash,
             valence_shifters_dt = valence_shifters_dt_local)
Hi,
I looked here and executed:

update_key(valence_shifters_table,
           x = data.frame(x = c("Looks like"), y = c(3)),
           comparison = sentimentr::polarity_table)

This returns the error:

Error in is_key(key, sentiment = sentiment) : object 'valence_shifters_table' not found
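The error just means no object called valence_shifters_table exists in your session; that name appears to come from an older sentimentr example, while current versions keep the shipped keys in the lexicon package. A sketch of a likely fix under that assumption (keys are stored lowercase, so "Looks like" is added as "looks like"; code 3 = de-amplifier is carried over from the call above):

library(sentimentr)
new_shifters <- update_key(
  lexicon::hash_valence_shifters,  # the key the thread uses elsewhere
  x = data.frame(x = c("looks like"), y = c(3), stringsAsFactors = FALSE),
  comparison = lexicon::hash_sentiment_jockers_rinker,
  sentiment = FALSE
)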
I split sentences using udpipe, which is very accurate for me. How do I use this if I already have a sentences vector along with doc_id, sentence_id, paragraph_id etc.?