trinker / sentimentr

Dictionary based sentiment analysis that considers valence shifters

amend valence_shifters_dt #99

Closed fahadshery closed 5 years ago

fahadshery commented 5 years ago

Hi,

I looked at the documentation here and executed:

update_key(
  valence_shifters_table,
  x = data.frame(x = c("Looks like"), y = c(3)),
  comparison = sentimentr::polarity_table
)

this returns the error:

Error in is_key(key, sentiment = sentiment) : object 'valence_shifters_table' not found

I simply want to extend the valence_shifters_dt but can't seem to do it, e.g. to add terms like "lack of".

Also, I split sentences using udpipe, which is very accurate for me. How can I use sentimentr if I already have a vector of sentences along with doc_id, sentence_id, paragraph_id, etc.?

fahadshery commented 5 years ago

Additionally, I have a data.frame with the following columns: doc_id, paragraph_id, sentence_id, sentence. When I run:

sentiment(text.var = unique_sentences$sentence, polarity_dt = my_sentiment_hash)

it returns its own column names, such as element_id, sentence_id, word_count, etc. I need to "join" the sentiment scores back onto the original data.frame. How do I do it?

fahadshery commented 5 years ago

Finally, is there a way to stop sentimentr from splitting up the sentences when you have already done that using AI and other methods?

Shantanu1497 commented 5 years ago

I simply want to extend the valence_shifters_dt but can't seem to do it, e.g. to add terms like "lack of".

Try this piece of code to extend the valence_shifters_dt.

key <- data.frame(
  x = hash_valence_shifters$x,
  y = hash_valence_shifters$y,
  stringsAsFactors = FALSE
)

g <- cbind(c('lack of', 'example_negator'), c(3, 1))
g <- as.data.frame(g)
colnames(g) <- c('x', 'y')
key <- rbind(key, g)
key$y <- as.numeric(key$y)
key <- as_key(key, comparison = hash_sentiment_jockers_rinker, sentiment = FALSE)

Also, make sure the last line (the as_key() call) runs without error, since the sentiment and valence shifter dictionaries need to be mutually exclusive.
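For anyone hitting the original error: the shipped valence-shifter key lives in the lexicon package as lexicon::hash_valence_shifters, and the error above came from referencing a nonexistent object name (valence_shifters_table). A minimal base-R sketch of building the rows to append; the commented-out as_key() step assumes sentimentr and lexicon are installed and is not run here:

```r
# New valence shifter rows. The y codes used by the valence shifter key
# (per the lexicon/sentimentr documentation) are: 1 = negator,
# 2 = amplifier, 3 = de-amplifier, 4 = adversative conjunction.
new_rows <- data.frame(
  x = c("lack of", "example_negator"),
  y = c(3, 1),               # numeric from the start, so no as.numeric() fixup
  stringsAsFactors = FALSE
)

# With sentimentr/lexicon installed, append to the shipped key and validate:
# key <- rbind(
#   data.frame(x = lexicon::hash_valence_shifters$x,
#              y = lexicon::hash_valence_shifters$y,
#              stringsAsFactors = FALSE),
#   new_rows
# )
# key <- sentimentr::as_key(key,
#   comparison = lexicon::hash_sentiment_jockers_rinker,
#   sentiment  = FALSE)
```

Building the data.frame directly avoids the character coercion that cbind() introduces in the snippet above, which is why the as.numeric() step was needed there.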

Shantanu1497 commented 5 years ago

Additionally, I have a data.frame with the following columns: doc_id, paragraph_id, sentence_id, sentence. When I run:

sentiment(text.var = unique_sentences$sentence, polarity_dt = my_sentiment_hash)

it returns its own column names, such as element_id, sentence_id, word_count, etc. I need to "join" the sentiment scores back onto the original data.frame. How do I do it?

@fahadshery, could you share a snippet of what you're referring to? From my understanding, you have a textual dataset and you wish to "join" the sentiment function's output back onto it.

One way is using dplyr joins. Say you have a dataset that looks like this (without the sentiment column, of course):

[screenshot: example data.frame with element_id and text columns]

Sentiment scores are calculated separately for each sentence, even when several sentences sit in a single row. So if the text in row 1 is "I am happy. I am sad.", the element_id stays the same for both sentences, and you may want to group by element_id and average the sentiment scores.

Check out this piece of code to help you out.

library(sentimentr)
library(dplyr)

your_text <- crowdflower_weather$text[1:10]
element_id <- seq_along(your_text)
your_df <- data.frame(element_id, your_text)
your_df$your_text <- as.character(your_df$your_text)

sentiment_df <- as.data.frame(
  sentiment(your_df$your_text) %>%
    group_by(element_id) %>%
    summarise(sentiment = mean(sentiment))
)

sentiment_df <- inner_join(sentiment_df, your_df, by = 'element_id')

This is what you get as your final output.

[screenshot: final output with the averaged sentiment scores joined back onto the original data.frame]

Your result may vary based on the type of aggregation you use for the sentiment column. Also, if your dataset contains a lot of text, try running it through get_sentences() first.
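The average-then-join idea can be sketched without sentimentr at all, using a mocked-up sentiment() output (the column names match the real output; the scores are invented for illustration):

```r
# Mocked sentiment() output: element 1 ("I am happy. I am sad.") splits
# into two sentences whose scores cancel; element 2 has one sentence.
mock_scores <- data.frame(
  element_id  = c(1, 1, 2),
  sentence_id = c(1, 2, 1),
  sentiment   = c(0.75, -0.75, 0.50)
)

# Average per element, then join back onto the original text
# (base-R stand-ins for group_by()/summarise() and inner_join()).
agg <- aggregate(sentiment ~ element_id, data = mock_scores, FUN = mean)
your_df <- data.frame(
  element_id = c(1, 2),
  your_text  = c("I am happy. I am sad.", "Lovely weather today."),
  stringsAsFactors = FALSE
)
joined <- merge(agg, your_df, by = "element_id")
# element 1 averages to 0; element 2 keeps its single score of 0.5
```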

Hoping this helps! :)

trinker commented 5 years ago

The left join is the best bet if aggregations were done outside of sentiment_by, but if you just want to join back up to the original data.frame, I'd suggest a more efficient method. Use sentiment_by, which performs an aggregation at the element level: each string you pass in gets an element_id that retains the original order of the strings. Because that order is preserved, you can just do a dplyr::bind_cols.

library(sentimentr); library(dplyr)

your_df <- dplyr::data_frame(your_text = crowdflower_weather$text[1:10])

sentiment_df <- sentiment_by(your_df$your_text)
sentiment_df <- bind_cols(sentiment_df, your_df)
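The order-preservation point can be checked with a mock (invented scores; base R's cbind() stands in for dplyr::bind_cols):

```r
# sentiment_by() returns one row per input string, in input order, so a
# plain column bind lines score rows up with text rows without a join key.
mock_by <- data.frame(
  element_id    = 1:3,
  word_count    = c(4L, 6L, 5L),
  ave_sentiment = c(0.20, -0.10, 0.40)
)
your_df <- data.frame(
  your_text = c("text one", "text two", "text three"),
  stringsAsFactors = FALSE
)
out <- cbind(mock_by, your_df)  # row i of the scores matches row i of the text
```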
fahadshery commented 5 years ago

sentiment_df <- bind_cols(sentiment_df, your_df)

@trinker this is exactly what I am currently doing with the sentiment_by method. But I do think there should be an option not to split into sentences (if that has already been done prior to calling sentiment()). I say this because I have already split the text into sentences and have my own element_id. When I call sentiment(), it sometimes splits the data.frame further and hence returns a different element_id, which can't be joined/bind_cols'd back to the original data.frame.

key <- as_key(key,comparison=hash_sentiment_jockers_rinker,sentiment = F)

Thanks @Shantanu1497, I will give it a try (it's Xmas eve!!!).

The final issue I have is how to deal with shorter pieces of text. E.g., for the text "no broadband, no phone line" it correctly gives a negative sentiment score, but the same text without the comma, i.e. "no broadband no phone", gets a positive score. Why is that, and how do I deal with it?
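One plausible explanation (an editor's sketch, not confirmed in this thread): sentimentr flips a polarized word's sign once per negator found in its context window, and commas bound that window. Without the comma, both "no"s can land in the same clause, and an even count of negators cancels out:

```r
# Assumed sign-flip arithmetic: each negator multiplies the polarity by -1.
flip <- function(negators) (-1) ^ length(negators)

one_clause  <- c("no", "no")            # "no broadband no phone": one clause
two_clauses <- list(c("no"), c("no"))   # the comma yields two clauses

flip(one_clause)            # even negator count: sign restored (positive)
sapply(two_clauses, flip)   # each clause keeps a single negation (negative)
```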

trinker commented 5 years ago

There is a way, but you'll have to manually add the class to the data.frame or list that contains the split sentences. This is difficult to do by design, so the user takes responsibility for the decision not to use sentimentr's tooling.

fahadshery commented 5 years ago

Is this the solution to the comma/no-comma issue I mentioned above? It's currently holding me back from bringing this into my workflow. If yes, do you have a code snippet? @trinker, sorry for being a pain.

trinker commented 5 years ago

No, that was in response to your issue with using sentences that were already split. I will not give a snippet, at least not here, because I don't want to make it easy to do this, though I can see the need for it.

fahadshery commented 5 years ago

@Shantanu1497 I executed the following to bring in my own dictionary and valence_shifters_dt in case someone else is looking for the same:

# custom sentiment dictionary
updated_sentiment_hash <- read_csv("~/PATH/TO/local_sentiment.csv")
local_sentiment_hash <- as_key(updated_sentiment_hash)

# custom valence shifters, validated against the custom sentiment key
valence_shifters_dt_local <- read_csv("~/PATH/TO/local_valence_shifters_dt.csv")
valence_shifters_dt_local$y <- as.numeric(valence_shifters_dt_local$y)
valence_shifters_dt_local <- as_key(valence_shifters_dt_local, comparison = local_sentiment_hash, sentiment = FALSE)

sentiment_by("nothing works", polarity_dt = local_sentiment_hash, valence_shifters_dt = valence_shifters_dt_local)