`get_sentences` no longer works on dataframe after commit `645401e`

collin-austad commented 5 years ago

Hi,

I love this package! I've been using it in regular sentiment analysis and had previously set up something like the example:

presidential_debates_2012 %>%
    get_sentences()

This used to produce a dataframe with all columns retained and the individual sentences that I could then run sentiment() on. This no longer works, and I'm struggling to find a way to get sentiment on a sentence level while preserving all of the remaining data in the original dataframe.

Here is a reprex for what I'm working through:

library(sentimentr)
library(magrittr)
library(dplyr)

x <- tibble(text = c('I like data a lot.', 
                     'sentimentr is great for sentiment analysis.  I use it a lot.'),
            rating = c(5, 4))
x

# Does not preserve `rating` column and is not at sentence level:
x %>%
    mutate(sentence = get_sentences(text)) %$%
    sentiment_by(sentence)

# Preserves `rating` column, but does not allow for sentence-level sentiment:
x %>%
    mutate(sentence = get_sentences(text)) %$%
    sentiment_by(sentence, list(rating))

# Allows for sentence-level sentiment, but does not preserve `rating` column:
x %>%
    mutate(sentence = get_sentences(text)) %$%
    sentiment(sentence)

I've also tried combining get_sentences() with tidyr::unnest() and then using bind_cols() to attach a separate dataset built with sentiment(). That works most of the time, but fails when, for whatever reason, sentiment() produces more rows than the dataframe it is being bound to (the one produced by get_sentences() + tidyr::unnest()).

trinker commented 4 years ago

There's a lot of information here. It sounds like the problem is that you can't extract the sentences from a dataframe like you used to be able to, as shown below:

library(sentimentr)
library(tidyverse)

presidential_debates_2012 %>%
    sentimentr:::get_sentences()

## Error in get_sentences.data.frame(.) : object 'text.var' not found

Can you confirm this is the error you are getting?

trinker commented 4 years ago

I think this has been fixed now. Can you re-try?
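
A minimal sketch for re-testing, assuming the installed sentimentr includes the fix. It only uses the `presidential_debates_2012` data that ships with the package; the variable name `res` is just for illustration:

```r
library(sentimentr)

# record the installed version so the result can be tied to a release
print(packageVersion('sentimentr'))

# try the data frame method and capture the error instead of stopping
res <- tryCatch(
    get_sentences(presidential_debates_2012),
    error = function(e) e
)

if (inherits(res, 'error')) {
    message('still failing: ', conditionMessage(res))
} else {
    # on success the result is a data frame split into sentences
    print(head(res))
}
```

If the message branch fires, posting the version number and the error text here would help narrow it down.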

trinker commented 4 years ago

If you want to keep the original text plus the new text, here is one way to do it in the vein you were trying (not the most efficient):

library(sentimentr)
library(magrittr)
library(dplyr)

x <- tibble(text = c('I like data a lot.', 
                     'sentimentr is great for sentiment analysis.  I use it a lot.'),
            rating = c(5, 4))

## helper function to return sentence-level results (text and scores)

sentiment_sentences <- function(x){

    sents <- get_sentences(x)
    bind_cols(sentences = unlist(sents), sentiment(sents))

}

sentiment_sentences(x$text)

x %>%
    group_by(across()) %>%
    summarize(
        sentiment_sentences(text)
    )

Which yields:

  text                                                         rating    id sentences                                   element_id sentence_id word_count sentiment
  <chr>                                                         <dbl> <int> <chr>                                            <int>       <int>      <int>     <dbl>
1 I like data a lot.                                                5     1 I like data a lot.                                   1           1          5     0.224
2 sentimentr is great for sentiment analysis.  I use it a lot.      4     2 sentimentr is great for sentiment analysis.          1           1          6     0.204
3 sentimentr is great for sentiment analysis.  I use it a lot.      4     2 I use it a lot.                                      1           2          5     0

I'd probably approach this more as an aggregate-and-rejoin problem, and I'm guessing it's much faster since it isn't working row-wise:

x$id <- seq_len(nrow(x))
x %>%
    get_sentences() %>%
    sentiment() %>%
    left_join(x %>% select(original = text, id), by = 'id') %>%
    relocate(original, .before = text)

Which yields:

                                                       original                                        text rating id element_id sentence_id word_count sentiment
1:                                           I like data a lot.                          I like data a lot.      5  1          1           1          5 0.2236068
2: sentimentr is great for sentiment analysis.  I use it a lot. sentimentr is great for sentiment analysis.      4  2          2           1          6 0.2041241
3: sentimentr is great for sentiment analysis.  I use it a lot.
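
For anyone who hit the row-count mismatch with bind_cols(), the same join-on-a-key idea can be sketched without dplyr. This is only an illustration under a few assumptions: it relies on `element_id` in the `sentiment()` output being the position of the source string in the input vector, and the names `sent`, `sentence_level` and `row_level` are made up for the example.

```r
library(sentimentr)

x <- data.frame(
    text = c('I like data a lot.',
             'sentimentr is great for sentiment analysis.  I use it a lot.'),
    rating = c(5, 4),
    stringsAsFactors = FALSE
)
x$id <- seq_len(nrow(x))

# sentence-level scores; element_id points back at the row of x$text
sent <- as.data.frame(sentiment(get_sentences(x$text)))

# guard: every scored sentence must map to a row of x
stopifnot(all(sent$element_id %in% x$id))

# keyed join: each sentence row picks up the columns of its source row,
# so a differing row count cannot misalign the columns
sentence_level <- merge(x, sent, by.x = 'id', by.y = 'element_id')

# optional roll-up: plain mean of the sentence scores per original row
row_level <- aggregate(sentiment ~ id, data = sentence_level, FUN = mean)

sentence_level
row_level
```

Because the rows are matched on `id` rather than by position, the result stays aligned even if the number of sentences differs from what another splitter produced. Note that the plain mean in `row_level` may not match what `sentiment_by()` reports, since that function applies its own averaging function by default.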

trinker commented 3 years ago

Closing as there is no response from the OP.

trinker / sentimentr

`get_sentences` no longer works on dataframe after commit `645401e` #112