Error with `unnest_tokens()` token = "ngrams" and non-standard column names #67

Closed: zkamvar closed this issue 7 years ago

zkamvar commented 7 years ago

`unnest_tokens()` fails when `token = "ngrams"` if the incoming data frame has non-standard column names (e.g. names containing spaces or commas). The error originates in dplyr's `group_by_()` (see: https://github.com/tidyverse/dplyr/issues/2891).

library(dplyr)
library(tidytext)

tribble(~a, ~`b, yeah`, ~b,
  1, "a", "some sentence.",
  2, "b", "another sentence with more words",
  3, "c", "a sentence that has more things") %>%
  unnest_tokens(output = word, input = b, token = "ngrams", n = 2)
#> Error in parse(text = x): <text>:1:2: unexpected ','
#> 1: b,
#>      ^
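Judging by the `parse(text = x)` in the error message, the string-based grouping path ends up parsing the column name as R code, so the comma in `b, yeah` is read as syntax. A minimal base-R sketch of that failure mode (the variable names are illustrative only, not taken from tidytext or dplyr) reproduces the same parse error and shows that turning the string into a symbol sidesteps parsing entirely:

```r
# Illustrative only: mimics what a string-based grouping verb does with a name.
col_name <- "b, yeah"

# Parsing the name as R code fails on the comma, as in the error above
parse_result <- tryCatch(
  parse(text = col_name),
  error = function(e) conditionMessage(e)
)
print(parse_result)

# Converting the string to a symbol never parses it, so any name is allowed
col_sym <- as.name(col_name)
print(col_sym)
```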
juliasilge commented 7 years ago

This is now fixed, after updating some of tidytext's internals to rlang/tidyeval.


tribble(~a, ~`b, yeah`, ~b, 
        1, "a", "some sentence.", 
        2, "b", "another sentence with more words", 
        3, "c", "a sentence that has more things") %>% 
  unnest_tokens(output = word, input = b, token = "ngrams", n = 2)

#> # A tibble: 10 x 3
#>        a `b, yeah`             word
#>    <dbl>     <chr>            <chr>
#>  1     1         a    some sentence
#>  2     2         b another sentence
#>  3     2         b    sentence with
#>  4     2         b        with more
#>  5     2         b       more words
#>  6     3         c       a sentence
#>  7     3         c    sentence that
#>  8     3         c         that has
#>  9     3         c         has more
#> 10     3         c      more things
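For anyone hitting the same problem in their own code, here is a minimal sketch of the general tidyeval pattern: convert column names held as strings into symbols with `rlang::syms()` and splice them into `group_by()` with `!!!`. This is an assumption about the general approach, not a copy of tidytext's actual change, and `group_cols` is a hypothetical variable:

```r
library(dplyr)

# Same awkward column name as in the report
df <- tibble(a = c(1, 2, 3), `b, yeah` = c("a", "b", "c"))

# Hypothetical: grouping columns arrive as a character vector
group_cols <- "b, yeah"

# syms() builds symbols from the strings and !!! splices them into group_by(),
# so the names are never parsed as R code
grouped <- df %>% group_by(!!!rlang::syms(group_cols))
print(group_vars(grouped))
```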

Let me know if you run into any issues with using the tidyeval framework with tidytext! :smiley:

zkamvar commented 7 years ago

Thank you so much for the fix!

P.S. Love that commit message 😃
