ropensci / rtweet

🐦 R client for interacting with Twitter's [stream and REST] APIs
https://docs.ropensci.org/rtweet

Error using ts_plot() with grouped data frame #724

Closed bensoltoff closed 2 years ago

bensoltoff commented 2 years ago

Problem

ts_plot() should be able to generate a line graph color coded by different group values. After faf2e981198aaf8097dd092301a98e2805e215a1, this no longer works. It instead generates an error from ggplot() because it is not properly extracting the column name of the grouping variable.

The change from aes_string() to aes() with the .data pronoun produces invalid code: names(.data)[3] does not actually retrieve the desired column.
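A possible workaround, sketched here under assumptions rather than taken from the package, is to look up the column name on the data frame itself and then index the .data pronoun with that string. The toy data frame `df` and the variable `group_col` are made up for illustration; only `aes()`, `.data`, `geom_line()` and `layer_data()` are real ggplot2 names.

```r
library(ggplot2)

# Toy stand-in for the output of ts_data(): time, n, then the grouping column.
df <- data.frame(
  time = rep(1:3, times = 2),
  n    = c(1, 2, 3, 3, 2, 1),
  lang = rep(c("rt", "py"), each = 3)
)

# Resolve the name outside of aes(), where names() sees the real data frame.
group_col <- names(df)[3]

# Inside aes(), subset the .data pronoun with the resolved string.
p <- ggplot(df, aes(x = .data[["time"]], y = .data[["n"]],
                    colour = .data[[group_col]])) +
  geom_line()
```

The key difference from the failing call is that names() is applied to the data frame, not to the pronoun, so the colour aesthetic receives an actual column.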

Expected behavior

Should draw a color-coded line graph.

Reproduce the problem

library(tidyverse)
library(rtweet)
#> 
#> Attaching package: 'rtweet'
#> The following object is masked from 'package:purrr':
#> 
#>     flatten

# get data on rstats and python
rt <- search_tweets("rstats", n = 100)
python <- search_tweets("python", n = 100)

# combine in grouped data frame
rt_py <- bind_rows(
  rt = rt,
  py = python,
  .id = "lang"
) %>%
  group_by(lang)

# attempt to plot with ts_plot()
ts_plot(data = rt_py)
#> Error: Must request at least one colour from a hue palette.

# manual application with error
ts_data(data = rt_py) %>%
  ggplot(mapping = aes(x = .data[["time"]], y = .data[["n"]], color = names(.data)[3])) +
  geom_line()
#> Error: Must request at least one colour from a hue palette.

# manually specify grouping column
ts_data(data = rt_py) %>%
  ggplot(mapping = aes(x = .data[["time"]], y = .data[["n"]], color = lang)) +
  geom_line()

Created on 2022-08-10 by the reprex package (v2.0.1.9000)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.1 (2022-06-23)
#>  os       macOS Monterey 12.3
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2022-08-10
#>  pandoc   2.18 @ /Applications/RStudio.app/Contents/MacOS/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package       * version    date (UTC) lib source
#>  askpass         1.1        2019-01-13 [2] CRAN (R 4.2.0)
#>  assertthat      0.2.1      2019-03-21 [2] CRAN (R 4.2.0)
#>  backports       1.4.1      2021-12-13 [2] CRAN (R 4.2.0)
#>  bit             4.0.4      2020-08-04 [2] CRAN (R 4.2.0)
#>  bit64           4.0.5      2020-08-30 [2] CRAN (R 4.2.0)
#>  broom           1.0.0      2022-07-01 [2] CRAN (R 4.2.0)
#>  cellranger      1.1.0      2016-07-27 [2] CRAN (R 4.2.0)
#>  cli             3.3.0      2022-04-25 [2] CRAN (R 4.2.0)
#>  colorspace      2.0-3      2022-02-21 [2] CRAN (R 4.2.0)
#>  crayon          1.5.1      2022-03-26 [2] CRAN (R 4.2.0)
#>  curl            4.3.2      2021-06-23 [2] CRAN (R 4.2.0)
#>  DBI             1.1.3      2022-06-18 [2] CRAN (R 4.2.0)
#>  dbplyr          2.2.1      2022-06-27 [2] CRAN (R 4.2.0)
#>  digest          0.6.29     2021-12-01 [2] CRAN (R 4.2.0)
#>  dplyr         * 1.0.9      2022-04-28 [2] CRAN (R 4.2.0)
#>  ellipsis        0.3.2      2021-04-29 [2] CRAN (R 4.2.0)
#>  evaluate        0.16       2022-08-09 [1] CRAN (R 4.2.1)
#>  fansi           1.0.3      2022-03-24 [2] CRAN (R 4.2.0)
#>  farver          2.1.1      2022-07-06 [2] CRAN (R 4.2.0)
#>  fastmap         1.1.0      2021-01-25 [2] CRAN (R 4.2.0)
#>  forcats       * 0.5.1      2021-01-27 [2] CRAN (R 4.2.0)
#>  fs              1.5.2      2021-12-08 [2] CRAN (R 4.2.0)
#>  gargle          1.2.0      2021-07-02 [2] CRAN (R 4.2.0)
#>  generics        0.1.3      2022-07-05 [2] CRAN (R 4.2.0)
#>  ggplot2       * 3.3.6      2022-05-03 [2] CRAN (R 4.2.0)
#>  glue            1.6.2      2022-02-24 [2] CRAN (R 4.2.0)
#>  googledrive     2.0.0      2021-07-08 [2] CRAN (R 4.2.0)
#>  googlesheets4   1.0.0      2021-07-21 [2] CRAN (R 4.2.0)
#>  gtable          0.3.0      2019-03-25 [2] CRAN (R 4.2.0)
#>  haven           2.5.0      2022-04-15 [2] CRAN (R 4.2.0)
#>  highr           0.9        2021-04-16 [2] CRAN (R 4.2.0)
#>  hms             1.1.1      2021-09-26 [2] CRAN (R 4.2.0)
#>  htmltools       0.5.3      2022-07-18 [2] CRAN (R 4.2.0)
#>  httr            1.4.3      2022-05-04 [2] CRAN (R 4.2.0)
#>  jsonlite        1.8.0      2022-02-22 [2] CRAN (R 4.2.0)
#>  knitr           1.39       2022-04-26 [2] CRAN (R 4.2.0)
#>  labeling        0.4.2      2020-10-20 [2] CRAN (R 4.2.0)
#>  lifecycle       1.0.1      2021-09-24 [2] CRAN (R 4.2.0)
#>  lubridate       1.8.0      2021-10-07 [2] CRAN (R 4.2.0)
#>  magrittr        2.0.3      2022-03-30 [2] CRAN (R 4.2.0)
#>  mime            0.12       2021-09-28 [2] CRAN (R 4.2.0)
#>  modelr          0.1.8      2020-05-19 [2] CRAN (R 4.2.0)
#>  munsell         0.5.0      2018-06-12 [2] CRAN (R 4.2.0)
#>  openssl         2.0.2      2022-05-24 [2] CRAN (R 4.2.0)
#>  pillar          1.8.0      2022-07-18 [2] CRAN (R 4.2.0)
#>  pkgconfig       2.0.3      2019-09-22 [2] CRAN (R 4.2.0)
#>  prettyunits     1.1.1      2020-01-24 [2] CRAN (R 4.2.0)
#>  progress        1.2.2      2019-05-16 [2] CRAN (R 4.2.0)
#>  purrr         * 0.3.4      2020-04-17 [2] CRAN (R 4.2.0)
#>  R.cache         0.16.0     2022-07-21 [2] CRAN (R 4.2.0)
#>  R.methodsS3     1.8.2      2022-06-13 [2] CRAN (R 4.2.0)
#>  R.oo            1.25.0     2022-06-12 [2] CRAN (R 4.2.0)
#>  R.utils         2.12.0     2022-06-28 [2] CRAN (R 4.2.0)
#>  R6              2.5.1      2021-08-19 [2] CRAN (R 4.2.0)
#>  readr         * 2.1.2      2022-01-30 [2] CRAN (R 4.2.0)
#>  readxl          1.4.0      2022-03-28 [2] CRAN (R 4.2.0)
#>  reprex          2.0.1.9000 2022-08-10 [1] Github (tidyverse/reprex@6d3ad07)
#>  rlang           1.0.4      2022-07-12 [2] CRAN (R 4.2.0)
#>  rmarkdown       2.14       2022-04-25 [2] CRAN (R 4.2.0)
#>  rstudioapi      0.13       2020-11-12 [2] CRAN (R 4.2.0)
#>  rtweet        * 1.0.2      2022-07-21 [1] CRAN (R 4.2.0)
#>  rvest           1.0.2      2021-10-16 [2] CRAN (R 4.2.0)
#>  scales          1.2.0      2022-04-13 [2] CRAN (R 4.2.0)
#>  sessioninfo     1.2.2      2021-12-06 [2] CRAN (R 4.2.0)
#>  stringi         1.7.8      2022-07-11 [2] CRAN (R 4.2.0)
#>  stringr       * 1.4.0      2019-02-10 [2] CRAN (R 4.2.0)
#>  styler          1.7.0      2022-03-13 [2] CRAN (R 4.2.0)
#>  tibble        * 3.1.8      2022-07-22 [2] CRAN (R 4.2.0)
#>  tidyr         * 1.2.0      2022-02-01 [2] CRAN (R 4.2.0)
#>  tidyselect      1.1.2      2022-02-21 [2] CRAN (R 4.2.0)
#>  tidyverse     * 1.3.2      2022-07-18 [2] CRAN (R 4.2.0)
#>  tzdb            0.3.0      2022-03-28 [2] CRAN (R 4.2.0)
#>  utf8            1.2.2      2021-07-24 [2] CRAN (R 4.2.0)
#>  vctrs           0.4.1      2022-04-13 [2] CRAN (R 4.2.0)
#>  withr           2.5.0      2022-03-03 [2] CRAN (R 4.2.0)
#>  xfun            0.31       2022-05-10 [1] CRAN (R 4.2.0)
#>  xml2            1.3.3      2021-11-30 [2] CRAN (R 4.2.0)
#>  yaml            2.3.5      2022-02-21 [2] CRAN (R 4.2.0)
#>
#>  [1] /Users/soltoffbc/Library/R/arm64/4.2/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

rtweet version

packageVersion("rtweet")
#> [1] '1.0.2'

Created on 2022-08-10 by the reprex package (v2.0.1.9000)

Session info

See above.

llrs commented 2 years ago

Many thanks for the detailed report. Note that with bind_rows you lose the users information when merging the tweets (I know that this is not important in this example, but just in case someone else tries to use it).

This was supported and only documented in the examples and I missed it, sorry. I don't understand the process under the hood of ggplot but I don't think it is a good idea to keep this because it relies on the position of the column to color and choose the line type, instead of being configurable by the user or inherited from...
However, I'll keep it for the moment.

bensoltoff commented 2 years ago

> I don't understand the process under the hood of ggplot but I don't think it is a good idea to keep this because it relies on the position of the column to color and choose the line type, instead of being configurable by the user or inherited from...

My understanding is that you use ts_data() to prepare the dataset for plotting. That function checks whether the input data frame is grouped and, if so, places the grouping columns in the third (and fourth) positions of the resulting data frame. The names of the grouping columns are irrelevant, since you hardcode their positions as the third/fourth columns. Admittedly, it requires the user to group the data frame before calling ts_plot(), and if the user has two grouping variables, they have to be ordered so that the first grouping column identifies the color and the second identifies the linetype. So I guess it's not the most intuitive, but I don't think it's inherently problematic.
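To make the positional convention concrete, here is a rough sketch of a plotting helper that maps the third column to colour and the fourth to linetype. `plot_by_position()` is a hypothetical name and this is not rtweet's actual ts_plot() code; it only assumes a data frame laid out as time, n, then up to two grouping columns.

```r
library(ggplot2)

# Hypothetical helper (not part of rtweet): choose aesthetics by column position.
plot_by_position <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 2)
  cols <- names(df)

  # Columns 1 and 2 are taken to be the time and the count.
  p <- ggplot(df, aes(x = .data[[cols[1]]], y = .data[[cols[2]]]))

  # A third column, if present, drives the colour.
  if (ncol(df) >= 3) {
    p <- p + aes(colour = .data[[cols[3]]])
  }
  # A fourth column, if present, drives the linetype.
  if (ncol(df) >= 4) {
    p <- p + aes(linetype = .data[[cols[4]]])
  }
  p + geom_line()
}

# Toy input with two grouping columns in positions 3 and 4.
toy <- data.frame(
  time   = rep(1:3, times = 4),
  n      = c(1, 2, 3, 3, 2, 1, 2, 2, 2, 1, 3, 1),
  lang   = rep(c("rt", "py"), each = 6),
  source = rep(rep(c("web", "app"), each = 3), times = 2)
)
p_toy <- plot_by_position(toy)
```

Because only the positions matter, renaming `lang` or `source` leaves the plot unchanged, while swapping their order swaps which one controls colour and which controls linetype.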

llrs commented 2 years ago

ts_data doesn't group the same way as group_by, as there isn't a direct dependency on dplyr. ts_data counts the tweets by time, but without using group_by. As a user, it is hard to understand what happens if it is not well described.
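For readers wondering what counting by time without group_by could look like, the following is a base R sketch only. It is an assumption about the general approach, not ts_data()'s real implementation, and the `tweets` data frame and its column names are invented for the example.

```r
# Made-up input: one row per tweet, with a timestamp and a group label.
tweets <- data.frame(
  created_at = as.POSIXct(
    c("2022-08-10 10:05:00", "2022-08-10 10:40:00", "2022-08-10 11:15:00",
      "2022-08-10 10:20:00", "2022-08-10 11:30:00", "2022-08-10 11:45:00"),
    tz = "UTC"
  ),
  lang = c("rt", "rt", "rt", "py", "py", "py")
)

# Round each timestamp down to the start of its hour.
tweets$time <- as.POSIXct(trunc(tweets$created_at, units = "hours"))

# Count tweets per hour and per group with base R's aggregate().
tweets$n <- 1L
counts <- aggregate(n ~ time + lang, data = tweets, FUN = sum)

# Reorder so the grouping column comes third, after time and n.
counts <- counts[, c("time", "n", "lang")]
```

The result has one row per hour and group, with the grouping column in third position, which is the layout the positional colour mapping discussed in this thread relies on.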