tidyverts / feasts

Feature Extraction And Statistics for Time Series
https://feasts.tidyverts.org/
291 stars 23 forks source link

Any way to adjust the "phase" of a gg_season graph? #133

Closed MetricT closed 2 years ago

MetricT commented 3 years ago

I am trying to use gg_season to graph seasonal gun sales with a period of 4 years, to extract seasonal gun sale data aligned with the US Presidential election cycle. But when I graph the data with gg_season, it starts with year 4 instead of year 1. Is there any way to adjust the phase of the data to ensure the graph starts with year 1?

Also, when graphed, the x-axis headers are all years in 1972. I'm sure I could brute-force change them, but I'm wondering if there's a better way to do it within gg_season, ie if the user specifies season with a period of 4 years, is there a way for gg_season to automatically add "year 1", "year 2" for x axis breaks instead of years? Thanks.

library(tabulizer) library(tidyverse) library(lubridate) library(tsibble) library(feasts)

NICS Firearm checks by month. Use tabulizer's "stream" instead of

"lattice" to import the table correctly.

pdf <- "https://www.fbi.gov/file-repository/nics_firearm_checks_-_month_year.pdf" out <- extract_tables(pdf, method = "stream")

data <- out[[1]][-1, ] %>% as_tibble() %>% rename(Year = V1, Jan = V2, Feb = V3, Mar = V4, Apr = V5, May = V6, Jun = V7, Jul = V8, Aug = V9, Sep = V10, Oct = V11, Nov = V12, Dec = V13, Total = V14) %>% select(-Total) %>% mutate( Jan = as.numeric(gsub(",", "", gsub("\.", "", Jan))), Feb = as.numeric(gsub(",", "", gsub("\.", "", Feb))), Mar = as.numeric(gsub(",", "", gsub("\.", "", Mar))), Apr = as.numeric(gsub(",", "", gsub("\.", "", Apr))), May = as.numeric(gsub(",", "", gsub("\.", "", May))), Jun = as.numeric(gsub(",", "", gsub("\.", "", Jun))), Jul = as.numeric(gsub(",", "", gsub("\.", "", Jul))), Aug = as.numeric(gsub(",", "", gsub("\.", "", Aug))), Sep = as.numeric(gsub(",", "", gsub("\.", "", Sep))), Oct = as.numeric(gsub(",", "", gsub("\.", "", Oct))), Nov = as.numeric(gsub(",", "", gsub("\.", "", Nov))), Dec = as.numeric(gsub(",", "", gsub("\.", "", Dec)))) %>% pivot_longer(-Year, names_to = "Month", values_to = "NICS_Firearm_Background_Checks") %>% filter(!is.na(NICS_Firearm_Background_Checks)) %>% mutate(yearmonth = yearmonth(paste(Year, " ", Month))) %>% select(yearmonth, NICS_Firearm_Background_Checks)

model <- data %>% as_tsibble(index = "yearmonth") %>% model(STL(NICS_Firearm_Background_Checks ~ trend(window = 90) + season(period = "1 year") + season(period = "4 year"))) %>% components()

gg_season(model, y = season_4 year - min(model$season_4 year), period = "4 year") + labs(x = "", y = "", title = "4-Year Seasonal Component of Gun Checks") + scale_y_continuous(labels = scales::comma)

mitchelloharawild commented 3 years ago

Thanks for the interesting application and code. I've had some trouble running the code and so I have made some small improvements:

library(tabulizer)
library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following object is masked from 'package:lubridate':
#> 
#>     interval
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union
library(feasts)
#> Loading required package: fabletools
pdf <- "https://www.fbi.gov/file-repository/nics_firearm_checks_-_month_year.pdf"
out <- extract_tables(pdf, method = "stream")

data <- out[[1]][-1, ] %>%
  as_tibble() %>%
  rename(Year = V1, Jan = V2, Feb = V3, Mar = V4, Apr = V5, May = V6,
  Jun = V7, Jul = V8, Aug = V9, Sep = V10, Oct = V11,
  Nov = V12, Dec = V13, Total = V14) %>%
  select(-Total) %>%
  mutate(
    across(month.abb, ~ as.numeric(gsub(",", "", gsub("\\.", "", .))))
  ) %>% 
  pivot_longer(-Year, names_to = "Month", values_to = "NICS_Firearm_Background_Checks") %>%
  filter(!is.na(NICS_Firearm_Background_Checks)) %>%
  mutate(yearmonth = yearmonth(paste(Year, " ", Month))) %>%
  select(yearmonth, NICS_Firearm_Background_Checks)
#> Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.
#> Using compatibility `.name_repair`.
#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(month.abb)` instead of `month.abb` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.

model <- data %>%
  as_tsibble(index = "yearmonth") %>%
  model(STL(NICS_Firearm_Background_Checks ~ trend(window = 90) + season(period = "1 year") + season(period = "4 year"))) %>%
  components()

gg_season(model, y = `season_4 year` - min(`season_4 year`), period = "4 year") +
  labs(x = "", y = "", title = "4-Year Seasonal Component of Gun Checks") +
  scale_y_continuous(labels = scales::comma)

Created on 2021-05-06 by the reprex package (v1.0.0)

Regarding the x-axis, the correct behaviour would be to use an origin-less time variable as you suggest. Something like 'year 1', 'year 2', etc. Better time structures in R would be the correct approach here, and is related to the issue #103. I'm hoping that https://github.com/mitchelloharawild/moment or https://github.com/r-lib/clock can improve time structures in R to make this possible, but it will take a while for these ideas to establish.

mitchelloharawild commented 2 years ago

I'm closing this as I think it's out of scope for gg_season(), and it is more appropriate to code up explicitly in ggplot2. Changing the phase of seasonality would require specifying more than just the seasonal period, which is an interesting idea that doesn't show up in modelling very often (usually the length/period of the seasonality is sufficient).