Closed zenggyu closed 5 years ago
While it is useful to be able to easily extract the data used in gg_*(), I think that it should return a tsibble/tibble, rather than something with a new class that can be passed into autoplot(). I wasn't aware of this usage in yardstick, though, so I'll definitely look into it more. One thing I definitely want to avoid is functions that produce both data and a plot (such as stats::acf()).
As you mention for the ACF, it would be nice for the data to also contain the dashed lines. I completely agree with you here and think that the ACF object should contain a distribution column. I've mentioned this briefly in issue #1. Note that because of this added complexity/structure in the ACF object, I'm using a new class here to support the autoplot() method. I'm still questioning whether this is the correct approach.
For the case of gg_subseries() and gg_season() this manipulation should be fairly simple (as shown below). gg_lag() is a bit more difficult, but it would be analogous to stats::embed() for a tsibble. This would be something that I think is better suited for the tsibble package by @earowang.
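To make the stats::embed() analogy concrete, here is a minimal base R sketch of the kind of lagged data a lag plot needs. It uses a plain numeric vector (the first eight Beer values of aus_production quoted in this thread) rather than a tsibble, and the column names value, lag_1 and lag_2 are illustrative choices, not anything feasts or tsibble defines.

```r
# Base R only; no tsibble involved in this sketch.
beer <- c(284, 213, 227, 308, 262, 228, 236, 320)

# stats::embed() returns a matrix whose first column is the series at time t
# and whose remaining columns are lags 1, 2, ..., (dimension - 1).
max_lag <- 2
lagged <- as.data.frame(embed(beer, dimension = max_lag + 1))

# Illustrative column names (an assumption, not an existing API).
names(lagged) <- c("value", paste0("lag_", seq_len(max_lag)))

lagged
```

Each row pairs an observation with its earlier values, so the first max_lag observations have no complete row. A tsibble version would additionally need to carry the index and key columns along.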
gg_subseries()
library(feasts)
library(dplyr)
# The plot itself
tsibbledata::aus_production %>%
  gg_subseries(Beer)
# The data behind the plot: each quarter becomes a facet
tsibbledata::aus_production %>%
  transmute(Beer, facet = quarters(Quarter))
#> # A tsibble: 218 x 3 [1Q]
#> Quarter Beer facet
#> <qtr> <dbl> <chr>
#> 1 1956 Q1 284 Q1
#> 2 1956 Q2 213 Q2
#> 3 1956 Q3 227 Q3
#> 4 1956 Q4 308 Q4
#> 5 1957 Q1 262 Q1
#> 6 1957 Q2 228 Q2
#> 7 1957 Q3 236 Q3
#> 8 1957 Q4 320 Q4
#> 9 1958 Q1 272 Q1
#> 10 1958 Q2 233 Q2
#> # … with 208 more rows
Created on 2019-07-22 by the reprex package (v0.3.0)
gg_season()
library(feasts)
library(dplyr)
# The plot itself
tsibbledata::aus_production %>%
  gg_season(Beer)
# The data behind the plot: each year becomes a colour group
tsibbledata::aus_production %>%
  transmute(Beer, colour = lubridate::year(Quarter))
#> # A tsibble: 218 x 3 [1Q]
#> Quarter Beer colour
#> <qtr> <dbl> <dbl>
#> 1 1956 Q1 284 1956
#> 2 1956 Q2 213 1956
#> 3 1956 Q3 227 1956
#> 4 1956 Q4 308 1956
#> 5 1957 Q1 262 1957
#> 6 1957 Q2 228 1957
#> 7 1957 Q3 236 1957
#> 8 1957 Q4 320 1957
#> 9 1958 Q1 272 1958
#> 10 1958 Q2 233 1958
#> # … with 208 more rows
Created on 2019-07-22 by the reprex package (v0.3.0)
You can of course also access the plot data from a ggplot object using ggplot2::layer_data(). :smile:
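For anyone unfamiliar with layer_data(), a small sketch follows. It uses mtcars and a simple scatter plot only because they need nothing beyond ggplot2; the same call should work on the object returned by a gg_*() function, assuming that object is a regular ggplot.

```r
library(ggplot2)

# Any ggplot object will do; mtcars is used only because it ships with R.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# layer_data() returns the computed data for one layer (i = 1 is the first),
# with the aesthetics already mapped to columns such as x and y.
plot_data <- layer_data(p, i = 1)
head(plot_data[, c("x", "y")])
```

Note that the returned columns are named after aesthetics (x, y, colour, ...) rather than the original variables, so some renaming is usually needed before reusing the data.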
feasts provides gg_lag(), gg_subseries(), etc., which return various plots directly. This is quite convenient, but it also limits users' ability to customize the visualization. I think it would be nice if the package provided some intermediate functions that return a data frame containing the processed data needed to make the plot; users could then draw the plot with their own ggplot() statements or with the generic function autoplot(), and gg_lag(), gg_subseries(), etc. could be removed.
Similar ideas have been implemented in yardstick::roc_curve(), yardstick::pr_curve(), as well as feasts::ACF(). Besides the main data, which can be stored in a data frame, additional information (e.g., the data required to plot the dashed lines in an ACF plot, or extra class information that indicates how autoplot() should plot the data) can be stored as attributes of the data frame.
I believe the above proposal can make the package more powerful and more consistent with the tidyverse. What do you think?
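As a rough illustration of the attribute idea, the base R sketch below builds a data frame of autocorrelations and attaches the dashed-line limits as an attribute. The function name acf_data(), the class name acf_tbl and the attribute name limits are all hypothetical, and this is not how feasts::ACF() is implemented; the limits use the usual white-noise approximation of plus or minus z / sqrt(n).

```r
# Hypothetical helper: returns ACF values as a data frame and stores the
# information needed for the dashed lines as an attribute.
acf_data <- function(x, lag_max = 10, level = 0.95) {
  fit <- stats::acf(x, lag.max = lag_max, plot = FALSE)
  out <- data.frame(
    lag = as.vector(fit$lag)[-1],  # drop lag 0, whose autocorrelation is 1
    acf = as.vector(fit$acf)[-1]
  )
  # Approximate white-noise limits: +/- z / sqrt(n)
  limit <- stats::qnorm((1 + level) / 2) / sqrt(fit$n.used)
  attr(out, "limits") <- c(lower = -limit, upper = limit)
  # Extra class (hypothetical name) that an autoplot() method could dispatch on
  class(out) <- c("acf_tbl", class(out))
  out
}

set.seed(1)
res <- acf_data(rnorm(100))
head(res)
attr(res, "limits")
```

One trade-off to keep in mind: custom attributes may not survive subsetting or other data frame manipulation, which is one argument for storing the limits in a column instead.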