tidyverts / feasts

Feature Extraction And Statistics for Time Series
https://feasts.tidyverts.org/

How to create your own function to pass to feasts? #63

Closed njtierney closed 5 years ago

njtierney commented 5 years ago

Hello!

I'm interested in writing my own functions to pass to feasts, but I feel like I am missing something on how to create the function I want to pass to features. Here's a reprex:

library(feasts)
#> Loading required package: fablelite
#> 
#> Attaching package: 'feasts'
#> The following object is masked from 'package:grDevices':
#> 
#>     X11
library(tsibbledata)
library(tsibble)
library(brolgar)

aus_retail %>% 
  features(Turnover, feat_acf) 
#> # A tibble: 152 x 9
#>    State Industry  acf1 acf10 diff1_acf1 diff1_acf10 diff2_acf1 diff2_acf10
#>    <chr> <chr>    <dbl> <dbl>      <dbl>       <dbl>      <dbl>       <dbl>
#>  1 Aust… Cafes, … 0.972  8.59     -0.347       0.244     -0.573       0.513
#>  2 Aust… Cafes, … 0.977  8.67     -0.327       0.259     -0.553       0.534
#>  3 Aust… Clothin… 0.892  7.18     -0.288       0.260     -0.510       0.341
#>  4 Aust… Clothin… 0.858  6.53     -0.319       0.212     -0.538       0.326
#>  5 Aust… Departm… 0.504  1.65     -0.316       0.206     -0.541       0.317
#>  6 Aust… Electri… 0.909  7.42     -0.261       0.324     -0.508       0.450
#>  7 Aust… Food re… 0.987  9.22     -0.413       0.616     -0.613       1.10 
#>  8 Aust… Footwea… 0.779  4.82     -0.349       0.168     -0.576       0.344
#>  9 Aust… Furnitu… 0.956  7.77     -0.200       0.164     -0.531       0.395
#> 10 Aust… Hardwar… 0.957  7.85     -0.117       0.104     -0.501       0.314
#> # … with 142 more rows, and 1 more variable: season_acf1 <dbl>

# let's try some ts data from brolgar
wages_ts <- as_tsibble(wages,
                       key = id,
                       index = exper,
                       regular = FALSE)
wages_ts %>% 
  features(exper, feat_acf)
#> # A tibble: 888 x 8
#>       id    acf1 acf10 diff1_acf1 diff1_acf10 diff2_acf1 diff2_acf10
#>    <int>   <dbl> <dbl>      <dbl>       <dbl>      <dbl>       <dbl>
#>  1    31  0.643  NA       0.0496       0.139     -0.0829       0.292
#>  2    36  0.732  NA       0.200        0.500     -0.172        0.263
#>  3    53  0.233  NA      -0.00102      0.0940    -0.0490       0.176
#>  4   122  0.726  NA      -0.361        0.850     -0.693        1.44 
#>  5   134  0.762   1.61   -0.0993       0.301     -0.498        0.547
#>  6   145  0.704  NA      -0.256        0.491     -0.488        0.670
#>  7   155  0.740   1.51    0.106        0.238     -0.289        0.161
#>  8   173  0.558  NA      -0.123        0.355     -0.190        0.131
#>  9   206 -0.0115 NA      -0.5          0.25      NA            0    
#> 10   207  0.731   1.48    0.106        0.436     -0.338        0.335
#> # … with 878 more rows, and 1 more variable: season_acf1 <dbl>

# now I want to return the mean for each key
wages_ts %>% 
  features(exper, mean)
#> Error: Argument 1 must have names

Created on 2019-07-05 by the reprex package (v0.2.1)

I am probably missing something, but perhaps it might be useful to have some helpers around creating/validating feature functions? Perhaps something like:

Once I understand this I'd be happy to contribute a vignette or something to explain how to create new features, if you like?

earowang commented 5 years ago

features() requires names for the functions, which probably isn't necessary? Another issue is that the function doesn't get computed.

library(feasts)
#> Loading required package: fablelite
#> 
#> Attaching package: 'feasts'
#> The following object is masked from 'package:grDevices':
#> 
#>     X11
tsibbledata::aus_retail %>% 
  features(Turnover, average = ~ mean)
#> # A tibble: 152 x 2
#>    State                  Industry                                         
#>    <chr>                  <chr>                                            
#>  1 Australian Capital Te… Cafes, restaurants and catering services         
#>  2 Australian Capital Te… Cafes, restaurants and takeaway food services    
#>  3 Australian Capital Te… Clothing retailing                               
#>  4 Australian Capital Te… Clothing, footwear and personal accessory retail…
#>  5 Australian Capital Te… Department stores                                
#>  6 Australian Capital Te… Electrical and electronic goods retailing        
#>  7 Australian Capital Te… Food retailing                                   
#>  8 Australian Capital Te… Footwear and other personal accessory retailing  
#>  9 Australian Capital Te… Furniture, floor coverings, houseware and textil…
#> 10 Australian Capital Te… Hardware, building and garden supplies retailing 
#> # … with 142 more rows

Created on 2019-07-08 by the reprex package (v0.3.0)

njtierney commented 5 years ago

It doesn't seem clear to me how names are needed or used, since the following:

names(feasts::feat_acf)
#> NULL

feat_acf doesn't have names?

But I could create a list of funs with names like so, which doesn't work as I might expect it to.

library(feasts)
#> Loading required package: fablelite
#> 
#> Attaching package: 'feasts'
#> The following object is masked from 'package:grDevices':
#> 
#>     X11
funs_list <- list(avg = mean,
                  sd = sd)

tsibbledata::aus_retail %>% 
  features(Turnover, funs_list)
#> Error: Argument 1 must have names

Created on 2019-07-08 by the reprex package (v0.2.1)

mitchelloharawild commented 5 years ago

The latter would be a nice feature to add. The names refer to those of the object returned by the function, not to names on the function itself.

names(feasts::feat_acf(rnorm(10)))
#> [1] "acf1"        "acf10"       "diff1_acf1"  "diff1_acf10" "diff2_acf1" 
#> [6] "diff2_acf10"

Created on 2019-07-08 by the reprex package (v0.3.0)
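
To illustrate the point about names on the returned object, here is a minimal base-R sketch. `feat_mean` is a hypothetical helper written for this example, not part of feasts or fablelite; it assumes only that a feature function is expected to return a named vector.

```r
# Hypothetical feature function (not part of feasts): it returns a
# *named* numeric vector, and that name is what labels the output column.
feat_mean <- function(x) c(mean = mean(x))

x <- c(1, 2, 3, 4)

# A bare mean() result carries no names, which is what triggers
# "Argument 1 must have names" in the version discussed here.
names(mean(x))

# The wrapped version carries the name "mean".
names(feat_mean(x))
```

This is the same idea as wrapping the call in `rlang::set_names()`: the name travels with the result rather than with the function.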

mitchelloharawild commented 5 years ago

I agree that names may not be necessary. @earowang, note that features are passed as a list to the features argument:

library(fablelite)
tsibbledata::aus_retail %>% 
  features(Turnover, features = list(~ set_names(mean(.), "mean")))
#> # A tibble: 152 x 2
#>    State                  Industry                                         
#>    <chr>                  <chr>                                            
#>  1 Australian Capital Te… Cafes, restaurants and catering services         
#>  2 Australian Capital Te… Cafes, restaurants and takeaway food services    
#>  3 Australian Capital Te… Clothing retailing                               
#>  4 Australian Capital Te… Clothing, footwear and personal accessory retail…
#>  5 Australian Capital Te… Department stores                                
#>  6 Australian Capital Te… Electrical and electronic goods retailing        
#>  7 Australian Capital Te… Food retailing                                   
#>  8 Australian Capital Te… Footwear and other personal accessory retailing  
#>  9 Australian Capital Te… Furniture, floor coverings, houseware and textil…
#> 10 Australian Capital Te… Hardware, building and garden supplies retailing 
#> # … with 142 more rows

Created on 2019-07-08 by the reprex package (v0.3.0)

earowang commented 5 years ago

So where's the "mean"?

earowang commented 5 years ago

Does this look better?

tsibbledata::aus_retail %>% 
  features(Turnover, features = list(average = ~ mean(.)))

mitchelloharawild commented 5 years ago

Good question, works interactively but not in reprex! Hmmm.

edit: something else loaded by load_all is required, perhaps a namespace issue.

mitchelloharawild commented 5 years ago

Yes, list(average = ~ mean(.)) is not supported currently, but I think it should be. Working on this now. The best way (once implemented) would be list(average = mean).

mitchelloharawild commented 5 years ago

Fixed, user error.

library(fablelite)

tsibbledata::aus_retail %>% 
  features(Turnover, features = list(~ rlang::set_names(mean(.), "mean")))
#> # A tibble: 152 x 3
#>    State                Industry                                       mean
#>    <chr>                <chr>                                         <dbl>
#>  1 Australian Capital … Cafes, restaurants and catering services      20.0 
#>  2 Australian Capital … Cafes, restaurants and takeaway food services 32.0 
#>  3 Australian Capital … Clothing retailing                            12.4 
#>  4 Australian Capital … Clothing, footwear and personal accessory re… 19.8 
#>  5 Australian Capital … Department stores                             24.9 
#>  6 Australian Capital … Electrical and electronic goods retailing     20.0 
#>  7 Australian Capital … Food retailing                                97.7 
#>  8 Australian Capital … Footwear and other personal accessory retail…  7.38
#>  9 Australian Capital … Furniture, floor coverings, houseware and te… 15.1 
#> 10 Australian Capital … Hardware, building and garden supplies retai… 13.0 
#> # … with 142 more rows

Created on 2019-07-08 by the reprex package (v0.3.0)

njtierney commented 5 years ago

OK, so do you think you will provide a way to construct a feature list? Or do you think it will go to something like this:

tsibbledata::aus_retail %>% 
  features(Turnover, 
           features = list(avg = mean))

mitchelloharawild commented 5 years ago

I think this is easy enough.

library(fablelite)

tsibbledata::aus_retail %>% 
  features(Turnover, features = list(a = mean, b = feasts::feat_acf))
#> # A tibble: 152 x 10
#>    State Industry     a b_acf1 b_acf10 b_diff1_acf1 b_diff1_acf10
#>    <chr> <chr>    <dbl>  <dbl>   <dbl>        <dbl>         <dbl>
#>  1 Aust… Cafes, … 20.0   0.973    8.59       -0.348         0.239
#>  2 Aust… Cafes, … 32.0   0.977    8.65       -0.327         0.259
#>  3 Aust… Clothin… 12.4   0.885    7.01       -0.276         0.251
#>  4 Aust… Clothin… 19.8   0.846    6.33       -0.303         0.201
#>  5 Aust… Departm… 24.9   0.500    1.60       -0.310         0.202
#>  6 Aust… Electri… 20.0   0.902    7.29       -0.247         0.324
#>  7 Aust… Food re… 97.7   0.984    9.13       -0.394         0.585
#>  8 Aust… Footwea…  7.38  0.760    4.64       -0.325         0.155
#>  9 Aust… Furnitu… 15.1   0.952    7.67       -0.190         0.163
#> 10 Aust… Hardwar… 13.0   0.957    7.67       -0.104         0.101
#> # … with 142 more rows, and 3 more variables: b_diff2_acf1 <dbl>,
#> #   b_diff2_acf10 <dbl>, b_season_acf1 <dbl>

Created on 2019-07-08 by the reprex package (v0.3.0)
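
The column names in the output above (`a`, `b_acf1`, ...) suggest that the list name is used on its own when a function returns an unnamed value, and as a prefix when the function returns named values. The following base-R sketch imitates that naming for a single series. `apply_features` and `rng` are hypothetical names made up for this example, and this is a rough imitation of the observed output, not how fablelite implements it.

```r
# Hypothetical helper: apply a named list of functions to one series and
# build result names the way the output above appears to.
apply_features <- function(x, fns) {
  out <- lapply(names(fns), function(nm) {
    res <- fns[[nm]](x)
    if (is.null(names(res))) {
      # Unnamed result: use the list name as-is (e.g. "a").
      names(res) <- nm
    } else {
      # Named result: prefix each name with the list name (e.g. "b_lo").
      names(res) <- paste(nm, names(res), sep = "_")
    }
    res
  })
  unlist(out)
}

# Hypothetical feature returning two named values.
rng <- function(x) c(lo = min(x), hi = max(x))

result <- apply_features(c(3, 1, 2), list(a = mean, b = rng))
names(result)
```

Under these assumptions, the names come out as `a`, `b_lo` and `b_hi`, matching the pattern of `a` and `b_acf1` shown above.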

mitchelloharawild commented 5 years ago

We also have fablelite::feature_set() to create a list of features based on tags.

njtierney commented 5 years ago

Is this with a new version of feasts/fablelite? I get:

library(fablelite)

tsibbledata::aus_retail %>% 
  features(Turnover, features = list(a = mean, b = feasts::feat_acf))
#> Error: Argument 1 must have names

Created on 2019-07-08 by the reprex package (v0.2.1)

mitchelloharawild commented 5 years ago

Yes, new version of fablelite pushed ~5 minutes ago.

njtierney commented 5 years ago

Looks great to me:

library(feasts)
#> Loading required package: fablelite
#> 
#> Attaching package: 'feasts'
#> The following object is masked from 'package:grDevices':
#> 
#>     X11
library(brolgar) # using use-feasts branch

wages_ts %>%
  features(exper, list(diff_range = ~diff(range(.))))
#> # A tibble: 888 x 2
#>       id diff_range
#>    <int>      <dbl>
#>  1    31      6.97 
#>  2    36      9.28 
#>  3    53      0.996
#>  4   122      9.04 
#>  5   134     10.6  
#>  6   145      6.86 
#>  7   155      9.45 
#>  8   173      6.21 
#>  9   206      2.44 
#> 10   207      9.78 
#> # … with 878 more rows

wages_ts %>%
  features(id, list(n_obs = length))
#> # A tibble: 888 x 2
#>       id n_obs
#>    <int> <int>
#>  1    31     8
#>  2    36    10
#>  3    53     8
#>  4   122    10
#>  5   134    12
#>  6   145     9
#>  7   155    11
#>  8   173     6
#>  9   206     3
#> 10   207    11
#> # … with 878 more rows

# nice naming too
wages_ts %>%
  features_at(vars(uerate, exper), 
              list(avg = mean,
                   sd = sd))
#> # A tibble: 888 x 5
#>       id uerate_avg uerate_sd exper_avg exper_sd
#>    <int>      <dbl>     <dbl>     <dbl>    <dbl>
#>  1    31       3.21     0.710      3.38    2.51 
#>  2    36       5.10     1.98       4.90    3.32 
#>  3    53       4.43     1.34       1.11    0.297
#>  4   122       5.30     1.96       6.42    3.20 
#>  5   134       5.72     1.63       5.43    3.59 
#>  6   145       5.20     1.79       3.70    2.51 
#>  7   155       6.87     3.40       5.84    3.22 
#>  8   173       6.08     1.69       3.23    2.54 
#>  9   206       8.83     2.37       3.00    1.23 
#> 10   207       7.42     2.17       5.55    3.27 
#> # … with 878 more rows

# nice naming too
wages_ts %>%
  features_at(tsibble::measured_vars(.), 
              list(avg = mean,
                   sd = sd))
#> # A tibble: 888 x 15
#>       id lnw_avg lnw_sd ged_avg ged_sd postexp_avg postexp_sd black_avg
#>    <int>   <dbl>  <dbl>   <dbl>  <dbl>       <dbl>      <dbl>     <dbl>
#>  1    31    1.75  0.277   1      0           3.38       2.51          0
#>  2    36    2.33  0.387   1      0           4.90       3.32          0
#>  3    53    1.89  0.562   0.75   0.463       0.172      0.274         0
#>  4   122    2.17  0.574   0      0           0          0             0
#>  5   134    2.48  0.321   0.667  0.492       2.31       2.65          0
#>  6   145    1.76  0.185   0.889  0.333       3.36       2.49          0
#>  7   155    2.17  0.362   0      0           0          0             0
#>  8   173    1.93  0.274   0      0           0          0             0
#>  9   206    2.27  0.228   0      0           0          0             0
#> 10   207    2.11  0.327   0      0           0          0             0
#> # … with 878 more rows, and 7 more variables: black_sd <dbl>,
#> #   hispanic_avg <dbl>, hispanic_sd <dbl>, hgc_avg <dbl>, hgc_sd <dbl>,
#> #   uerate_avg <dbl>, uerate_sd <dbl>

Created on 2019-07-08 by the reprex package (v0.2.1)

njtierney commented 5 years ago

This is so great, it will involve re-writing many functions in brolgar, but this flexibility is wonderful.

njtierney commented 5 years ago

Here's my crack at documenting how to add your own features:


To create your own features or summaries to pass to `feasts`, you can provide a named list of functions. For example:

```{r create-three}
library(feasts)
feat_three <- list(min = min,
                   med = median,
                   max = max)

feat_three
```

These are then passed to `features` like so:

```{r}
library(tsibbledata)

aus_retail %>%
  features(Turnover, feat_three)
```

Somewhat related, I've added a question about `feature_set` here: https://github.com/tidyverts/fablelite/issues/89

mitchelloharawild commented 5 years ago

Sounds good. This is the recommended interface for users.