tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/

Behaviour of `features` with functions that return argument > length 1 #258

Closed njtierney closed 4 years ago

njtierney commented 4 years ago

I've noticed that when passing `features` a function that outputs something of length > 1, you get a long warning about renaming. I just wanted to check whether this is expected behaviour. It's quite cool that this just works! I can get as many columns as there are outputs from the function. I just wanted to raise it as I haven't seen it documented.

library(brolgar)
# Passing a vector to `mean` vs `diff` 
# mean
mean(c(1:10))
#> [1] 5.5
# input = vector of length 10
# output = vector of length 1

diff(c(1:10))
#> [1] 1 1 1 1 1 1 1 1 1
# input = vector of length 10
# output = vector of length 9

# so consequently, passing `features` a function that produces a vector
# will give you quite different return formats.

# mean
wages %>%
  features(ln_wages, 
           list(mean = mean))
#> # A tibble: 888 x 2
#>       id  mean
#>    <int> <dbl>
#>  1    31  1.75
#>  2    36  2.33
#>  3    53  1.89
#>  4   122  2.17
#>  5   134  2.48
#>  6   145  1.76
#>  7   155  2.17
#>  8   173  1.93
#>  9   206  2.27
#> 10   207  2.11
#> # … with 878 more rows

# diff
wages %>%
  features(ln_wages, 
           list(diff = diff))
#> New names:
#> * `.?` -> `.?...1`
#> * `.?` -> `.?...2`
#> New names:
#> * `diff_.?...12` -> `diff_.?...13`
#> New names:
#> * `diff_.?...1` -> `diff_.?...2`
#> * `diff_.?...2` -> `diff_.?...3`
#> * `diff_.?...3` -> `diff_.?...4`
#> * `diff_.?...4` -> `diff_.?...5`
#> * `diff_.?...5` -> `diff_.?...6`
#> * ...
#> # A tibble: 888 x 14
#>       id `diff_.?...2` `diff_.?...3` `diff_.?...4` `diff_.?...5` `diff_.?...6`
#>    <int>         <dbl>         <dbl>         <dbl>         <dbl>         <dbl>
#>  1    31        -0.058         0.036         0.28          0.182        -0.222
#>  2    36        -0.184         0.458         0.317        -0.754         1.11 
#>  3    53        -0.225         1.70         -1.65         -0.03          0.316
#>  4   122         0.806        -1.00         -1.16          1.68         -0.263
#>  5   134         0.095         0.224         0.064        -0.124         0.063
#>  6   145        -0.086         0.12          0.289         0.151        -0.265
#>  7   155         0.302        -0.543         0.627         0.158        -0.593
#>  8   173         0.134         0.283         0.051        -0.011         0.319
#>  9   206         0.269         0.185        NA            NA            NA    
#> 10   207         0.169         0.182        -0.068         0.361        -0.172
#> # … with 878 more rows, and 8 more variables: `diff_.?...7` <dbl>,
#> #   `diff_.?...8` <dbl>, `diff_.?...9` <dbl>, `diff_.?...10` <dbl>,
#> #   `diff_.?...11` <dbl>, `diff_.?...12` <dbl>, diff <dbl>,
#> #   `diff_.?...14` <dbl>

Created on 2020-08-19 by the reprex package (v0.3.0)
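A quick base R check, independent of `features`, may help explain the renaming messages. It is only a sketch of the likely cause: `range()` and `diff()` return unnamed vectors, so there is nothing to build distinct column names from, whereas a function such as `quantile()` labels each element of its result.

```r
x <- 1:10

# range() and diff() return plain, unnamed vectors
names(range(x))
#> NULL
names(diff(x))
#> NULL

# quantile() names each element of its result
names(quantile(x))
#> [1] "0%"   "25%"  "50%"  "75%"  "100%"
```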

mitchelloharawild commented 4 years ago

All values of the vector need names if multiple values will be returned. There shouldn't be this many warnings (one per column is appropriate), but this is how I would use range():

library(brolgar)
# range
wages %>%
  features(ln_wages, 
           list(range = ~ setNames(range(.), c("min", "max"))))
#> # A tibble: 888 x 3
#>       id range_min range_max
#>    <int>     <dbl>     <dbl>
#>  1    31     1.43       2.13
#>  2    36     1.80       2.93
#>  3    53     1.54       3.24
#>  4   122     0.763      2.92
#>  5   134     2.00       2.93
#>  6   145     1.48       2.04
#>  7   155     1.54       2.64
#>  8   173     1.56       2.34
#>  9   206     2.03       2.48
#> 10   207     1.58       2.66
#> # … with 878 more rows

Created on 2020-08-20 by the reprex package (v0.3.0)
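The same idea should extend to functions whose output length varies from series to series, such as `diff()`. The sketch below is an assumption-laden illustration rather than anything from fabletools: `named_diff` is a hypothetical helper that labels each difference by its position, and the guarded `features` call has not been verified against a particular fabletools version.

```r
# Hypothetical helper (not part of fabletools or brolgar):
# label each successive difference by its position
named_diff <- function(x) {
  d <- diff(x)
  setNames(d, paste0("d", seq_along(d)))
}

named_diff(c(1, 4, 9, 16))
#> d1 d2 d3 
#>  3  5  7

# Only attempted when brolgar is installed; the resulting column names
# (presumably diff_d1, diff_d2, ...) are an expectation, not a checked result
if (requireNamespace("brolgar", quietly = TRUE)) {
  library(brolgar)
  wages %>%
    features(ln_wages, list(diff = named_diff))
}
```

Series with fewer observations would be expected to get `NA` in the later columns, as in the unnamed `diff` output earlier in the thread.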

njtierney commented 4 years ago

Neat! Thanks for that! So there isn't currently a way to specify names within features?

Also, if it is helpful, I'd be happy to document this behaviour in fabletools, since I'll be doing it for brolgar as well.

mitchelloharawild commented 4 years ago

More documentation is always welcome.

What do you mean by 'specify names within features'? Do you have a proposed interface improvement?
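For illustration only, one user-side reading of "specifying names" is a small function factory that attaches names to whatever a feature function returns. `with_names` is a hypothetical helper and not a fabletools interface; the sketch assumes the supplied names match the length of the function's output.

```r
# Hypothetical helper, not a fabletools function:
# wrap `f` so that its result always carries the names in `nms`
with_names <- function(f, nms) {
  function(x, ...) setNames(f(x, ...), nms)
}

min_max <- with_names(range, c("min", "max"))
min_max(c(3, 1, 2))
#> min max 
#>   1   3
```

A wrapper like this could then be supplied in the feature list, e.g. `list(range = min_max)`, in place of the inline `~ setNames(range(.), c("min", "max"))` formula.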