tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/

features tagging system #81

Closed earowang closed 5 years ago

earowang commented 5 years ago

As discussed on Friday, we'll have a features tagging system. But now I'm thinking the interface should be features_set(tags, envs).

Features live not only in packages but also in the global environment, where users define their own feature functions.

mitchelloharawild commented 5 years ago

Moved to fablelite, where features and tagging are implemented.

mitchelloharawild commented 5 years ago

Features are registered using register_feature(), which allows users to bind features from their global environment. https://github.com/tidyverts/feasts/blob/98d906d0d679d2f18d92d3548ea4e32160a362c7/R/zzz.R#L7-L9

This usage is supported, although I doubt it will be commonly used.
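To illustrate the idea of registering features with tags, here is a minimal base R sketch. It is not the fablelite or feasts implementation: `toy_registry`, `toy_register_feature()` and `toy_feature_set()` are hypothetical names made up for this example, and the real `register_feature()` may work differently.

```r
# Toy registry: a hypothetical stand-in, not the fablelite implementation.
toy_registry <- new.env(parent = emptyenv())

# Store a feature function together with its tags under a given name.
toy_register_feature <- function(name, fn, tags) {
  assign(name, list(fn = fn, tags = tags), envir = toy_registry)
  invisible(name)
}

# Return the registered functions matching any of `tags` (all of them if NULL).
toy_feature_set <- function(tags = NULL) {
  entries <- mget(sort(ls(toy_registry)), envir = toy_registry)
  if (!is.null(tags)) {
    keep <- vapply(entries, function(e) any(e$tags %in% tags), logical(1))
    entries <- entries[keep]
  }
  lapply(entries, function(e) e$fn)
}

# A user-defined feature living in the global environment.
acf1 <- function(x) c(acf1 = stats::acf(x, plot = FALSE)$acf[2])
toy_register_feature("acf1", acf1, tags = "autocorrelation")
toy_register_feature("spread", function(x) c(sd = stats::sd(x)), tags = "scale")

# Select features by tag.
names(toy_feature_set(tags = "autocorrelation"))
```

Keeping the tags next to the function in one registry means a selection by tag does not need to know which package, or which environment, the function came from.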

earowang commented 5 years ago

How is features_set() used?

mitchelloharawild commented 5 years ago

library(feasts)
#> Loading required package: fablelite
#> 
#> Attaching package: 'feasts'
#> The following object is masked from 'package:grDevices':
#> 
#>     X11

as_tsibble(USAccDeaths) %>% 
  features(log(value), feature_set(tags = "autocorrelation"))
#> # A tibble: 1 x 11
#>   x_acf1 x_acf10 diff1_acf1 diff1_acf10 diff2_acf1 diff2_acf10 seas_acf1
#>    <dbl>   <dbl>      <dbl>       <dbl>      <dbl>       <dbl>     <dbl>
#> 1  0.698    1.22    0.00613       0.280     -0.494       0.781     0.642
#> # … with 4 more variables: x_pacf5 <dbl>, diff1x_pacf5 <dbl>,
#> #   diff2x_pacf5 <dbl>, seas_pacf <dbl>

as_tsibble(USAccDeaths) %>% 
  features(log(value), feature_set(package = "feasts"))
#> # A tibble: 1 x 18
#>   trend_strength seasonal_streng…    spike linearity curvature
#>            <dbl>            <dbl>    <dbl>     <dbl>     <dbl>
#> 1          0.794            0.944 1.61e-10    -0.228     0.304
#> # … with 13 more variables: seasonal_peak_year <dbl>,
#> #   seasonal_trough_year <dbl>, x_acf1 <dbl>, x_acf10 <dbl>,
#> #   diff1_acf1 <dbl>, diff1_acf10 <dbl>, diff2_acf1 <dbl>,
#> #   diff2_acf10 <dbl>, seas_acf1 <dbl>, x_pacf5 <dbl>, diff1x_pacf5 <dbl>,
#> #   diff2x_pacf5 <dbl>, seas_pacf <dbl>

Created on 2019-06-09 by the reprex package (v0.2.1)

earowang commented 5 years ago

Did you say it can take multiple packages? Can we rename package to pkgs instead?

earowang commented 5 years ago

What does x_ prefix indicate?

mitchelloharawild commented 5 years ago

Yes, multiple packages are supported - I can rename this arg.

The x_ prefix is defined in acf_features and pacf_features from the tsfeatures package. It is used to differentiate ACF values on the data (x), the first and second differences (diff1, diff2) and the seasonal differences (seas).

Not my choice - happy to change.

earowang commented 5 years ago

Without x_, these names are still unique, aren't they?

mitchelloharawild commented 5 years ago

Correct.

earowang commented 5 years ago

Can we remove the prefix then?

mitchelloharawild commented 5 years ago

Done. There are many unusual choices made in individual feature functions.

earowang commented 5 years ago

Can I use log() in scoped variants?

as_tsibble(USAccDeaths) %>% 
  features_at(log(value), feature_set(tags = "autocorrelation"))

mitchelloharawild commented 5 years ago

Nope. Scoped variants use tidyselect semantics, similar to summarise_at.

earowang commented 5 years ago

But you'd like to keep log() in features()?

mitchelloharawild commented 5 years ago

Yes - it's very useful for quickly exploring your data.

For example, the workflow for identifying the differencing needed to make a time series stationary involves computing features on various differences.
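As a rough base R sketch of that workflow (using `stats::acf()` directly rather than the feasts feature functions, with `acf1()` being a helper defined only for this example), the same feature can be computed on the log series and on its first and second differences:

```r
# Lag-1 autocorrelation of a numeric series (helper for this example only).
acf1 <- function(x) stats::acf(x, plot = FALSE)$acf[2]

y <- log(as.numeric(datasets::USAccDeaths))

# Same feature on the series, its first differences, and its second differences.
res <- c(
  x     = acf1(y),
  diff1 = acf1(diff(y)),
  diff2 = acf1(diff(y, differences = 2))
)
res
```

Comparing the values side by side gives a quick indication of how much autocorrelation each level of differencing leaves behind.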

earowang commented 5 years ago

I'm okay with it, just pointing out the inconsistency. Can you print out a list of output names using feat_available(), maybe in a separate issue, to check whether the names are okay?

mitchelloharawild commented 5 years ago

Have a look at the docs for feature_set when feasts is loaded. I think it's better to keep this functionality in the docs, as it will allow better linking to the feature's documentation.

I'm planning on adding docs for features_by_package and features_by_tag, which will be linked to via features() and feature_set(). Currently only features_by_package is implemented, and is shown in the ?feature_set docs under the "Features" section.

mitchelloharawild commented 5 years ago

For your easy viewing: [image]

earowang commented 5 years ago

Use feat_* or features_*?

mitchelloharawild commented 5 years ago

feat_*, haven't changed yet.

earowang commented 5 years ago

I'm talking about the feature names not function names.

mitchelloharawild commented 5 years ago

I'm confused by what you mean here.

earowang commented 5 years ago

I mean output names

#> # A tibble: 1 x 18
#>   trend_strength seasonal_streng…    spike linearity curvature
#>            <dbl>            <dbl>    <dbl>     <dbl>     <dbl>
#> 1          0.794            0.944 1.61e-10    -0.228     0.304
#> # … with 13 more variables: seasonal_peak_year <dbl>,
#> #   seasonal_trough_year <dbl>, x_acf1 <dbl>, x_acf10 <dbl>,
#> #   diff1_acf1 <dbl>, diff1_acf10 <dbl>, diff2_acf1 <dbl>,
#> #   diff2_acf10 <dbl>, seas_acf1 <dbl>, x_pacf5 <dbl>, diff1x_pacf5 <dbl>,
#> #   diff2x_pacf5 <dbl>, seas_pacf <dbl>

mitchelloharawild commented 5 years ago

Prefix everything by feat_? Why? Seems far too verbose.

earowang commented 5 years ago

No, that was about the functions.

mitchelloharawild commented 5 years ago

If people want to add prefixes, they can do so using the list names.

library(feasts)
#> Loading required package: fablelite
#> 
#> Attaching package: 'feasts'
#> The following object is masked from 'package:grDevices':
#> 
#>     X11

as_tsibble(USAccDeaths) %>% 
  features(log(value), list(feat = features_stl))
#> # A tibble: 1 x 7
#>   feat_trend_stre… feat_seasonal_s… feat_spike feat_linearity
#>              <dbl>            <dbl>      <dbl>          <dbl>
#> 1            0.794            0.944   1.61e-10         -0.228
#> # … with 3 more variables: feat_curvature <dbl>,
#> #   feat_seasonal_peak_year <dbl>, feat_seasonal_trough_year <dbl>

Created on 2019-06-09 by the reprex package (v0.2.1)
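The prefixing behaviour shown above can be sketched in base R as follows. This is only an illustration of the naming rule, not how features() is implemented; `toy_features()` and `summary_features()` are hypothetical names.

```r
# Hypothetical helper: apply a list of feature functions to `x` and prefix
# each output name with the corresponding list name, e.g. "feat_mean".
toy_features <- function(x, fns) {
  nms <- names(fns)
  if (is.null(nms)) nms <- rep("", length(fns))
  out <- lapply(seq_along(fns), function(i) {
    res <- fns[[i]](x)
    # Unnamed list elements keep the feature function's own output names.
    if (nzchar(nms[i])) names(res) <- paste(nms[i], names(res), sep = "_")
    res
  })
  unlist(out)
}

# A feature function returning a named numeric vector.
summary_features <- function(x) c(mean = mean(x), sd = stats::sd(x))

y <- log(as.numeric(datasets::USAccDeaths))
names(toy_features(y, list(feat = summary_features)))
names(toy_features(y, list(summary_features)))
```

Leaving prefixes to the list names keeps the default output names short while still letting users disambiguate when they combine several feature functions.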

earowang commented 5 years ago

I'd like to see the default column names for all available features.

earowang commented 5 years ago

I'm closing this.

mitchelloharawild commented 5 years ago

Do you mean as a user, or for thinking about the problem now as a developer?

If needed for the user, I think it should be detailed in ?features_stl. If you need it, I'd look through the final line of each function in https://github.com/tidyverts/feasts/blob/master/R/features.R and https://github.com/tidyverts/feasts/blob/master/R/hctsa_features.R

earowang commented 5 years ago

I mean in general: whether the output names are informative or not.

For example arch_lm -> rsquared_arch.

Don't we have feat_all() to obtain all the features?

mitchelloharawild commented 5 years ago

feat_all() is equivalent to feature_set(), since feature_set(pkgs = NULL, tags = NULL) is the default.

mitchelloharawild commented 5 years ago

> I mean general: output names are informative or not.
>
> For example arch_lm -> rsquared_arch.
>
> We don't have feat_all() to obtain all the features?

Regarding arch_lm, I've renamed it to stat_arch_lm. I don't think rsquared_arch is appropriate, as the R-squared is more of an implementation detail. The value is a statistic for the LM test for ARCH.