tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/

Make features return different classes #308

Closed by atsyplenkov 3 years ago

atsyplenkov commented 3 years ago

I'm trying to create my own feature function that returns both POSIXct and numeric values. The only workaround I have found is to call unnest() after features(). Can this workflow be simplified? I want to include such features in my package, and I don't want users to have to call unnest() every time they use features().

Sample code

library(feasts, quietly = TRUE, warn.conflicts = FALSE)
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
library(tidyr, warn.conflicts = FALSE, quietly = TRUE)

  df <- data.frame(datetime = seq(c(ISOdate(2000,3,20)),
                                  by = "hour",
                                  length.out = 30),
                   group = c(rep("A", 15), rep("B", 15)),
                   x = rnorm(30))

  feat_event <- function(x){

    start <- dplyr::first(x)
    end <- dplyr::last(x)
    length <- difftime(end, start, units = "h")

    output <- list(
      start = unname(start),
      end = unname(end),
      length = unname(length)
    )

    output
  }

  feat_event(df$datetime)
#> $start
#> [1] "2000-03-20 12:00:00 GMT"
#> 
#> $end
#> [1] "2000-03-21 17:00:00 GMT"
#> 
#> $length
#> Time difference of 29 hours

  df %>% 
    as_tsibble(key = group, index = datetime) %>% 
    features(datetime, feat_event) %>% 
    unnest(cols = c(start, end, length))
#> # A tibble: 2 x 4
#>   group start               end                 length  
#>   <chr> <dttm>              <dttm>              <drtn>  
#> 1 A     2000-03-20 12:00:00 2000-03-21 02:00:00 14 hours
#> 2 B     2000-03-21 03:00:00 2000-03-21 17:00:00 14 hours

Created on 2021-03-17 by the reprex package (v1.0.0)

mitchelloharawild commented 3 years ago

Thanks for raising this issue; I agree with you. I think it would be better for feature functions to return a tibble (or a named list, for efficiency). I didn't consider non-numeric features when designing features(), but I think supporting them is important. I would also have run into this limitation when addressing https://github.com/tidyverts/feasts/issues/126. I will extend features() to support list outputs (and, equivalently, tibble outputs) as you have described above.

mitchelloharawild commented 3 years ago

Returning a tibble from your feature function should now work (it has for a while, but I missed replying to this issue when support was added).

library(feasts, quietly = TRUE, warn.conflicts = FALSE)
library(dplyr, quietly = TRUE, warn.conflicts = FALSE)
library(tidyr, warn.conflicts = FALSE, quietly = TRUE)

df <- data.frame(datetime = seq(c(ISOdate(2000,3,20)),
                                by = "hour",
                                length.out = 30),
                 group = c(rep("A", 15), rep("B", 15)),
                 x = rnorm(30))

feat_event <- function(x){

  start <- dplyr::first(x)
  end <- dplyr::last(x)
  length <- difftime(end, start, units = "h")

  tibble(
    start = unname(start),
    end = unname(end),
    length = unname(length)
  )
}

feat_event(df$datetime)
#> # A tibble: 1 × 3
#>   start               end                 length  
#>   <dttm>              <dttm>              <drtn>  
#> 1 2000-03-20 12:00:00 2000-03-21 17:00:00 29 hours

df %>% 
  as_tsibble(key = group, index = datetime) %>% 
  features(datetime, feat_event) 
#> # A tibble: 2 × 4
#>   group start               end                 length  
#>   <chr> <dttm>              <dttm>              <drtn>  
#> 1 A     2000-03-20 12:00:00 2000-03-21 02:00:00 14 hours
#> 2 B     2000-03-21 03:00:00 2000-03-21 17:00:00 14 hours

Created on 2021-09-16 by the reprex package (v2.0.0)
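
For the case in the original question (a datetime column next to plain numeric columns), the same tibble approach might look like the sketch below. The names feat_event_num, hours and n_obs are made up for illustration and do not come from fabletools or feasts. The sketch assumes a fabletools version that accepts tibble outputs from feature functions, as described above, and it attaches tsibble explicitly for as_tsibble().

```r
library(feasts)
library(tsibble)
library(dplyr)

set.seed(1)
df <- data.frame(
  datetime = seq(ISOdate(2000, 3, 20), by = "hour", length.out = 30),
  group = rep(c("A", "B"), each = 15),
  x = rnorm(30)
)

# Hypothetical feature: one POSIXct column and two numeric columns.
feat_event_num <- function(x) {
  start <- dplyr::first(x)
  end <- dplyr::last(x)
  tibble(
    start = start,                                             # POSIXct
    hours = as.numeric(difftime(end, start, units = "hours")), # plain numeric
    n_obs = length(x)                                          # integer count
  )
}

res <- df %>%
  as_tsibble(key = group, index = datetime) %>%
  features(datetime, feat_event_num)

res
```

Converting the difftime to a plain numeric with as.numeric() is a design choice: it fixes the unit in the column name instead of carrying it in the class, which may be easier for users of a package to work with downstream.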