microsoft / finnts

Microsoft Finance Time Series Forecasting Framework (FinnTS) is a forecasting package that utilizes cutting-edge time series forecasting and parallelization on the cloud to produce accurate forecasts for financial data.
https://microsoft.github.io/finnts

Misalignment of Date Features when fiscal_year_start != 1 #157

Open AndrewKostandy opened 3 months ago

AndrewKostandy commented 3 months ago

Hi,

Thank you for your work on this package. I'm running into an issue with the created Date features when setting fiscal_year_start to a value other than 1. For example, when using a fiscal_year_start value of 11 below, I would expect the Date_half and Date_quarter values to be 1 for November. For Date_month it could be either 11 or 1 (understandable either way), but Date_month.lbl should be "November". Note also that October lands in a different quarter than September, which is incorrect if the fiscal year starts in November:

library(tidyverse)
library(finnts)
#> Loading required package: modeltime

df <- tibble(
  Date = seq.Date(from = ymd("2020-11-01"), to = ymd("2023-10-01"), by = "month"),
  y = rnorm(36, 100, 5),
  x = rnorm(36, 100, 5),
  id = "y"
)

run_info <- set_run_info(
  experiment_name = "Run 1",
  run_name = "Date Extraction Check"
)
#> Finn Submission Info
#> • Experiment Name: Run 1
#> • Run Name: Date Extraction Check-20240404T110928Z
#> 

prep_data(
  run_info = run_info,
  input_data = df,
  combo_variables = "id",
  target_variable = "y",
  date_type = "month",
  forecast_horizon = 1,
  external_regressors = "x",
  hist_start_date = min(df$Date),
  hist_end_date = max(df$Date),
  fiscal_year_start = 11
)
#> ℹ Prepping Data
#> ✔ Prepping Data [1.8s]
#> 

df_r1_fiscal_11 <- get_prepped_data(run_info = run_info, recipe = "R1") |>
  select(Date, Date_year, Date_half, Date_quarter, Date_month, Date_month.lbl)

df_r1_fiscal_11
#> # A tibble: 37 × 6
#>    Date       Date_year Date_half Date_quarter Date_month Date_month.lbl
#>    <date>         <dbl>     <dbl>        <dbl>      <dbl> <chr>         
#>  1 2020-11-01      2021         2            3          9 September     
#>  2 2020-12-01      2021         2            4         10 October       
#>  3 2021-01-01      2021         2            4         11 November      
#>  4 2021-02-01      2021         2            4         12 December      
#>  5 2021-03-01      2022         1            1          1 January       
#>  6 2021-04-01      2022         1            1          2 February      
#>  7 2021-05-01      2022         1            1          3 March         
#>  8 2021-06-01      2022         1            2          4 April         
#>  9 2021-07-01      2022         1            2          5 May           
#> 10 2021-08-01      2022         1            2          6 June          
#> # ℹ 27 more rows

df_r2_fiscal_11 <- get_prepped_data(run_info = run_info, recipe = "R2") |>
  select(Date, Date_year, Date_half, Date_quarter, Date_month, Date_month.lbl)

df_r2_fiscal_11
#> # A tibble: 37 × 6
#>    Date       Date_year Date_half Date_quarter Date_month Date_month.lbl
#>    <date>         <dbl>     <dbl>        <dbl>      <dbl> <chr>         
#>  1 2020-11-01      2021         2            3          9 September     
#>  2 2020-12-01      2021         2            4         10 October       
#>  3 2021-01-01      2021         2            4         11 November      
#>  4 2021-02-01      2021         2            4         12 December      
#>  5 2021-03-01      2022         1            1          1 January       
#>  6 2021-04-01      2022         1            1          2 February      
#>  7 2021-05-01      2022         1            1          3 March         
#>  8 2021-06-01      2022         1            2          4 April         
#>  9 2021-07-01      2022         1            2          5 May           
#> 10 2021-08-01      2022         1            2          6 June          
#> # ℹ 27 more rows
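
For comparison, the values expected above can be written down in a few lines of base R. This is only a sketch of the expected result, not finnts code: `expected_fiscal_features()` is a hypothetical helper, and it assumes the fiscal year is named after the calendar year in which it ends (which matches the Date_year of 2021 that finnts prints for 2020-11-01) and that Date_month is the position within the fiscal year.

```r
# Hypothetical helper (not part of finnts): expected date features for a
# fiscal year that starts in calendar month `fiscal_year_start`.
expected_fiscal_features <- function(dates, fiscal_year_start = 1) {
  cal_month <- as.integer(format(dates, "%m"))
  cal_year <- as.integer(format(dates, "%Y"))

  # Position of each month within the fiscal year (1 = first fiscal month)
  fiscal_month <- (cal_month - fiscal_year_start) %% 12 + 1

  data.frame(
    Date = dates,
    # Assumption: the fiscal year is labelled by the calendar year it ends in
    Date_year = cal_year +
      as.integer(fiscal_year_start > 1 & cal_month >= fiscal_year_start),
    Date_half = (fiscal_month - 1) %/% 6 + 1,
    Date_quarter = (fiscal_month - 1) %/% 3 + 1,
    Date_month = fiscal_month,
    # The label keeps the real calendar month name
    Date_month.lbl = month.name[cal_month]
  )
}

dates <- seq(as.Date("2020-11-01"), as.Date("2021-10-01"), by = "month")
expected <- expected_fiscal_features(dates, fiscal_year_start = 11)
expected
```

With this definition November gets Date_half 1, Date_quarter 1, and the label "November", and August, September, and October all fall in quarter 4.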

As a workaround, I have to set fiscal_year_start to 3 to get the Date_half and Date_quarter values I need when my fiscal year actually starts in November. With that setting, Date_half and Date_quarter are 1 for November, December, and January, which is correct for a fiscal year starting in November. However, Date_month.lbl for November is "January", which is still incorrect:

run_info <- set_run_info(
  experiment_name = "Run 1",
  run_name = "Date Extraction Check"
)
#> Finn Submission Info
#> • Experiment Name: Run 1
#> • Run Name: Date Extraction Check-20240404T110930Z
#> 

prep_data(
  run_info = run_info,
  input_data = df,
  combo_variables = "id",
  target_variable = "y",
  date_type = "month",
  forecast_horizon = 1,
  external_regressors = "x",
  hist_start_date = min(df$Date),
  hist_end_date = max(df$Date),
  fiscal_year_start = 3
)
#> ℹ Prepping Data
#> ✔ Prepping Data [793ms]
#> 

df_r1_fiscal_3 <- get_prepped_data(run_info = run_info, recipe = "R1") |>
  select(Date, Date_year, Date_half, Date_quarter, Date_month, Date_month.lbl)

df_r1_fiscal_3
#> # A tibble: 37 × 6
#>    Date       Date_year Date_half Date_quarter Date_month Date_month.lbl
#>    <date>         <dbl>     <dbl>        <dbl>      <dbl> <chr>         
#>  1 2020-11-01      2021         1            1          1 January       
#>  2 2020-12-01      2021         1            1          2 February      
#>  3 2021-01-01      2021         1            1          3 March         
#>  4 2021-02-01      2021         1            2          4 April         
#>  5 2021-03-01      2021         1            2          5 May           
#>  6 2021-04-01      2021         1            2          6 June          
#>  7 2021-05-01      2021         2            3          7 July          
#>  8 2021-06-01      2021         2            3          8 August        
#>  9 2021-07-01      2021         2            3          9 September     
#> 10 2021-08-01      2021         2            4         10 October       
#> # ℹ 27 more rows

df_r2_fiscal_3 <- get_prepped_data(run_info = run_info, recipe = "R2") |>
  select(Date, Date_year, Date_half, Date_quarter, Date_month, Date_month.lbl)

df_r2_fiscal_3
#> # A tibble: 37 × 6
#>    Date       Date_year Date_half Date_quarter Date_month Date_month.lbl
#>    <date>         <dbl>     <dbl>        <dbl>      <dbl> <chr>         
#>  1 2020-11-01      2021         1            1          1 January       
#>  2 2020-12-01      2021         1            1          2 February      
#>  3 2021-01-01      2021         1            1          3 March         
#>  4 2021-02-01      2021         1            2          4 April         
#>  5 2021-03-01      2021         1            2          5 May           
#>  6 2021-04-01      2021         1            2          6 June          
#>  7 2021-05-01      2021         2            3          7 July          
#>  8 2021-06-01      2021         2            3          8 August        
#>  9 2021-07-01      2021         2            3          9 September     
#> 10 2021-08-01      2021         2            4         10 October       
#> # ℹ 27 more rows
Created on 2024-04-04 with [reprex v2.1.0](https://reprex.tidyverse.org/)
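
One possible reading of the two outputs, offered as a guess rather than a description of the finnts internals: both tables are consistent with every date being shifted forward by `fiscal_year_start - 1` months before the calendar features are extracted. The base R sketch below reproduces the first ten printed rows under that assumption. `shift_months()`, `observed_features()`, and `workaround_start()` are hypothetical helpers written for this illustration.

```r
# Hypothetical helpers for illustration only; none of this is finnts code.

# Shift first-of-month dates forward by n months.
shift_months <- function(dates, n) {
  cal_month <- as.integer(format(dates, "%m"))
  cal_year <- as.integer(format(dates, "%Y"))
  total <- cal_year * 12 + (cal_month - 1) + n
  as.Date(sprintf(
    "%04d-%02d-01",
    as.integer(total %/% 12),
    as.integer(total %% 12 + 1)
  ))
}

# Assumed behaviour: calendar features taken from the shifted date.
observed_features <- function(dates, fiscal_year_start) {
  shifted <- shift_months(dates, fiscal_year_start - 1)
  m <- as.integer(format(shifted, "%m"))
  data.frame(
    Date = dates,
    Date_year = as.integer(format(shifted, "%Y")),
    Date_half = (m - 1) %/% 6 + 1,
    Date_quarter = (m - 1) %/% 3 + 1,
    Date_month = m,
    Date_month.lbl = month.name[m]
  )
}

# Under the same assumption, the value that has to be passed in order to get
# correct halves and quarters for a fiscal year starting in `true_start`.
workaround_start <- function(true_start) (13 - true_start) %% 12 + 1

dates <- seq(as.Date("2020-11-01"), as.Date("2021-08-01"), by = "month")
obs_11 <- observed_features(dates, fiscal_year_start = 11)
obs_3 <- observed_features(dates, fiscal_year_start = 3)
obs_11
obs_3
workaround_start(11)
```

If this reading is right, it would explain why 3 behaves like a November start for Date_half and Date_quarter, and why Date_month.lbl follows the shifted date instead of the real calendar month.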
Session Info

```r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.3 (2024-02-29)
#>  os       macOS Sonoma 14.4.1
#>  system   x86_64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Toronto
#>  date     2024-04-04
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version    date (UTC) lib source
#>  anytime          0.3.9      2020-08-27 [1] CRAN (R 4.3.0)
#>  bit              4.0.5      2022-11-15 [1] CRAN (R 4.3.0)
#>  bit64            4.0.5      2020-08-30 [1] CRAN (R 4.3.0)
#>  class            7.3-22     2023-05-03 [1] CRAN (R 4.3.3)
#>  cli            * 3.6.2      2023-12-11 [1] CRAN (R 4.3.0)
#>  codetools        0.2-20     2024-03-31 [1] CRAN (R 4.3.2)
#>  colorspace       2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
#>  crayon           1.5.2      2022-09-29 [1] CRAN (R 4.3.0)
#>  Cubist         * 0.4.2.1    2023-03-09 [1] CRAN (R 4.3.0)
#>  curl             5.2.1      2024-03-01 [1] CRAN (R 4.3.2)
#>  data.table       1.15.4     2024-03-30 [1] CRAN (R 4.3.2)
#>  dials          * 1.2.1      2024-02-22 [1] CRAN (R 4.3.2)
#>  DiceDesign       1.10       2023-12-07 [1] CRAN (R 4.3.0)
#>  digest         * 0.6.35     2024-03-11 [1] CRAN (R 4.3.2)
#>  distributional   0.4.0      2024-02-07 [1] CRAN (R 4.3.2)
#>  doParallel     * 1.0.17     2022-02-07 [1] CRAN (R 4.3.0)
#>  dplyr          * 1.1.4      2023-11-17 [1] CRAN (R 4.3.0)
#>  earth          * 5.3.3      2024-02-26 [1] CRAN (R 4.3.2)
#>  ellipsis         0.3.2      2021-04-29 [1] CRAN (R 4.3.0)
#>  evaluate         0.23       2023-11-01 [1] CRAN (R 4.3.1)
#>  fabletools       0.4.1      2024-03-02 [1] CRAN (R 4.3.2)
#>  fansi            1.0.6      2023-12-08 [1] CRAN (R 4.3.0)
#>  fastmap          1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
#>  feasts           0.3.2      2024-03-15 [1] CRAN (R 4.3.2)
#>  finnts         * 0.4.0      2023-12-01 [1] CRAN (R 4.3.0)
#>  forcats        * 1.0.0      2023-01-29 [1] CRAN (R 4.3.0)
#>  foreach        * 1.5.2      2022-02-02 [1] CRAN (R 4.3.0)
#>  forecast       * 8.22.0     2024-03-04 [1] CRAN (R 4.3.2)
#>  Formula        * 1.2-5      2023-02-24 [1] CRAN (R 4.3.0)
#>  fracdiff         1.5-3      2024-02-01 [1] CRAN (R 4.3.2)
#>  fs             * 1.6.3      2023-07-20 [1] CRAN (R 4.3.0)
#>  furrr            0.3.1      2022-08-15 [1] CRAN (R 4.3.0)
#>  future           1.33.2     2024-03-26 [1] CRAN (R 4.3.2)
#>  future.apply     1.11.2     2024-03-28 [1] CRAN (R 4.3.2)
#>  generics         0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
#>  ggplot2        * 3.5.0      2024-02-23 [1] CRAN (R 4.3.2)
#>  glmnet         * 4.1-8      2023-08-22 [1] CRAN (R 4.3.0)
#>  globals          0.16.3     2024-03-08 [1] CRAN (R 4.3.2)
#>  glue             1.7.0      2024-01-09 [1] CRAN (R 4.3.0)
#>  gower            1.0.1      2022-12-22 [1] CRAN (R 4.3.0)
#>  GPfit            1.0-8      2019-02-08 [1] CRAN (R 4.3.0)
#>  gtable           0.3.4      2023-08-21 [1] CRAN (R 4.3.0)
#>  hardhat          1.3.1      2024-02-02 [1] CRAN (R 4.3.2)
#>  hms              1.1.3      2023-03-21 [1] CRAN (R 4.3.0)
#>  htmltools        0.5.8      2024-03-25 [1] CRAN (R 4.3.2)
#>  hts            * 6.0.2      2021-05-30 [1] CRAN (R 4.3.0)
#>  ipred            0.9-14     2023-03-09 [1] CRAN (R 4.3.0)
#>  iterators      * 1.0.14     2022-02-05 [1] CRAN (R 4.3.0)
#>  kernlab        * 0.9-32     2023-01-31 [1] CRAN (R 4.3.0)
#>  knitr            1.45       2023-10-30 [1] CRAN (R 4.3.1)
#>  lattice        * 0.22-6     2024-03-20 [1] CRAN (R 4.3.2)
#>  lava             1.8.0      2024-03-05 [1] CRAN (R 4.3.2)
#>  lhs              1.1.6      2022-12-17 [1] CRAN (R 4.3.0)
#>  lifecycle        1.0.4      2023-11-07 [1] CRAN (R 4.3.1)
#>  listenv          0.9.1      2024-01-29 [1] CRAN (R 4.3.2)
#>  lmtest           0.9-40     2022-03-21 [1] CRAN (R 4.3.0)
#>  lubridate      * 1.9.3      2023-09-27 [1] CRAN (R 4.3.0)
#>  magrittr         2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
#>  MASS             7.3-60.0.1 2024-01-13 [1] CRAN (R 4.3.3)
#>  Matrix         * 1.6-5      2024-01-11 [1] CRAN (R 4.3.3)
#>  modeltime      * 1.2.8      2023-09-02 [1] CRAN (R 4.3.0)
#>  munsell          0.5.1      2024-04-01 [1] CRAN (R 4.3.2)
#>  nlme             3.1-164    2023-11-27 [1] CRAN (R 4.3.3)
#>  nnet             7.3-19     2023-05-03 [1] CRAN (R 4.3.3)
#>  padr             0.6.2      2022-11-23 [1] CRAN (R 4.3.0)
#>  parallelly       1.37.1     2024-02-29 [1] CRAN (R 4.3.2)
#>  parsnip        * 1.2.1      2024-03-22 [1] CRAN (R 4.3.2)
#>  pillar           1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig        2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
#>  plotmo         * 3.6.3      2024-02-26 [1] CRAN (R 4.3.2)
#>  plotrix        * 3.8-4      2023-11-10 [1] CRAN (R 4.3.1)
#>  plyr             1.8.9      2023-10-02 [1] CRAN (R 4.3.0)
#>  prodlim          2023.08.28 2023-08-28 [1] CRAN (R 4.3.0)
#>  purrr          * 1.0.2      2023-08-10 [1] CRAN (R 4.3.0)
#>  quadprog         1.5-8      2019-11-20 [1] CRAN (R 4.3.0)
#>  quantmod         0.4.26     2024-02-14 [1] CRAN (R 4.3.2)
#>  R.cache          0.16.0     2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3      1.8.2      2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo             1.26.0     2024-01-24 [1] CRAN (R 4.3.2)
#>  R.utils          2.12.3     2023-11-18 [1] CRAN (R 4.3.0)
#>  R6               2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
#>  Rcpp             1.0.12     2024-01-09 [1] CRAN (R 4.3.2)
#>  RcppParallel     5.1.7      2023-02-27 [1] CRAN (R 4.3.0)
#>  readr          * 2.1.5      2024-01-10 [1] CRAN (R 4.3.0)
#>  recipes        * 1.0.10     2024-02-18 [1] CRAN (R 4.3.2)
#>  reprex           2.1.0      2024-01-11 [1] CRAN (R 4.3.0)
#>  reshape2         1.4.4      2020-04-09 [1] CRAN (R 4.3.0)
#>  rlang            1.1.3      2024-01-10 [1] CRAN (R 4.3.0)
#>  rmarkdown        2.26       2024-03-05 [1] CRAN (R 4.3.2)
#>  rpart            4.1.23     2023-12-05 [1] CRAN (R 4.3.3)
#>  rsample          1.2.1      2024-03-25 [1] CRAN (R 4.3.2)
#>  rstudioapi       0.16.0     2024-03-24 [1] CRAN (R 4.3.2)
#>  rules          * 1.0.2      2023-03-08 [1] CRAN (R 4.3.0)
#>  scales         * 1.3.0      2023-11-28 [1] CRAN (R 4.3.1)
#>  sessioninfo      1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
#>  shape            1.4.6.1    2024-02-23 [1] CRAN (R 4.3.2)
#>  slider           0.3.1      2023-10-12 [1] CRAN (R 4.3.1)
#>  SparseM          1.81       2021-02-18 [1] CRAN (R 4.3.0)
#>  StanHeaders      2.32.6     2024-03-01 [1] CRAN (R 4.3.2)
#>  stringi          1.8.3      2023-12-11 [1] CRAN (R 4.3.1)
#>  stringr        * 1.5.1      2023-11-14 [1] CRAN (R 4.3.1)
#>  styler           1.10.2     2023-08-29 [1] CRAN (R 4.3.0)
#>  survival         3.5-8      2024-02-14 [1] CRAN (R 4.3.3)
#>  tibble         * 3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyr          * 1.3.1      2024-01-24 [1] CRAN (R 4.3.2)
#>  tidyselect     * 1.2.1      2024-03-11 [1] CRAN (R 4.3.2)
#>  tidyverse      * 2.0.0      2023-02-22 [1] CRAN (R 4.3.0)
#>  timechange       0.3.0      2024-01-18 [1] CRAN (R 4.3.0)
#>  timeDate         4032.109   2023-12-14 [1] CRAN (R 4.3.1)
#>  timetk         * 2.9.0      2023-10-31 [1] CRAN (R 4.3.0)
#>  tseries          0.10-55    2023-12-06 [1] CRAN (R 4.3.0)
#>  tsibble          1.1.4      2024-01-29 [1] CRAN (R 4.3.2)
#>  TTR              0.24.4     2023-11-28 [1] CRAN (R 4.3.1)
#>  tune           * 1.2.0      2024-03-20 [1] CRAN (R 4.3.2)
#>  tzdb             0.4.0      2023-05-12 [1] CRAN (R 4.3.0)
#>  urca             1.3-3      2022-08-29 [1] CRAN (R 4.3.0)
#>  utf8             1.2.4      2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs            0.6.5      2023-12-01 [1] CRAN (R 4.3.1)
#>  vroom          * 1.6.5      2023-12-05 [1] CRAN (R 4.3.0)
#>  warp             0.2.1      2023-11-02 [1] CRAN (R 4.3.1)
#>  withr            3.0.0      2024-01-16 [1] CRAN (R 4.3.0)
#>  workflows      * 1.1.4      2024-02-19 [1] CRAN (R 4.3.2)
#>  xfun             0.43       2024-03-25 [1] CRAN (R 4.3.2)
#>  xts              0.13.2     2024-01-21 [1] CRAN (R 4.3.0)
#>  yaml             2.3.8      2023-12-11 [1] CRAN (R 4.3.1)
#>  yardstick        1.3.1      2024-03-21 [1] CRAN (R 4.3.3)
#>  zoo              1.8-12     2023-04-13 [1] CRAN (R 4.3.0)
#>
#>  [1] /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

mitokic commented 3 months ago

Hey @AndrewKostandy, thanks for logging this. Let me take a deeper look and get back to you.