mdancho84 / modeltime-iterative-forecasting


Modeltime Plot for multiple series #4

Open rafzamb opened 3 years ago

rafzamb commented 3 years ago

When plotting multiple models over multiple series, the following problem occurs. When models are fitted (or refitted), some of them (for example ARIMA, ETS, ...) save details of the fit in the ".model_desc" column, which is the column that plot_modeltime_forecast() uses by default to generate the visualizations.

The problem is that if we have, for example, 10 series, the ARIMA model can appear up to 10 times in the legend, once for each distinct fit across the series IDs. Visually, this is neither user-friendly nor intuitive.

Here is an example using the same code provided by @mdancho84.

library(tidymodels)
library(timetk)
library(modeltime)
library(tidyverse)

source("01_data_prep.R")
source("02_nested_modeltime.R")

# DATA PREP FUNCTIONS ----

nested_data_tbl <- walmart_sales_weekly %>%
    select(id, Date, Weekly_Sales) %>%
    set_names(c("id", "date", "value")) %>%

    extend_timeseries(
        .id_var     = id,
        .date_var   = date,
        .value      = value,
        .length_out = 52
    ) %>%

    # >> Can add xregs in here <<

    nest_timeseries(
        .id_var   = id,
        .date_var = date,
        .value    = value
    ) %>%

    split_nested_timeseries(
        .length_test = 52
    )

nested_data_tbl

# MODELING ----

# * XGBoost ----

rec_xgb <- recipe(value ~ ., training(nested_data_tbl$.splits[[1]])) %>%
    step_timeseries_signature(date) %>%
    step_rm(date) %>%
    step_zv(all_predictors()) %>%
    step_dummy(all_nominal_predictors(), one_hot = TRUE)

wflw_xgb <- workflow() %>%
    add_model(boost_tree("regression") %>% set_engine("xgboost")) %>%
    add_recipe(rec_xgb)

wflw_xgb

wflw_xgb_fit <- wflw_xgb %>% fit(training(nested_data_tbl$.splits[[1]]))

wflw_xgb_fit

# * ARIMA ----

recipe_arima <- recipe(value ~ ., training(nested_data_tbl$.splits[[1]]))

wflw_arima <- workflow() %>%
    add_model(arima_reg() %>% set_engine("auto_arima")) %>%
    add_recipe(recipe_arima)

wflw_arima

# * Bad Model ----
#   - Xgboost can't handle dates

recipe_bad <- recipe(value ~ ., training(nested_data_tbl$.splits[[1]]))

wflw_bad <- workflow() %>%
    add_model(boost_tree()) %>%
    add_recipe(recipe_bad)

wflw_bad

# * Prophet ----

wflw_prophet <- workflow() %>%
    add_model(prophet_reg(seasonality_yearly = TRUE)) %>%
    add_recipe(recipe_arima)

# NESTED WORKFLOW ----

# * Nested Modeltime Table
#   - Works with
nested_modeltime_tbl <- nested_data_tbl %>%
    modeltime_nested_fit(
        wflw_arima,
        wflw_xgb_fit, # FITTED WORKS
        wflw_bad,     # BAD MODEL RESULTS IN ERROR LOGGING
        wflw_prophet,
        control = control_nested_fit(verbose = TRUE)
    )

nested_modeltime_tbl$.modeltime_tables[[1]]

# * Attributes ----
#   - Logs key results: accuracy table, test forecast table
#   - Pushes expensive computations to modeling
#   - Speeds up evaluation

attributes(nested_modeltime_tbl)

nested_modeltime_tbl %>% modeltime_nested_accuracy()

nested_modeltime_tbl %>% modeltime_nested_error_report()

nested_modeltime_tbl %>%
    modeltime_nested_test_forecast() %>%
    group_by(id) %>%
    plot_modeltime_forecast(
        .facet_ncol = 2,
        .interactive = FALSE
    )

[Image: Rplot01]

@mdancho84 @AlbertoAlmuinha, do you see this as a problem, or is it okay?

In this visualization it does not look too severe, but with more series and more models whose fits vary across the series, the plot becomes impossible to read.

When I developed sknifedatar, I solved this by duplicating the ".model_desc" column under another name and applying a regex to the original ".model_desc" column. That way, when calling plot_modeltime_forecast(), my ARIMA model, for example, appears only once in the legend. If the user wants to see the details of the fit, they are still available in the duplicated column. It was a makeshift solution that I implemented, and I don't know what you think of it.
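A minimal sketch of that workaround in base R, for illustration only. It is not the actual sknifedatar code: the helper name `simplify_model_desc`, the `.model_details` column name, the regex, and the sample descriptions are all assumptions.

```r
# Hypothetical helper: keep the full fit description in a new column and
# collapse .model_desc to a short label, so each model shows once in the legend.
simplify_model_desc <- function(forecast_tbl) {
    # Preserve the detailed description (column name chosen for illustration)
    forecast_tbl$.model_details <- forecast_tbl$.model_desc

    # Drop everything from the first "(" onward,
    # e.g. "ARIMA(0,0,1)(0,1,0)[52]" becomes "ARIMA"
    forecast_tbl$.model_desc <- sub("\\s*\\(.*$", "", forecast_tbl$.model_desc)

    forecast_tbl
}

# Illustrative data: two series with different ARIMA fits (made-up descriptions)
example_tbl <- data.frame(
    id          = c("1_1", "1_3", "1_1", "1_3"),
    .model_desc = c(
        "ARIMA(0,0,1)(0,1,0)[52]",
        "ARIMA(1,0,0)(0,1,0)[52] WITH DRIFT",
        "XGBOOST",
        "XGBOOST"
    ),
    stringsAsFactors = FALSE
)

simplified_tbl <- simplify_model_desc(example_tbl)
unique(simplified_tbl$.model_desc)
```

In the pipeline above, such a helper could presumably be called between modeltime_nested_test_forecast() and group_by(id), so that the plot groups by the simplified label while the detailed description stays available in the extra column.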

mdancho84 commented 3 years ago

Hey @rafzamb this is a good call. Maybe we should implement your method from sknifedatar to revert to a simple "ARIMA" for the model description.