tlverse / sl3

💪 🤔 Modern Super Learning with Machine Learning Pipelines
https://tlverse.org/sl3/
GNU General Public License v3.0

time series sl3 r and rolling cross validation #248

Open Shafi2016 opened 5 years ago

Shafi2016 commented 5 years ago

I want to apply rolling-window cross-validation for time series. The data used below (washb_data) is not a time series; I am only treating it as one so that the example is reproducible and I can then apply the same code to my own time series data. I get the same error with my actual time series data as well. I added one line of code from your time series example:

folds = origami::make_folds(washb_data, fold_fun = folds_rolling_window, window_size = 50, validation_size = 30, gap = 0, batch = 50)

However, when I reach sl_fit <- sl$train(washb_task), I get the following error, and I don't know how to fix it.

Error in set(private$.data, j = new_col_names, value = new_data) : Supplied 570 items to be assigned to 1000 items of column 'd47fdc00-01a0-11ea-a044-4560ff6b69d1_Pipeline(Lrnr_pkg_SuperLearner_screener_screen.corP->Stack)_Lrnr_glm_TRUE'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code

The rest is your code:

library(data.table)
library(knitr)
library(kableExtra)
library(tidyverse)
library(origami)
library(SuperLearner)
library(sl3)

set.seed(7194)

# load data set and take a peek
washb_data <- fread("https://raw.githubusercontent.com/tlverse/tlverse-data/master/wash-benefits/washb_data.csv", stringsAsFactors = TRUE)

washb_data <- washb_data[1:1000, ]
head(washb_data) %>%
  kable(digits = 4) %>%
  kableExtra::kable_styling(fixed_thead = TRUE) %>%
  scroll_box(width = "100%", height = "300px")

# specify the outcome and covariates
outcome <- "whz"
covars <- colnames(washb_data)[-which(names(washb_data) == outcome)]
folds = origami::make_folds(washb_data, fold_fun = folds_rolling_window, window_size = 50, validation_size = 30, gap = 0, batch = 50)

# create the sl3 task
washb_task <- make_sl3_Task(
  data = washb_data,
  covariates = covars,
  outcome = outcome,
  folds = folds
)

# choose base learners
lrnr_glm <- make_learner(Lrnr_glm)
lrnr_mean <- make_learner(Lrnr_mean)
lrnr_glmnet <- make_learner(Lrnr_glmnet)

lrnr_ranger100 <- make_learner(Lrnr_ranger, num.trees = 100)
lrnr_hal_simple <- make_learner(Lrnr_hal9001, degrees = 1, n_folds = folds)
lrnr_gam <- Lrnr_pkg_SuperLearner$new("SL.gam")
lrnr_bayesglm <- Lrnr_pkg_SuperLearner$new("SL.bayesglm")

stack <- make_learner(
  Stack, lrnr_glm, lrnr_mean, lrnr_ranger100, lrnr_glmnet,
  lrnr_gam, lrnr_bayesglm
)
metalearner <- make_learner(Lrnr_nnls)
screen_cor <- Lrnr_pkg_SuperLearner_screener$new("screen.corP")

# which covariates are selected on the full data?
screen_cor$train(washb_task)
cor_pipeline <- make_learner(Pipeline, screen_cor, stack)
fancy_stack <- make_learner(Stack, cor_pipeline, stack)

# we can visualize the stack
dt_stack <- delayed_learner_train(fancy_stack, washb_task)
plot(dt_stack, color = FALSE, height = "400px", width = "100%")
sl <- make_learner(Lrnr_sl,
  learners = fancy_stack,
  metalearner = metalearner
)

# we can visualize the super learner
dt_sl <- delayed_learner_train(sl, washb_task)
plot(dt_sl, color = FALSE, height = "400px", width = "100%")

sl_fit <- sl$train(washb_task)
sl_preds <- sl_fit$predict()
head(sl_preds)
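A note on the numbers in the error message above: 570 matches the total number of validation rows that these fold settings appear to produce, which suggests the cross-validated predictions cover only the validation windows and not all 1000 rows of the task. The base-R sketch below reproduces that arithmetic. `rolling_window_indices` is a hypothetical helper written for illustration; it assumes that a new window starts every `batch` rows and that folds stop once a full training-plus-validation window no longer fits. It is not origami's implementation.

```r
# Hypothetical helper (not part of origami): enumerate rolling-window folds
# under the assumption that a new training window starts every `batch` rows
# and that each fold needs window_size + gap + validation_size rows.
rolling_window_indices <- function(n, window_size, validation_size,
                                   gap = 0, batch = 1) {
  last_start <- n - (window_size + gap + validation_size) + 1
  starts <- seq(from = 1, to = last_start, by = batch)
  lapply(starts, function(s) {
    train_end <- s + window_size - 1
    list(
      training = s:train_end,
      validation = (train_end + gap + 1):(train_end + gap + validation_size)
    )
  })
}

# Same settings as in the report, with n = 1000 rows
folds_sketch <- rolling_window_indices(
  n = 1000, window_size = 50, validation_size = 30, gap = 0, batch = 50
)

# Count the folds and the total number of validation rows across all folds
n_folds <- length(folds_sketch)
n_validation_rows <- sum(lengths(lapply(folds_sketch, `[[`, "validation")))

n_folds            # 19
n_validation_rows  # 570
```

Under these assumptions there are 19 non-overlapping validation windows of 30 rows each, so 570 predictions in total. Any step that writes them into a 1000-row column without first subsetting to the validation rows would hit a length mismatch like the one reported.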

Shafi2016 commented 5 years ago

I get the same problem even with the sample code from https://github.com/tlverse/sl3_lecture/blob/master/sl3_timeseries.Rmd

library(data.table)
library(origami)
library(sl3)
library(xts)

# load data
data(bsds)
head(bsds)

# Create a time-series object:
tsdata <- xts(bsds$cnt, order.by = as.POSIXct(bsds$dteday))

# Visualize the time-series:
PerformanceAnalytics::chart.TimeSeries(tsdata, auto.grid = FALSE, main = "Count of total rental bikes")

# Final setup
folds = origami::make_folds(tsdata, fold_fun = folds_rolling_window, window_size = 50, validation_size = 30, gap = 0, batch = 50)
covars <- "cnt"
outcome <- "cnt"

# create the sl3 task and take a look at it
ts_uni_task <- sl3_Task$new(
  data = bsds, covariates = covars,
  outcome = outcome, outcome_type = "continuous", folds = folds
)

# let's take a look at the sl3 task
n_ahead_param <- 2
lrnr_arima <- Lrnr_arima$new(n.ahead = n_ahead_param)
fit_arima <- lrnr_arima$train(ts_uni_task)

# verify that the learner is fit
fit_arima$is_trained
pred_arima <- fit_arima$predict()

head(pred_arima)

lrnr_tsdyn_linear <- Lrnr_tsDyn$new(learner = "linear", m = 1,
                                    n.ahead = n_ahead_param)
lrnr_tsdyn_setar <- Lrnr_tsDyn$new(learner = "setar", m = 1, model = "TAR",
                                   n.ahead = n_ahead_param)
lrnr_tsdyn_lstar <- Lrnr_tsDyn$new(learner = "lstar", m = 1,
                                   n.ahead = n_ahead_param)
lrnr_garch <- Lrnr_rugarch$new(n.ahead = n_ahead_param)
lrnr_expsmooth <- Lrnr_expSmooth$new(n.ahead = n_ahead_param)
lrnr_harmonicreg <- Lrnr_HarmonicReg$new(n.ahead = n_ahead_param, K = 7,
                                         freq = 105)

ts_stack <- Stack$new(lrnr_arima, lrnr_tsdyn_linear, lrnr_tsdyn_setar,
                      lrnr_tsdyn_lstar)
ts_stack_fit <- ts_stack$train(ts_uni_task)

ts_stack_preds <- ts_stack_fit$predict()

Error in set(learner_preds, j = current_names, value = current_preds) : Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Failed on predict
Error in self$compute_step() : Error in set(learner_preds, j = current_names, value = current_preds) : Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.

jeremyrcoyle commented 5 years ago

There seems to be a recent bug in sl3 that prevents the time series super learner from working correctly. Thanks for reporting this. We'll get it fixed ASAP.

Shafi2016 commented 5 years ago

Thank you so much! I will be eagerly waiting for the update. The problem seems to be related to data.table.

Shafi2016 commented 4 years ago

Do you have any update on the above-mentioned problem?

imalenica commented 4 years ago

Hi, sorry for the delay. I was able to fix it and will push the updated version in the next few days (I need to check the other CV schemes as well).

Shafi2016 commented 4 years ago

Hello Ivana Malenica, thanks a lot! This is great news. I hope we will get the updated version soon.

jeremyrcoyle commented 4 years ago

This should now be fixed on devel. You can install the devel version with install_github("tlverse/sl3@devel") (from the remotes or devtools package). It will be merged into master shortly.
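After reinstalling, it can help to confirm which version R actually finds, since a session that already has sl3 loaded typically keeps using the old code until R is restarted. The following is a minimal sketch using only base R; `installed_version` is a hypothetical helper written for illustration, not an sl3 function.

```r
# Hypothetical helper: return the installed version of a package as a string,
# or NA if the package is not found in the current library paths.
# system.file() and packageVersion() read from disk without loading the package.
installed_version <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# TRUE if sl3 is already loaded in this session; in that case restart R
# before relying on a newly installed version.
sl3_already_loaded <- "sl3" %in% loadedNamespaces()
sl3_already_loaded

# Version found on disk, e.g. after running
# remotes::install_github("tlverse/sl3@devel")
installed_version("sl3")
```

If the reported version does not change after installing from the devel branch, the installation probably went to a different library path or did not complete.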

Shafi2016 commented 4 years ago

First, I removed the old version of sl3 and reinstalled it using the command you provided. I then checked again using my own data and code, and using this example: https://github.com/tlverse/sl3_lecture/blob/master/sl3_timeseries.Rmd. When I reach the line ts_stack_preds <- ts_stack_fit$predict(), I still get the same problem. Am I making a mistake?

Thanks in advance.

Error in set(learner_preds, j = current_names, value = current_preds) : Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.
Failed on predict
Error in self$compute_step() : Error in set(learner_preds, j = current_names, value = current_preds) : Supplied 2 items to be assigned to 731 items of column 'Lrnr_arima_NULL_2'. If you wish to 'recycle' the RHS please use rep() to make this intent clear to readers of your code.