Closed AlbertoAlmuinha closed 3 years ago
Ok, I will test it out and report back. Thanks for this!
This one was easy. Done.
@AlbertoAlmuinha I had to revert to the previous nest_timeseries() function because the future forecast was resulting in an error. I still need to do some debugging, but I wanted to have working code for the review.
Ok, I can take a look at this to see what is happening!
I'll look into it too this weekend. Keep me posted if you find anything. I was thinking of making yours a nest_timeseries2() and comparing differences in the resulting objects with the waldo package: https://www.tidyverse.org/blog/2020/10/waldo/
I didn't know that package. I think it's a good idea; it could make my task much easier. If I find out anything, I will let you know.
That's it, I've fixed the problem. That package is a real JEWEL! Thanks for recommending it to me so I can add it to my arsenal. The problem was that an additional "n" column was being added to the actual data, holding a fixed number (the count of the data). After simply adding a select(-n), both nested_df objects are now exactly the same, and I have run all the code without any problems. This would be the new version:
nest_timeseries <- function(.data, .id_var, .length_out) {

    id_var_expr <- enquo(.id_var)

    # SPLIT FUTURE AND ACTUAL DATA
    # Future data: the last `.length_out` rows of each group
    future_data_tbl <- .data %>%
        panel_tail(id = !!id_var_expr, n = .length_out)

    # Count groups using the supplied ID column rather than a hard-coded `id`
    groups <- future_data_tbl %>% pull(!!id_var_expr) %>% n_distinct()

    # Rows of actual data per group = total rows minus future rows per group
    n_group <- .data %>%
        group_by(!!id_var_expr) %>%
        summarise(n = n() - (nrow(future_data_tbl) / groups))

    actual_data_tbl <- .data %>%
        inner_join(n_group, by = rlang::quo_name(id_var_expr)) %>%
        group_by(!!id_var_expr) %>%
        slice(seq(first(n))) %>%
        ungroup() %>%
        select(-n)  # drop the helper count column so it does not leak into the output

    # CHECKS
    if (nrow(future_data_tbl) == 0) {
        rlang::warn("Future Data is `NULL`. Try using `extend_timeseries()` to add future data.")
    }

    # NEST
    ret_1 <- actual_data_tbl %>%
        nest(.actual_data = - (!! id_var_expr))

    ret_2 <- future_data_tbl %>%
        nest(.future_data = - (!! id_var_expr))

    # JOIN
    id_col_text <- names(ret_1)[[1]]

    ret <- left_join(ret_1, ret_2, by = id_col_text)

    return(ret)
}
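For anyone following along, the waldo check that surfaced the stray column can be sketched roughly as below. This is a minimal illustration on hypothetical toy data (`expected_tbl`, `buggy_tbl`, and `fixed_tbl` are made-up names, not objects from the package), assuming `dplyr` and `waldo` are installed:

```r
library(dplyr)
library(waldo)

# Hypothetical toy data standing in for the expected "actual" table
expected_tbl <- tibble(
  id    = c("a", "a", "b", "b"),
  value = c(1, 2, 3, 4)
)

# Simulate the bug: a helper count column `n` is left behind after the join
n_group   <- expected_tbl %>% count(id)
buggy_tbl <- expected_tbl %>% inner_join(n_group, by = "id")

# compare() reports the differences it finds between the two objects
diff_buggy <- waldo::compare(expected_tbl, buggy_tbl)
print(diff_buggy)

# Dropping the helper column should make the two tables match again
fixed_tbl  <- buggy_tbl %>% select(-n)
diff_fixed <- waldo::compare(expected_tbl, fixed_tbl)
print(diff_fixed)
```

The same pattern should apply to the nested outputs of the two function versions: compare them directly and look for unexpected columns in the report.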
Oh, wow. That was quick. I will give it a try shortly.
It works now. Thanks!
Hi @mdancho84 ,
The first step of data preparation is carried out by three functions. In the first one, `extend_timeseries`, a check is made for missing values and an error is thrown if any exist, because this field is used for the subsequent filtering. I think it would be worth changing this error to a warning and suggesting to the user that they impute the missing values later in the workflow with recipes. I have modified the function `nest_timeseries` so that the split is not based on the missing values. This way the user is not required to impute them at an earlier stage and can, if they wish, impute them later with recipes. This would be the modified function: