tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/

Adding new lst_mdl to existing mable produces unexpected behavior #402

Open vincentmgeels opened 2 months ago

vincentmgeels commented 2 months ago

Apologies if the title is somewhat unclear; I wasn't quite sure how to phrase the issue, but hopefully the example below is clear.

I'm attempting to add a new lst_mdl to an existing mable object programmatically. The attempt below comes from defining a number of formulas dynamically and then fitting each of them in turn in the fable framework inside a user-defined function, where the model column names are only known at run time, so I can't refer to them with $:

# packages: tsibble provides the tourism data, fable provides ARIMA()
library(dplyr)
library(tsibble)
library(fable)

# define some formulas
fmlas <- c("sqrt(Trips) ~ fourier(period = 4L, K = 1)",
           "log(Trips + 1) ~ fourier(period = 4L, K = 1)")

# build one model for each element of fmlas, using the same data
model1 <- tourism %>%
  dplyr::filter(Region == "Melbourne", Purpose == "Business") %>%
  fabletools::model(
    ARIMA(as.formula(fmlas[1]))
  )

model2 <- tourism %>%
  dplyr::filter(Region == "Melbourne", Purpose == "Business") %>%
  fabletools::model(
    ARIMA(as.formula(fmlas[2]))
  )

tmp <- model1
# add the model column of model2 (its last column) to tmp
new_col <- names(model2)[ncol(model2)]
tmp[[new_col]] <- model2[[new_col]]

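This produces output I wasn't expecting: glance() returns a single row with model1's statistics, and the new ARIMA(as.formula(fmlas[2])) column is simply carried along next to the keys instead of being summarised as a second model. Is this because the original tmp object only records the model name stored in model1? I had trouble understanding the as_mable() documentation, particularly its model argument, and I'm not sure whether it's relevant here: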

glance(tmp)
# A tibble: 1 × 12
  Region    State    Purpose  `ARIMA(as.formula(fmlas[2]))` .model                      sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
  <chr>     <chr>    <chr>                          <model> <chr>                        <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
1 Melbourne Victoria Business   <LM w/ ARIMA(0,1,1) errors> ARIMA(as.formula(fmlas[1]))   1.91   -136.  281.  281.  290. <cpl [0]> <cpl [1]>

Compare this to building both models within the same call to fabletools::model, and then calling glance on the result:

# build both models in a single call
models <- tourism %>%
  dplyr::filter(Region == "Melbourne", Purpose == "Business") %>%
  fabletools::model(
    ARIMA(as.formula(fmlas[1])),
    ARIMA(as.formula(fmlas[2]))
  )
glance(models)
# A tibble: 2 × 11
  Region    State    Purpose  .model                      sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
  <chr>     <chr>    <chr>    <chr>                        <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
1 Melbourne Victoria Business ARIMA(as.formula(fmlas[1])) 1.91    -136.  281.  281.  290.  <cpl [0]> <cpl [1]>
2 Melbourne Victoria Business ARIMA(as.formula(fmlas[2])) 0.0168    50.4 -92.9 -92.3 -83.4 <cpl [0]> <cpl [1]>

As I mentioned above, I'm trying to build models dynamically without writing out each desired model by hand inside fabletools::model(), so if there's an alternative approach I'd be interested in learning about it!
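One alternative that avoids editing the mable afterwards is to build a named list of model definitions and splice it into a single model() call with rlang's !!! operator, so that every model is registered by model() itself. This is a sketch that assumes model() supports !!! splicing in its ... arguments; the model names sqrt_arima and log_arima are hypothetical and simply become the model column names.

```r
library(dplyr)
library(tsibble)  # tourism data
library(fable)    # ARIMA(); also attaches fabletools (model(), glance())

# Hypothetical model names; the values are the formula strings from above
fmlas <- c(
  sqrt_arima = "sqrt(Trips) ~ fourier(period = 4L, K = 1)",
  log_arima  = "log(Trips + 1) ~ fourier(period = 4L, K = 1)"
)

# One ARIMA model definition per formula string; lapply() keeps the
# names of fmlas, which model() then uses as the model column names
mdl_defs <- lapply(fmlas, function(f) ARIMA(as.formula(f)))

fit <- tourism %>%
  filter(Region == "Melbourne", Purpose == "Business") %>%
  model(!!!mdl_defs)  # splice the named list into model()'s ... arguments

glance(fit)
```

Because all models are created in one model() call, this should behave like the models object built above, with one glance() row per model, however many formulas fmlas contains.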

vincentmgeels commented 2 months ago

I believe I've found a workable solution:

While assigning with [[ fails in the first code block of my original post, you can append a new lst_mdl to an existing mable by using the $ operator on both sides of the assignment. Using model1 from that first code block:

tmp2 <- model1
tmp2$`ARIMA(as.formula(fmlas[2]))` <- model2$`ARIMA(as.formula(fmlas[2]))`

This time glance() returns both models:

glance(tmp2)
# A tibble: 2 × 11
  Region    State    Purpose  .model                      sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
  <chr>     <chr>    <chr>    <chr>                        <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
1 Melbourne Victoria Business ARIMA(as.formula(fmlas[1])) 1.91    -136.  281.  281.  290.  <cpl [0]> <cpl [1]>
2 Melbourne Victoria Business ARIMA(as.formula(fmlas[2])) 0.0168    50.4 -92.9 -92.3 -83.4 <cpl [0]> <cpl [1]>

Inspecting attributes(tmp2) shows that $model contains the names of both models:

$model
[1] "ARIMA(as.formula(fmlas[1]))" "ARIMA(as.formula(fmlas[2]))"

Whereas attributes(tmp) shows only the name from model1 under $model:

$model
[1] "ARIMA(as.formula(fmlas[1]))"
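The mismatch can also be detected programmatically by comparing the columns that hold models with the names recorded in the model attribute. This is a minimal sketch that assumes model columns carry the lst_mdl class and that the attribute is named model, as attributes() shows above; unregistered_model_cols() is a hypothetical helper, not a fabletools function.

```r
# Hypothetical helper: names of model (lst_mdl) columns that are missing
# from the mable's "model" attribute, i.e. columns that methods such as
# glance() do not summarise as models
unregistered_model_cols <- function(mbl) {
  # TRUE for each column whose class includes "lst_mdl"
  is_mdl <- vapply(mbl, inherits, logical(1), what = "lst_mdl")
  setdiff(names(mbl)[is_mdl], attr(mbl, "model"))
}
```

Under these assumptions, unregistered_model_cols(tmp) would be expected to return "ARIMA(as.formula(fmlas[2]))" before the fix below, and an empty character vector for tmp2.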

If we update this attribute on tmp, methods such as glance() then behave as intended, just as they do for tmp2 and models:

# register the name of tmp's last column as a model column
attr(tmp, "model") <- c(attr(tmp, "model"), names(tmp)[ncol(tmp)])

Now inspecting $model in attributes(tmp) again:

$model
[1] "ARIMA(as.formula(fmlas[1]))" "ARIMA(as.formula(fmlas[2]))"

Obtaining a summary for both models in tmp now works:

glance(tmp)
# A tibble: 2 × 11
  Region    State    Purpose  .model                      sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
  <chr>     <chr>    <chr>    <chr>                        <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
1 Melbourne Victoria Business ARIMA(as.formula(fmlas[1])) 1.91    -136.  281.  281.  290.  <cpl [0]> <cpl [1]>
2 Melbourne Victoria Business ARIMA(as.formula(fmlas[2])) 0.0168    50.4 -92.9 -92.3 -83.4 <cpl [0]> <cpl [1]>
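The two steps used for tmp (assigning the column, then appending its name to the model attribute) can be wrapped in one function so they cannot get out of sync. This sketch writes directly to the model attribute, which is an internal detail of mables as inspected above and may change between fabletools versions; add_model_column() is a hypothetical name.

```r
# Hypothetical helper: add (or replace) a model column in a mable and
# register its name in the "model" attribute that mable methods read
add_model_column <- function(mbl, name, mdl) {
  mbl[[name]] <- mdl
  # union() avoids a duplicate entry when an existing model column is replaced
  attr(mbl, "model") <- union(attr(mbl, "model"), name)
  mbl
}
```

For example, with new_col <- names(model2)[ncol(model2)], calling add_model_column(model1, new_col, model2[[new_col]]) would be expected to give the same result as the manual attribute update on tmp.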