tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/
89 stars 31 forks source link

Modeling an empty tsibble (with key) should not error #313

Open Fuco1 opened 3 years ago

Fuco1 commented 3 years ago

In batch processing consistency is king. Sometimes we might apply filter on a dataset and remove all the data. Still calling model on such a tsibble should, in my opinion, not produce an error but return an empty mable. This allows us to write simple and consistent pipelines without unnecessary error checking.

This works:

> tsibble(i = 1:12, quantity = 1, index = "i") %>% filter(i > 1) %>% model(x = MEAN(quantity))
# A mable: 1 x 1
        x
  <model>
1  <MEAN>
> tsibble(i = 1:12, quantity = 1, index = "i") %>% filter(i > 12) %>% model(x = MEAN(quantity))
# A mable: 1 x 1
             x
       <model>
1 <NULL model>
Warning message:
1 error encountered for x
[1] All observations are missing, a model cannot be estimated without data.

but having a tsibble with key fails:

tsibble(i = 1:12, k = 1, quantity = 1, index = "i", key = "k") %>% filter(i > 12) %>% model(x = MEAN(quantity))
Error in `[[.default`(mdl, 1) : subscript out of bounds
> 
davidtedfordholt commented 3 years ago

With the current release versions of tsibble, feasts, fable and fabletools, I am getting the error for both the second and third lines above, rather than a <NULL model> for the second.

> tsibble(i = 1:12, quantity = 1, index = "i") %>% filter(i > 12) %>% model(x = MEAN(quantity))
Error in .data[[index_var(.data)]][[1]] : subscript out of bounds

Perhaps a recent change to fabletools is keeping the second example from getting through to the fable::MEAN(), which creates the <NULL model> in response to:

if (all(is.na(y))) {
    abort("All observations are missing, a model cannot be estimated without data.")  
}

It seems like fabletools::estimate() might be a decent place for a check that calls null_model() in the face of no data, as I cannot conceive of a model that will fit without data, but my lack of imagination might be showing itself.

mitchelloharawild commented 7 months ago

This should return a zero-row mable, but would require some restructuring of how and where some attributes like response variable are stored. Currently the response variables are grabbed from the estimated models, but in a zero-row mable there are no models to grab from.