Extending methods for distributional accuracy scores?

cboettig commented 2 years ago

Amazing package here, thanks!

I really appreciate the design of extendable models which is so nicely described in your vignette.

It seems that it should also be possible to extend the set of scoring rules, in particular for distributions, but I would find some guidance helpful. You are probably already familiar with the scoringRules package, https://cran.r-project.org/web/packages/scoringRules/index.html, which IMO does a rather good job of providing computationally efficient implementations of many scoring rules for distributions, and it would be lovely to see how we might best plug those existing methods into fabletools accuracy calculations.

In particular, I have found the logarithmic score, a strictly proper scoring rule, to be an appealing alternative to CRPS, especially when underestimating the probability of a rare event can be catastrophic. I believe the scoringRules implementations could also improve performance when scoring very large forecasts.
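To make the rare-event point concrete, here is a small base-R sketch comparing the two scores for a standard Normal forecast as the observed value moves into the tail. The function names logs_normal() and crps_normal() are hypothetical (not from scoringRules or fabletools); the log score is taken as the negative log predictive density (smaller is better), and the CRPS uses the standard closed form for a Normal forecast.

```r
# Log score: negative log predictive density of a Normal(mu, sigma) forecast
logs_normal <- function(y, mu = 0, sigma = 1) {
  -stats::dnorm(y, mean = mu, sd = sigma, log = TRUE)
}

# CRPS of a Normal(mu, sigma) forecast, closed form with z = (y - mu) / sigma:
# sigma * (z * (2 * Phi(z) - 1) + 2 * phi(z) - 1 / sqrt(pi))
crps_normal <- function(y, mu = 0, sigma = 1) {
  z <- (y - mu) / sigma
  sigma * (z * (2 * stats::pnorm(z) - 1) + 2 * stats::dnorm(z) - 1 / sqrt(pi))
}

# Observations increasingly far into the tail of N(0, 1)
y <- c(0, 2, 5, 10)
data.frame(y = y, log_score = logs_normal(y), crps = crps_normal(y))
```

For this Normal forecast the log score equals 0.5 * log(2 * pi) + z^2 / 2, so it grows with the square of the standardized error, while the CRPS grows roughly linearly in |z|. A forecast that puts almost no density where the outcome lands is therefore penalised much more heavily by the log score.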

Thanks for considering!

mitchelloharawild commented 2 years ago

> Amazing package here, thanks!

> I really appreciate the design of extendable models which is so nicely described in your vignette.

Thanks! I really ought to write more vignettes.

> It seems that it should also be possible to extend the set of scoring rules, in particular for distributions, but I would find some guidance helpful. You are probably already familiar with the scoringRules package, https://cran.r-project.org/web/packages/scoringRules/index.html, which IMO does a rather good job of providing computationally efficient implementations of many scoring rules for distributions, and it would be lovely to see how we might best plug those existing methods into fabletools accuracy calculations.

The accuracy() function prepares the inputs that are passed to each measure function. The inputs relevant to scoringRules::logs() would be .dist and .actual, so a suitable function might be:

log_score <- function(.dist, .actual, ...) {
  # fabletools:::require_package("scoringRules")

  # Assume .dist is Normal - this is relatively hard to check for at the moment
  par <- distributional::parameters(.dist)
  scoringRules::logs_norm(.actual, mean = par$mu, sd = par$sigma)
}

library(fable)
#> Loading required package: fabletools
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
z <- as_tsibble(USAccDeaths)
slice(z, 1:60) %>% 
  model(ETS(value)) %>% 
  forecast(h = "1 year") %>% 
  accuracy(z, measures = lst(log_score))
#> # A tibble: 12 × 3
#>    .model     .type log_score
#>    <chr>      <chr>     <dbl>
#>  1 ETS(value) Test       6.65
#>  2 ETS(value) Test       7.38
#>  3 ETS(value) Test       7.03
#>  4 ETS(value) Test       6.96
#>  5 ETS(value) Test       7.15
#>  6 ETS(value) Test       7.24
#>  7 ETS(value) Test       7.43
#>  8 ETS(value) Test       7.43
#>  9 ETS(value) Test       7.78
#> 10 ETS(value) Test       7.40
#> 11 ETS(value) Test       7.41
#> 12 ETS(value) Test       7.78

Created on 2021-11-06 by the reprex package (v2.0.0)
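The output above has 12 rows because log_score() returns one score per forecast horizon and accuracy() keeps each value. If a single summary per model is preferred, similar to how a point measure like MAE averages absolute errors, the score can be averaged inside the measure. A minimal base-R sketch, where summarise_score(), squared_error() and mean_log_score are hypothetical names used only for illustration:

```r
# Hypothetical helper: wrap a per-observation scoring function so that the
# resulting measure returns a single mean score instead of one per horizon.
summarise_score <- function(score_fn, na.rm = TRUE) {
  force(score_fn)
  # Pass all arguments (e.g. .dist, .actual) through, then average the result
  function(...) mean(score_fn(...), na.rm = na.rm)
}

# Stand-in per-observation score (squared error) to show the wrapper in base R
squared_error <- function(.actual, .pred, ...) (.actual - .pred)^2
mean_squared_error <- summarise_score(squared_error)
mean_squared_error(.actual = c(1, 2, 3), .pred = c(1, 1, 1))

# With log_score() from above, the same wrapper could be used as a measure:
# mean_log_score <- summarise_score(log_score)
# ... %>% accuracy(z, measures = lst(mean_log_score))
```

Keeping the per-horizon version is still useful when you want to see how the score degrades with lead time; the wrapper only changes what accuracy() reports.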

> In particular, I have found the logarithmic score, a strictly proper scoring rule, to be an appealing alternative to CRPS, especially when underestimating the probability of a rare event can be catastrophic. I believe the scoringRules implementations could also improve performance when scoring very large forecasts.

I hadn't realised that the scoringRules package had so many functions for CRPS/log score on each distribution class. I think it would be good for fabletools to list this package in Suggests and use it when calculating the CRPS/log score of less common forecast distributions.

cboettig commented 2 years ago

@mitchelloharawild thanks again for this, it's proving very helpful.

You note above:

> Assume .dist is Normal - this is relatively hard to check for at the moment

any advice on the best way to do this? We're looking to extend the above pattern to use other distributions from scoringRules.

cboettig commented 2 years ago

whoops, I see you have family() as a method now!

looks like we can do something like:

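log_score <- function(.dist, .actual, ...) {
  fabletools:::require_package("scoringRules")
  # family() gives one name per distribution; switch() needs a single string
  fam <- unique(family(.dist))
  if (length(fam) != 1) stop("log_score() expects a single distribution family")
  par <- distributional::parameters(.dist)
  switch(fam,
    normal = scoringRules::logs_norm(.actual, mean = par$mu, sd = par$sigma),
    lognormal = scoringRules::logs_lnorm(.actual, par$mu, par$sigma),
    stop("No scoringRules log score mapped for family: ", fam)
  )
}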

(not sure if switch is the ideal choice here... does need a bit of manual mapping between family and variable names still but pretty slick!)
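One possible alternative to switch(), sketched here under assumptions, is a named list that maps each distributional family name to a scoring function, so supporting a new family means adding one entry. The names log_score_table and score_by_family() are hypothetical. To keep the sketch runnable without extra packages it uses base-R densities, which give the same negative log density that scoringRules::logs_norm() and scoringRules::logs_lnorm() return, so those calls could be swapped in. The parameter names mu and sigma follow the par$mu/par$sigma used in the snippet above.

```r
# Hypothetical lookup table: distributional family name -> log score function.
# Each entry takes observations y and a list/data frame with mu and sigma.
log_score_table <- list(
  normal = function(y, par) {
    -stats::dnorm(y, mean = par$mu, sd = par$sigma, log = TRUE)
  },
  lognormal = function(y, par) {
    -stats::dlnorm(y, meanlog = par$mu, sdlog = par$sigma, log = TRUE)
  }
)

# Hypothetical dispatcher: look up the scorer for a single family name
score_by_family <- function(fam, y, par) {
  stopifnot(is.character(fam), length(fam) == 1)
  scorer <- log_score_table[[fam]]
  if (is.null(scorer)) {
    stop("No log score registered for family: ", fam)
  }
  scorer(y, par)
}

# Usage: two N(0, 1) forecasts scored against observations 0 and 1
score_by_family("normal", c(0, 1), list(mu = c(0, 0), sigma = c(1, 1)))
```

The mapping between family and parameter names still has to be written by hand, but it lives in one table rather than inside the measure function.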

tidyverts / fabletools

Extending methods for distributional accuracy scores? #333