mitchelloharawild / distributional

Vectorised distributions for R
https://pkg.mitchelloharawild.com/distributional
GNU General Public License v3.0
94 stars 15 forks source link

Implement sum.hilo() #35

Closed mitchelloharawild closed 4 years ago

mitchelloharawild commented 4 years ago

Currently this gives correct output with incorrect length:

library(distributional)
sum(hilo(c(dist_normal(1,4), dist_normal(2,2)), 95))
#> <hilo[1]>
#> [1] [-8.759784, 14.75978]95 [-8.759784, 14.75978]95

Created on 2020-07-03 by the reprex package (v0.3.0)

mitchelloharawild commented 4 years ago

Perhaps intervals shouldn't be added in the first place, and instead give an error suggesting that the distributions need to be added.

Relevant discussion here: https://stackoverflow.com/questions/62692429/aggregating-forecasts-using-fable/62707108#62707108

rando-brando commented 1 year ago

@mitchelloharawild Is it possible to sum t(N) distributions? When I tweak the example in the stack overflow post with a transformation such as log the example no longer works. I'm guessing its a simple fix but I cannot figure it out so thought I would ask.

library(fable)
library(dplyr)
lung_deaths_agg <- as_tsibble(cbind(mdeaths, fdeaths))

fc_1 <- lung_deaths_agg %>% 
  model(lm = fable::TSLM(log(value) ~ trend() + season())) %>% 
  forecast()
fc_1 %>% 
  summarise(value = sum(value), .mean = mean(value))
Caused by error in `Ops.dist_transformed()`:
! The + operation is not supported for <dist_transformed> and <dist_transformed>
mitchelloharawild commented 1 year ago

I don't think it's trivial to sum transformed distributions, at least it's not as easy as it is for normal distributions. A fallback sum method for distributions could be added, but would likely be slow and inexact.

You can however easily sum sample distributions, so you might like to try bootstrap sampling your forecasts instead.

library(fable)
#> Loading required package: fabletools
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
lung_deaths_agg <- as_tsibble(cbind(mdeaths, fdeaths))

fc_1 <- lung_deaths_agg %>% 
  model(lm = fable::TSLM(log(value) ~ trend() + season())) %>% 
  forecast(bootstrap = TRUE)
fc_1 %>% 
  summarise(value = sum(value), .mean = mean(value))
#> # A fable: 24 x 3 [1M]
#>       index        value .mean
#>       <mth>       <dist> <dbl>
#>  1 1980 Jan sample[5000] 2666.
#>  2 1980 Feb sample[5000] 2571.
#>  3 1980 Mar sample[5000] 2468.
#>  4 1980 Apr sample[5000] 2035.
#>  5 1980 May sample[5000] 1624.
#>  6 1980 Jun sample[5000] 1451.
#>  7 1980 Jul sample[5000] 1396.
#>  8 1980 Aug sample[5000] 1267.
#>  9 1980 Sep sample[5000] 1258.
#> 10 1980 Oct sample[5000] 1515.
#> # … with 14 more rows

Created on 2022-11-12 by the reprex package (v2.0.1)