mitchelloharawild / distributional

Vectorised distributions for R
https://pkg.mitchelloharawild.com/distributional
GNU General Public License v3.0

Add truncation #40

Closed robjhyndman closed 3 years ago

robjhyndman commented 3 years ago

A common requirement is a probability distribution that is truncated at zero. For example, sampling from N(2,1) truncated at zero means drawing values from N(2,1), with any values less than 0 thrown away and redrawn.

It would be nice to have a dist_truncated(distribution, threshold) function to handle this situation.
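
To make the "thrown away and redrawn" description concrete, here is a minimal base R sketch of that rejection scheme. The helper name `rnorm_truncated` and its arguments are hypothetical, used only for illustration; they are not part of distributional.

```r
# Hypothetical helper (not part of distributional): draw from a normal
# distribution truncated below at `lower` by discarding and redrawing
# any values that fall under the threshold.
rnorm_truncated <- function(n, mean = 0, sd = 1, lower = 0) {
  draws <- numeric(0)
  while (length(draws) < n) {
    candidates <- rnorm(n, mean = mean, sd = sd)
    # Keep only the candidates at or above the truncation point
    draws <- c(draws, candidates[candidates >= lower])
  }
  draws[seq_len(n)]
}

set.seed(1)
x <- rnorm_truncated(1000, mean = 2, sd = 1, lower = 0)
range(x)
```

Rejection sampling like this becomes slow when most of the probability mass lies below the threshold, so it is only a sketch of the idea rather than a recommended implementation.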

mitchelloharawild commented 3 years ago

Is dist_truncated() what you're looking for? Looks like I've forgotten to include it in the pkgdown reference.

library(distributional)
library(ggplot2)
library(tibble)
library(ggdist)

tribble(
  ~ Name, ~ Distribution,
  "Normal", dist_normal(2,1),
  "Truncated", dist_truncated(dist_normal(2,1), lower = 0)
) %>% 
  ggplot(aes(y = Name, dist = Distribution)) + 
  stat_dist_halfeye()

Created on 2020-08-04 by the reprex package (v0.3.0)

robjhyndman commented 3 years ago

Perfect. Yes, I looked for it but didn't find it.

mitchelloharawild commented 3 years ago

I've been thinking about how to introduce the distributional package and its advantages, and perhaps the distribution modifiers could be a useful demonstration of the package.

Is it reasonable to produce forecasts from a model, and then truncate the distributions to be within a reasonable bound? I presume it is better to use a model that gives distributions on the R+ domain, but does the following workflow make sense?

library(fable)
library(dplyr)
library(distributional)
eggs <- as_tsibble(fma::eggs) 
eggs %>% 
  model(ARIMA(value)) %>% 
  forecast(h = 10) %>% 
  # Truncate the forecast distribution at 0
  mutate(value = dist_truncated(value, lower = 0)) %>% 
  autoplot(eggs, point_forecast = lst(mean, median))

Created on 2020-08-04 by the reprex package (v0.3.0)

Edit: the example data doesn't illustrate this well, but the point forecasts (mean and median) of the truncated distributions will probably increase with the forecast horizon for any data, because the variance of the forecast distribution grows with the horizon.

robjhyndman commented 3 years ago

Yes, that is one way of dealing with the R+ domain and has some advantages over the transformation approach. However, it will tend to bias the forecasts a little. Even the median here is biased upwards due to the truncation. The approach works best when the truncation is not too heavy (i.e., the probability of the original forecast distributions being below the threshold is relatively small).
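
A rough way to see the size of this bias, assuming the original forecast distribution is normal and using only base R: the sketch below computes the mean and median of N(mu, sigma^2) truncated below at a threshold, using the standard truncated-normal formulas. The function name `truncated_summary` and the values of `mu` and `sigma` are made up for illustration, with growing `sigma` standing in for a longer forecast horizon.

```r
# Hypothetical helper: mean and median of a normal distribution
# truncated below at `lower` (standard truncated-normal results).
truncated_summary <- function(mu, sigma, lower = 0) {
  alpha <- (lower - mu) / sigma
  p_below <- pnorm(alpha)  # probability mass removed by the truncation
  data.frame(
    sigma = sigma,
    p_below = p_below,
    # mean = mu + sigma * phi(alpha) / (1 - Phi(alpha))
    mean = mu + sigma * dnorm(alpha) / (1 - p_below),
    # median = quantile of the original normal at p_below + 0.5 * (1 - p_below)
    median = qnorm(p_below + 0.5 * (1 - p_below), mean = mu, sd = sigma)
  )
}

# Same centre, increasing spread
res <- truncated_summary(mu = 2, sigma = c(0.5, 1, 2, 4))
res
```

Both the mean and the median sit above `mu`, and the gap widens as `p_below` grows, which is consistent with the approach working best when the truncation is light.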

mitchelloharawild commented 3 years ago

When would you prefer to use truncation over transformation?

I expect that truncation won't skew the forecast distribution as much, and could give more reasonable upper bounds and forecast means?

robjhyndman commented 3 years ago

I'm not sure. Yes, the truncation will give more reasonable upper bounds, but the lower and central part of the distribution may be better with transformation.
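
As a purely illustrative comparison in base R, the sketch below puts the quantiles of N(2,1) truncated at zero next to those of a log-normal distribution with the same mean and variance as the untruncated normal. The moment matching is an assumption made only to put the two on a comparable scale; it is not how a model fitted to log-transformed data would produce its forecast distribution.

```r
mu <- 2
sigma <- 1
probs <- c(0.025, 0.5, 0.975)

# Quantiles of N(mu, sigma^2) truncated below at 0
p_below <- pnorm(0, mean = mu, sd = sigma)
q_trunc <- qnorm(p_below + probs * (1 - p_below), mean = mu, sd = sigma)

# Log-normal with mean mu and variance sigma^2 (assumed moment matching)
sdlog <- sqrt(log(1 + sigma^2 / mu^2))
meanlog <- log(mu) - sdlog^2 / 2
q_lnorm <- qlnorm(probs, meanlog = meanlog, sdlog = sdlog)

rbind(truncated = q_trunc, lognormal = q_lnorm)
```

In this particular setup the log-normal has the larger upper quantile, while its lower and central quantiles differ noticeably from the truncated ones. Other parameter choices, or a real fitted model, may behave differently.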