mitchelloharawild / distributional

Vectorised distributions for R
https://pkg.mitchelloharawild.com/distributional
GNU General Public License v3.0
94 stars 15 forks source link

quantile() with multiple probabilities doesn't work for named distributions #79

Closed Bisaloo closed 1 year ago

Bisaloo commented 2 years ago

quantile() fails when trying to get quantiles from a fbl_ts column with distributions. I was initially thinking it was affecting all tibbles but it seems restricted to fbl_ts, as shown in the reprex below where it's working for tibble and tsibble

library(distributional)

# Works with plain tibble
df <- tibble::tibble(
  name = paste0("dist", 1:5), 
  date = seq(as.Date("2022-05-20"), as.Date("2022-05-24"), by = 1),
  dist = dist_normal(1:5,1)
)

df |> 
  dplyr::mutate(q = quantile(dist, c(0.1, 0.9)))
#> # A tibble: 5 × 4
#>   name  date          dist q        
#>   <chr> <date>      <dist> <list>   
#> 1 dist1 2022-05-20 N(1, 1) <dbl [2]>
#> 2 dist2 2022-05-21 N(2, 1) <dbl [2]>
#> 3 dist3 2022-05-22 N(3, 1) <dbl [2]>
#> 4 dist4 2022-05-23 N(4, 1) <dbl [2]>
#> 5 dist5 2022-05-24 N(5, 1) <dbl [2]>

# Works with tsibble
df |> 
  tsibble::as_tsibble(key = name, index = date) |> 
  dplyr::mutate(q = quantile(dist, c(0.1, 0.9)))
#> # A tsibble: 5 x 4 [1D]
#> # Key:       name [5]
#>   name  date          dist q        
#>   <chr> <date>      <dist> <list>   
#> 1 dist1 2022-05-20 N(1, 1) <dbl [2]>
#> 2 dist2 2022-05-21 N(2, 1) <dbl [2]>
#> 3 dist3 2022-05-22 N(3, 1) <dbl [2]>
#> 4 dist4 2022-05-23 N(4, 1) <dbl [2]>
#> 5 dist5 2022-05-24 N(5, 1) <dbl [2]>

# Doesn't work with fbl_ts
library(fable)
#> Loading required package: fabletools
library(tsibbledata)

aus_production |>
  dplyr::select(Quarter, Cement) |>
  model(stl = NAIVE(Cement)) |>
  forecast(h = "5 years") |>
  dplyr::mutate(quantiles = quantile(Cement, c(0.1, 0.9)))
#> Error in `dplyr::mutate()`:
#> ! Problem while computing `quantiles = quantile(Cement, c(0.1, 0.9))`.
#> Caused by error in `FUN()`:
#> ! attempt to set 'colnames' on an object with less than two dimensions

#> Backtrace:
#>      ▆
#>   1. ├─dplyr::mutate(...)
#>   2. ├─dplyr:::mutate.data.frame(...)
#>   3. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), caller_env = caller_env())
#>   4. │   ├─base::withCallingHandlers(...)
#>   5. │   └─mask$eval_all_mutate(quo)
#>   6. ├─stats::quantile(Cement, c(0.1, 0.9))
#>   7. ├─distributional:::quantile.distribution(Cement, c(0.1, 0.9))
#>   8. │ └─distributional:::dist_apply(x, quantile, p = p, ...)
#>   9. │   └─base::lapply(out, `colnames<-`, dn)
#>  10. │     └─base FUN(X[[i]], ...)
#>  11. │       └─base::stop("attempt to set 'colnames' on an object with less than two dimensions")
#>  12. └─base::.handleSimpleError(...)
#>  13.   └─dplyr h(simpleError(msg, call))
#>  14.     └─rlang::abort(...)

Created on 2022-05-25 by the reprex package (v2.0.1.9000)

The issue comes from

https://github.com/mitchelloharawild/distributional/blob/d04d5284b176b3ee72df3a6eb3a8fa996388d7e3/R/utils.R#L59

As shown in the 2nd reprex below, fbl_ts keep a dimname even after being pulled, which throws dist_apply() off:

library(fable)
#> Loading required package: fabletools
library(tsibbledata)

aus_production |>
  dplyr::select(Quarter, Cement) |>
  model(stl = NAIVE(Cement)) |>
  forecast(h = "5 years") |> 
  dplyr::pull(Cement) |> 
  dimnames()
#> [1] "Cement"

dplyr::starwars |> 
  dplyr::pull(height) |> 
  dimnames()
#> NULL

Created on 2022-05-25 by the reprex package (v2.0.1.9000)

Therefore, commenting

https://github.com/mitchelloharawild/distributional/blob/d04d5284b176b3ee72df3a6eb3a8fa996388d7e3/R/utils.R#L59

solves this specific issue but it probably causes problems elsewhere.

mitchelloharawild commented 1 year ago

MRE:

library(distributional)
x <- y <- dist_normal()
quantile(x, c(0.05, 0.95))
#> [[1]]
#> [1] -1.644854  1.644854
dimnames(y) <- "response"
quantile(y, c(0.05, 0.95))
#> Error in FUN(X[[i]], ...): attempt to set 'colnames' on an object with less than two dimensions

Created on 2022-09-02 by the reprex package (v2.0.1)

mitchelloharawild commented 1 year ago

Fixed, thanks!

library(distributional)
x <- y <- dist_normal()
quantile(x, c(0.05, 0.95))
#> [[1]]
#> [1] -1.644854  1.644854
dimnames(y) <- "response"
quantile(y, c(0.05, 0.95))
#> [[1]]
#> [1] -1.644854  1.644854

x <- y <- dist_multivariate_normal(mu = list(c(1,2)), 
                                   sigma = list(matrix(c(4,2,2,3), ncol=2)))
quantile(x, c(0.05, 0.95))
#> [[1]]
#>           [,1]       [,2]
#> [1,] -2.289707 -0.8489701
#> [2,]  4.289707  4.8489701
dimnames(y) <- c("a", "b")
quantile(y, c(0.05, 0.95))
#> [[1]]
#>              a          b
#> [1,] -2.289707 -0.8489701
#> [2,]  4.289707  4.8489701

Created on 2022-09-02 by the reprex package (v2.0.1)