quantile works incorrectly with negation of non-normal distributions

mitchelloharawild / distributional

Vectorised distributions for R

https://pkg.mitchelloharawild.com/distributional

GNU General Public License v3.0

94 stars 15 forks source link

quantile works incorrectly with negation of non-normal distributions #100

Closed venpopov closed 3 months ago

venpopov commented 3 months ago

Just found out that if we have a transformed distribution by negation, the quantile function gives incorrect results:

library(distributional)
d <- -dist_gumbel(0, 1)
quantile(d, 0)
#> [1] Inf
quantile(d, 0.1)
#> [1] 0.8340324
quantile(d, 0.2)
#> [1] 0.475885
quantile(d, 1)
#> [1] -Inf

^{Created on 2024-04-01 with reprex v2.1.0}

This is unrelated to #96 - it would have happened before with a transformation such as 1-dist_gumbel(0,1)

I picked up the gumbel distribution on purpose, because this is a common transformation - gumbel comes in two flavors, gumbel max and gumbel min, which are just with inversed input values. But this happens for any distribution on which the negative operation is applied.

This is due to how the quantile.dist_transformed function is implemented:

quantile.dist_transformed <- function(x, p, ...){
  x[["transform"]](quantile(x[["dist"]], p, ...))
}

It's also a problem for the cdf, which is also inverted.

venpopov commented 3 months ago

For the quantile function this is one possible easy and quick solution:

#' @export
quantile.dist_transformed <- function(x, p, ...){
  q1 <- x[["transform"]](quantile(x[["dist"]], p, ...))
  q2 <- x[["transform"]](quantile(x[["dist"]], 1 - p, ...))
  ifelse((q1 > q2 & p > 0.5) | (q1 < q2 & p < 0.5), q1, q2)
}

venpopov commented 3 months ago

Adding a similar check to the cdf function also fixes things:

cdf.dist_transformed <- function(x, q, ...){
  inv <- function(v) SW(x[["inverse"]](v))
  p1 <- cdf(x[["dist"]], inv(q), ...)
  p2 <- cdf(x[['dist']], inv(q+1), ...)
  p <- ifelse(p1 <= p2, p1, 1-p1)

  limits <- field(support(x), "lim")[[1]]
  if (!any(is.na(limits))) {
    p[q <= limits[1]] <- 0
    p[q >= limits[2]] <- 1
  }
  p
}

The logic is basically that the cdf should increase monotonically. If it does not, then we need to take the 1-cdf.

Let me know what you think about this approach

mitchelloharawild commented 3 months ago

Transformations are assumed (but not tested) to be monotonic (#7), and I suppose the current implementation also assumes that they are monotonically increasing over the distribution's domain.

Rather than calculating the quantile, cdf, etc. multiple times I think it is better (more efficient) to instead test if the transformation is increasing or decreasing over the support of the underlying distribution. I'll see if I can fix this one.

mitchelloharawild commented 3 months ago

Thanks, should be fixed now. Check out the changes and let me know what you think. I realise that the implementation isn't vectorised, and will need further work after #25.

library(distributional)
d <- -dist_gumbel(0, 1)
quantile(d, 0)
#> [1] -Inf
quantile(d, 0.1)
#> [1] -2.250367
quantile(d, 0.2)
#> [1] -1.49994
quantile(d, 1)
#> [1] Inf

^{Created on 2024-04-05 with reprex v2.0.2}

venpopov commented 3 months ago

Yup, I can confirm it is now correct:

library(distributional)
d <- dist_gumbel(0,1)
x <- seq(-4,4, 0.001)
plot(x, cdf(d, x)[[1]], type = "l")
lines(x, cdf(-d, x)[[1]], col='red')


cdf_gumbel_max <- function(x) {
  exp(-exp(-x))
}

cdf_gumbel_min <- function(x) {
  1 - exp(-exp(x))
}

plot(x, cdf_gumbel_max(x), type = "l")
lines(x, cdf_gumbel_min(x), col='red')

^{Created on 2024-04-05 with reprex v2.1.0}

Thanks!

venpopov commented 3 months ago

Though it looks like it caused some checks to fail on master, you probably need to take a look at that

mitchelloharawild commented 3 months ago

Thanks, the failing test was from an invalid distribution. We might like to calculate and store the transformation's monotonic nature (increasing/decreasing) when the transformation is applied, and in doing so raise an error if the transformation is invalid (e.g. log on distributions with domain <0).