mitchelloharawild / distributional

Vectorised distributions for R
https://pkg.mitchelloharawild.com/distributional
GNU General Public License v3.0

Add method to define region of support #8

Closed: mitchelloharawild closed this issue 2 years ago

mitchelloharawild commented 4 years ago

This can identify distributions as continuous, discrete, integer valued, non-negative, etc.

mitchelloharawild commented 4 years ago

Some inspiration can be gleaned from distributions3::support()

mjskay commented 2 years ago

Internally in ggdist I've needed to implement a solution to this issue, the experience of which might be helpful here:

Is there any interest in an is-this-dist-discrete function in {distributional}? Maybe an is_discrete() or is_discrete_dist()? Obviously in {distributional} the kind of hack I implemented above would only be necessary for dist_wrap(), as other distributions can simply have implementations that return TRUE/FALSE. If you're interested and have a preferred function name I'd be happy to make a PR.
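For readers without the ggdist source to hand, here is a minimal sketch of what a draw-based check could look like. A later comment in this thread calls the approach a "random draw hack". The helper name `is_discrete_by_draw` is hypothetical, and the sketch uses base R samplers rather than {distributional} objects, so it is an illustration of the idea, not the ggdist implementation.

```r
# Hypothetical helper, not part of {distributional} or {ggdist}.
# `rfun` is a base-R style sampler such as rnorm or rpois; extra
# arguments are passed through to it.
is_discrete_by_draw <- function(rfun, ...) {
  # Take a single random draw and inspect its storage type
  draw <- rfun(1, ...)
  # Integer, logical, character and factor draws are treated as discrete;
  # double draws are treated as continuous.
  is.integer(draw) || is.logical(draw) || is.character(draw) || is.factor(draw)
}

is_discrete_by_draw(rnorm)                         # FALSE
is_discrete_by_draw(rpois, lambda = 3)             # TRUE
is_discrete_by_draw(rbinom, size = 4, prob = 0.3)  # TRUE
```

The weakness of this approach is that it trusts the sampler's return type: a sampler that returns whole numbers stored as doubles would be classified as continuous.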

mitchelloharawild commented 2 years ago

I definitely think that a function to describe the distribution type is required. I do however think that the function can be more generalised than determining if a distribution is discrete or not.

If this is important for {ggdist} I can start working on this for inclusion in the upcoming release. One idea I had was having support() return the prototype class expected for that distribution (with a class for formatting, and attributes for ranges / iterators if needed). Then you could use something like ggplot2:::is.discrete(support(<dist>)) to identify if the region of support is discrete or not.
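As a rough sketch of the prototype idea, suppose support() returned a zero-length vector of the type the distribution produces. A downstream check could then look only at that prototype. The helper name `ptype_is_discrete` and the choice to count integers as discrete are assumptions made here for illustration; an existing predicate such as the ggplot2 internal mentioned above may classify types differently.

```r
# Hypothetical helper: classify a zero-length prototype, such as the
# numeric(0) or integer(0) values a support() method might return.
ptype_is_discrete <- function(ptype) {
  # Assumption: integer, logical, character and factor prototypes
  # describe a discrete region of support; anything else does not.
  is.integer(ptype) || is.logical(ptype) || is.character(ptype) || is.factor(ptype)
}

# Made-up prototypes standing in for three kinds of distribution
ptypes <- list(
  normal      = numeric(0),
  bernoulli   = integer(0),
  categorical = character(0)
)

result <- vapply(ptypes, ptype_is_discrete, logical(1))
result
```

Keeping the prototype separate from the predicate means other questions (is it numeric, is it bounded) can be answered from the same support() value without adding one function per question.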

mjskay commented 2 years ago

Ah cool yeah, I like that idea.

Also, this isn't high priority for ggdist since I have a working solution internally atm, but would be lovely to have eventually :)

mitchelloharawild commented 2 years ago

I've made a start on support() with a default method that uses your random draw hack. I'd still like to add more classes for formatting, and attributes for ranges/iterators, but it's a start.

library(distributional)
support(dist_normal(1:3))
#> Warning in set.seed(seed): '.Random.seed' is not an integer vector but of type
#> 'NULL', so ignored

#> Warning in set.seed(seed): '.Random.seed' is not an integer vector but of type
#> 'NULL', so ignored
#> [[1]]
#> numeric(0)
#> 
#> [[2]]
#> numeric(0)
#> 
#> [[3]]
#> numeric(0)
support(dist_bernoulli(0.4))
#> Warning in set.seed(seed): '.Random.seed' is not an integer vector but of type
#> 'NULL', so ignored
#> [[1]]
#> integer(0)
support(dist_multivariate_normal(mu = list(c(1,2)), sigma = list(matrix(c(4,2,2,3), ncol=2))))
#> Warning in set.seed(seed): '.Random.seed' is not an integer vector but of type
#> 'NULL', so ignored
#> [[1]]
#>      [,1] [,2]
support(dist_categorical(c(0.4, 0.6), outcomes = c("A", "B")))
#> Warning in set.seed(seed): '.Random.seed' is not an integer vector but of type
#> 'NULL', so ignored

#> Warning in set.seed(seed): '.Random.seed' is not an integer vector but of type
#> 'NULL', so ignored
#> [[1]]
#> character(0)
#> 
#> [[2]]
#> character(0)

Created on 2021-11-03 by the reprex package (v2.0.0)

mitchelloharawild commented 2 years ago

A little progress update on the formatting of these regions.

library(distributional)
library(tidyverse)
tibble(
  dist = c(
    dist_normal(1:3),
    dist_bernoulli(0.4), 
    dist_multivariate_normal(mu = list(c(1,2)), sigma = list(matrix(c(4,2,2,3), ncol=2))), 
    dist_categorical(list(c(0.4, 0.6)), outcomes = list(c("A", "B"))))
) %>% 
  mutate(support(dist))
#> # A tibble: 6 × 2
#>             dist `support(dist)`
#>           <dist>      <spprt_rg>
#> 1        N(1, 1)               R
#> 2        N(2, 1)               R
#> 3        N(3, 1)               R
#> 4 Bernoulli(0.4)             lgl
#> 5         MVN[2]             R^2
#> 6 Categorical[2]             chr

Created on 2021-11-03 by the reprex package (v2.0.0)

mjskay commented 2 years ago

Nice!!

mitchelloharawild commented 2 years ago

support() now also accepts the limits of the distribution, with an updated formatting method. The type (class) of the distribution's support can be obtained with field(<support_region>, "x") and the limits with field(<support_region>, "lim"). User-facing accessor functions can be added if needed.

I've closed the issue as I think this covers the scope I had in mind for support(). If anything else is needed, let me know :+1:

library(distributional)
library(tidyverse)
tibble(
  dist = c(
    dist_normal(0,1),
    dist_truncated(dist_normal(0,1), lower = 5),
    dist_poisson(1),
    dist_beta(3,5),
    dist_binomial(4, 0.3),
    dist_bernoulli(0.4), 
    dist_multivariate_normal(mu = list(c(1,2)), sigma = list(matrix(c(4,2,2,3), ncol=2))), 
    dist_categorical(list(c(0.4, 0.6)), outcomes = list(c("A", "B"))))
) %>% 
  mutate(support(dist))
#> # A tibble: 8 × 2
#>             dist `support(dist)`
#>           <dist>      <spprt_rg>
#> 1        N(0, 1)               R
#> 2 N(0, 1)[5,Inf]         [5,Inf]
#> 3        Pois(1)              N0
#> 4     Beta(3, 5)           [0,1]
#> 5      B(4, 0.3)     [0,1,...,4]
#> 6 Bernoulli(0.4)             lgl
#> 7         MVN[2]             R^2
#> 8 Categorical[2]             chr

Created on 2022-01-04 by the reprex package (v2.0.0)
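The user-facing accessors mentioned in this comment could be thin wrappers around field(). The sketch below assumes field() is vctrs::field() and uses a stand-in record built with vctrs::new_rcrd() so that it runs without {distributional}. The class name `demo_support_region`, the accessor names `support_type` and `support_limits`, and the field values are all made up for illustration; a real object would come from support(<dist>).

```r
library(vctrs)

# Stand-in record with the two fields named in the comment ("x" and "lim").
# Each field is a list with one element per distribution.
demo_support <- new_rcrd(
  list(
    x   = list(numeric(0), integer(0)),    # prototype of each support
    lim = list(c(-Inf, Inf), c(0, Inf))    # lower and upper limits
  ),
  class = "demo_support_region"
)

# Hypothetical accessor names wrapping vctrs::field()
support_type   <- function(x) field(x, "x")
support_limits <- function(x) field(x, "lim")

# Type and limits of the second entry in the record
support_type(demo_support)[[2]]
support_limits(demo_support)[[2]]
```

Wrapping field() this way would let the internal field names change later without breaking callers.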