mitchelloharawild / distributional

Vectorised distributions for R
https://pkg.mitchelloharawild.com/distributional
GNU General Public License v3.0

methods to get out distribution parameter values? #64

Closed njtierney closed 3 years ago

njtierney commented 3 years ago

Is there a way to extract out the distribution parameter values from a distributional object?

It would be handy to be able to extract this information out to explore distribution objects.

If this isn't already implemented, perhaps the function could be called something like parameters()?

library(distributional)
library(purrr)
my_dist <- dist_normal(mu = 10, sigma = 1)
my_dist
#> <distribution[1]>
#> [1] N(10, 1)
names(my_dist)
#> NULL
dimnames(my_dist)
#> NULL
str(my_dist)
#> dist [1:1] 
#> $ :List of 2
#>  ..$ mu   : num 10
#>  ..$ sigma: num 1
#>  ..- attr(*, "class")= chr [1:2] "dist_normal" "dist_default"
pluck(my_dist, 1) %>% c()
#> $mu
#> [1] 10
#> 
#> $sigma
#> [1] 1

parameters <- function(distribution){
  # Only extracts the parameters of the first distribution in the vector
  c(distribution[[1]])
}

parameters(my_dist)
#> $mu
#> [1] 10
#> 
#> $sigma
#> [1] 1

Created on 2021-09-14 by the reprex package (v2.0.1)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.0 (2021-05-18)
#>  os       macOS Big Sur 10.16
#>  system   x86_64, darwin17.0
#>  ui       X11
#>  language (EN)
#>  collate  en_AU.UTF-8
#>  ctype    en_AU.UTF-8
#>  tz       Australia/Perth
#>  date     2021-09-14
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package        * version date       lib source
#>  assertthat       0.2.1   2019-03-21 [1] CRAN (R 4.1.0)
#>  backports        1.2.1   2020-12-09 [1] CRAN (R 4.1.0)
#>  cli              3.0.1   2021-07-17 [1] CRAN (R 4.1.0)
#>  colorspace       2.0-2   2021-06-24 [1] CRAN (R 4.1.0)
#>  crayon           1.4.1   2021-02-08 [1] CRAN (R 4.1.0)
#>  DBI              1.1.1   2021-01-15 [1] CRAN (R 4.1.0)
#>  digest           0.6.27  2020-10-24 [1] CRAN (R 4.1.0)
#>  distributional * 0.2.2   2021-02-02 [1] CRAN (R 4.1.0)
#>  dplyr            1.0.7   2021-06-18 [1] CRAN (R 4.1.0)
#>  ellipsis         0.3.2   2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate         0.14    2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi            0.5.0   2021-05-25 [1] CRAN (R 4.1.0)
#>  farver           2.1.0   2021-02-28 [1] CRAN (R 4.1.0)
#>  fs               1.5.0   2020-07-31 [1] CRAN (R 4.1.0)
#>  generics         0.1.0   2020-10-31 [1] CRAN (R 4.1.0)
#>  ggplot2          3.3.5   2021-06-25 [1] CRAN (R 4.1.0)
#>  glue             1.4.2   2020-08-27 [1] CRAN (R 4.1.0)
#>  gtable           0.3.0   2019-03-25 [1] CRAN (R 4.1.0)
#>  highr            0.9     2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools        0.5.1.1 2021-01-22 [1] CRAN (R 4.1.0)
#>  knitr            1.33    2021-04-24 [1] CRAN (R 4.1.0)
#>  lifecycle        1.0.0   2021-02-15 [1] CRAN (R 4.1.0)
#>  magrittr         2.0.1   2020-11-17 [1] CRAN (R 4.1.0)
#>  munsell          0.5.0   2018-06-12 [1] CRAN (R 4.1.0)
#>  pillar           1.6.2   2021-07-29 [1] CRAN (R 4.1.0)
#>  pkgconfig        2.0.3   2019-09-22 [1] CRAN (R 4.1.0)
#>  purrr          * 0.3.4   2020-04-17 [1] CRAN (R 4.1.0)
#>  R6               2.5.1   2021-08-19 [1] CRAN (R 4.1.0)
#>  reprex           2.0.1   2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang            0.4.11  2021-04-30 [1] CRAN (R 4.1.0)
#>  rmarkdown        2.9     2021-06-15 [1] CRAN (R 4.1.0)
#>  rstudioapi       0.13    2020-11-12 [1] CRAN (R 4.1.0)
#>  scales           1.1.1   2020-05-11 [1] CRAN (R 4.1.0)
#>  sessioninfo      1.1.1   2018-11-05 [1] CRAN (R 4.1.0)
#>  stringi          1.7.3   2021-07-16 [1] CRAN (R 4.1.0)
#>  stringr          1.4.0   2019-02-10 [1] CRAN (R 4.1.0)
#>  styler           1.4.1   2021-03-30 [1] CRAN (R 4.1.0)
#>  tibble           3.1.3   2021-07-23 [1] CRAN (R 4.1.0)
#>  tidyselect       1.1.1   2021-04-30 [1] CRAN (R 4.1.0)
#>  utf8             1.2.2   2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs            0.3.8   2021-04-29 [1] CRAN (R 4.1.0)
#>  withr            2.4.2   2021-04-18 [1] CRAN (R 4.1.0)
#>  xfun             0.24    2021-06-15 [1] CRAN (R 4.1.0)
#>  yaml             2.2.1   2020-02-01 [1] CRAN (R 4.1.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.1/Resources/library
```

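The parameters() sketch in the reprex above only looks at the first element of the vector. Below is a minimal base R sketch of a vectorised version, assuming (as the str() output suggests) that a distribution vector is a list of classed, named parameter lists. The mock object and the helper name parameters_list() are hypothetical; the code does not call distributional itself.

```r
# Hypothetical mock of the structure shown by str(my_dist):
# a list whose elements are classed, named lists of parameter values.
mock_dist <- list(
  structure(list(mu = 10, sigma = 1), class = c("dist_normal", "dist_default")),
  structure(list(mu = 5, sigma = 2), class = c("dist_normal", "dist_default"))
)

# Hypothetical helper: strip the class from every element so each one
# becomes a plain named list of parameters (not just the first element).
parameters_list <- function(distribution) {
  lapply(distribution, unclass)
}

pars <- parameters_list(mock_dist)
str(pars[[2]])
```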
mitchelloharawild commented 3 years ago

Duplicate of #36, but I'll close #36 as this provides more detail.

njtierney commented 3 years ago

OK cool, thanks, @mitchelloharawild!

Let me know if I can help.

mitchelloharawild commented 3 years ago

I'm trying to work out the nicest output format for this, and better ideas would be appreciated. This is my current attempt (on the parameters branch, https://github.com/mitchelloharawild/distributional/commit/c579a2ec52d0561834548a94c59eef9d58b6c8b2), but I don't know of any nice functions for unpacking lists of named lists.

library(tidyverse)
library(distributional)

For a simple (single-class) distribution vector.

dist <- dist_normal(1:3)
parameters(dist)
#> [[1]]
#> [[1]]$mu
#> [1] 1
#> 
#> [[1]]$sigma
#> [1] 1
#> 
#> 
#> [[2]]
#> [[2]]$mu
#> [1] 2
#> 
#> [[2]]$sigma
#> [1] 1
#> 
#> 
#> [[3]]
#> [[3]]$mu
#> [1] 3
#> 
#> [[3]]$sigma
#> [1] 1

tibble(dist = dist, par = parameters(dist))
#> # A tibble: 3 × 2
#>      dist par             
#>    <dist> <list>          
#> 1 N(1, 1) <named list [2]>
#> 2 N(2, 1) <named list [2]>
#> 3 N(3, 1) <named list [2]>

For mixed distribution classes.

dist <- c(dist_normal(1:2), dist_poisson(3))
parameters(dist)
#> [[1]]
#> [[1]]$mu
#> [1] 1
#> 
#> [[1]]$sigma
#> [1] 1
#> 
#> 
#> [[2]]
#> [[2]]$mu
#> [1] 2
#> 
#> [[2]]$sigma
#> [1] 1
#> 
#> 
#> [[3]]
#> [[3]]$l
#> [1] 3
tibble(dist = dist, par = parameters(dist))
#> # A tibble: 3 × 2
#>      dist par             
#>    <dist> <list>          
#> 1 N(1, 1) <named list [2]>
#> 2 N(2, 1) <named list [2]>
#> 3 Pois(3) <named list [1]>

Created on 2021-09-14 by the reprex package (v2.0.0)
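With the list-of-lists format, pulling one parameter out across all elements needs care, because the Poisson element has l rather than mu. A minimal base R sketch, using a hand-typed copy of the mixed-class output above and a hypothetical helper get_param() that is not part of distributional:

```r
# Hand-typed copy of parameters(dist) for the mixed-class example above
pars <- list(
  list(mu = 1, sigma = 1),
  list(mu = 2, sigma = 1),
  list(l = 3)
)

# Hypothetical helper: extract one named parameter from every element,
# returning NA where that element has no parameter of that name.
get_param <- function(pars, name) {
  vapply(pars, function(p) if (name %in% names(p)) p[[name]] else NA_real_,
         numeric(1))
}

get_param(pars, "mu")
get_param(pars, "l")
```

vapply() is used rather than sapply() so that the result is always a numeric vector of the same length as the input.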

mitchelloharawild commented 3 years ago

If you know all parameters have the same structure, you can cast them to a tibble column using bind_rows():

library(tidyverse)
library(distributional)

For a simple (single-class) distribution vector.

dist <- dist_normal(1:3)
parameters(dist)
#> [[1]]
#> [[1]]$mu
#> [1] 1
#> 
#> [[1]]$sigma
#> [1] 1
#> 
#> 
#> [[2]]
#> [[2]]$mu
#> [1] 2
#> 
#> [[2]]$sigma
#> [1] 1
#> 
#> 
#> [[3]]
#> [[3]]$mu
#> [1] 3
#> 
#> [[3]]$sigma
#> [1] 1

tibble(dist = dist, par = bind_rows(parameters(dist)))
#> # A tibble: 3 × 2
#>      dist par$mu $sigma
#>    <dist>  <dbl>  <dbl>
#> 1 N(1, 1)      1      1
#> 2 N(2, 1)      2      1
#> 3 N(3, 1)      3      1

Created on 2021-09-14 by the reprex package (v2.0.0)
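For comparison, a rough base R analogue of the bind_rows() step turns each named list into a one-row data frame and stacks them with rbind(). This is a sketch on hand-typed parameters, not the package's implementation, and unlike bind_rows() it assumes every element has exactly the same names.

```r
# Hand-typed copy of parameters(dist) for dist_normal(1:3)
pars <- list(
  list(mu = 1, sigma = 1),
  list(mu = 2, sigma = 1),
  list(mu = 3, sigma = 1)
)

# Each named list becomes a one-row data frame; rbind() matches columns
# by name, so it errors if the elements do not share the same names.
par_df <- do.call(rbind, lapply(pars, as.data.frame))
par_df
```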

mitchelloharawild commented 3 years ago

This also works for mixed classes, but gives NAs for mismatched names.

library(tidyverse)
library(distributional)

dist <- c(dist_normal(1:2), dist_poisson(3))
parameters(dist)
#> [[1]]
#> [[1]]$mu
#> [1] 1
#> 
#> [[1]]$sigma
#> [1] 1
#> 
#> 
#> [[2]]
#> [[2]]$mu
#> [1] 2
#> 
#> [[2]]$sigma
#> [1] 1
#> 
#> 
#> [[3]]
#> [[3]]$l
#> [1] 3
tibble(dist = dist, par = bind_rows(parameters(dist)))
#> # A tibble: 3 × 2
#>      dist par$mu $sigma    $l
#>    <dist>  <dbl>  <dbl> <dbl>
#> 1 N(1, 1)      1      1    NA
#> 2 N(2, 1)      2      1    NA
#> 3 Pois(3)     NA     NA     3

Created on 2021-09-14 by the reprex package (v2.0.0)
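The NA filling that bind_rows() performs here can be sketched in base R: take the union of all parameter names, pad each element with NA for the names it lacks, then stack the rows. This assumes every parameter is a numeric scalar (it would not handle vector or matrix parameters), and fill_rows() is a hypothetical helper.

```r
# Hand-typed copy of parameters(dist) for the mixed-class example above
pars <- list(
  list(mu = 1, sigma = 1),
  list(mu = 2, sigma = 1),
  list(l = 3)
)

# Hypothetical helper: pad each element with NA for the parameter names it
# lacks, convert it to a one-row data frame, then stack the rows.
fill_rows <- function(pars) {
  # Union of all parameter names, in order of first appearance
  all_names <- unique(unlist(lapply(pars, names)))
  rows <- lapply(pars, function(p) {
    row <- setNames(as.list(rep(NA_real_, length(all_names))), all_names)
    row[names(p)] <- p
    as.data.frame(row)
  })
  do.call(rbind, rows)
}

par_df <- fill_rows(pars)
par_df
```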

njtierney commented 3 years ago

I think this works well. Having NAs is fine, as it makes sense that those parameters aren't there.

Perhaps there could be two versions of parameters:

mitchelloharawild commented 3 years ago

It might even be more consistent with the design of the package (https://github.com/mitchelloharawild/distributional/issues/52#issuecomment-886370060) to provide only the parameters_dfr() version, named simply parameters(). I'd prefer to avoid the *_dfr() suffix if possible.

njtierney commented 3 years ago

Yeah, I mean, I think that's good; I do like a world of data frames as output :)

mitchelloharawild commented 3 years ago

I've changed parameters() to return the data frame format, and also extended it to support more complex arguments (multivariate distributions, matrix parameters, etc.). Here is what we currently get:

library(distributional)
dist <- c(
  dist_normal(1:2), 
  dist_poisson(3), 
  dist_multinomial(size = c(4, 3), prob = list(c(0.3, 0.5, 0.2), c(0.1, 0.5, 0.4)))
)
parameters(dist)
#>   mu sigma  l  s             p
#> 1  1     1 NA NA          NULL
#> 2  2     1 NA NA          NULL
#> 3 NA    NA  3 NA          NULL
#> 4 NA    NA NA  4 0.3, 0.5, 0.2
#> 5 NA    NA NA  3 0.1, 0.5, 0.4
tibble::as_tibble(parameters(dist))
#> # A tibble: 5 × 5
#>      mu sigma     l     s p        
#>   <dbl> <dbl> <dbl> <dbl> <list>   
#> 1     1     1    NA    NA <NULL>   
#> 2     2     1    NA    NA <NULL>   
#> 3    NA    NA     3    NA <NULL>   
#> 4    NA    NA    NA     4 <dbl [3]>
#> 5    NA    NA    NA     3 <dbl [3]>

Created on 2021-10-04 by the reprex package (v2.0.0)

Note the difference between missing lists (NULL) and missing vectors (NA). It looks strange, but I think this is the best solution here.
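One practical consequence of the NULL-versus-NA difference is that is.na() does not flag NULL entries in a list column, so missing list values need a separate check. A base R sketch on a hand-built data frame that mimics the s and p columns above (it is not output from distributional):

```r
# Hand-built data frame mimicking the `s` and `p` columns above
par_df <- data.frame(s = c(NA, 4, 3))
par_df$p <- list(NULL, c(0.3, 0.5, 0.2), c(0.1, 0.5, 0.4))

# A numeric column marks missing values with NA
is.na(par_df$s)

# is.na() on a list column is FALSE for NULL entries...
is.na(par_df$p)

# ...so missing list entries have to be detected with is.null()
vapply(par_df$p, is.null, logical(1))
```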