mitchelloharawild / distributional

Vectorised distributions for R
https://pkg.mitchelloharawild.com/distributional
GNU General Public License v3.0
100 stars 16 forks source link

Type of return value of density, cdf and quantile is inconsistent #99

Open venpopov opened 8 months ago

venpopov commented 8 months ago

While doing some testing I noticed that density, cdf and quantile sometimes return vectors, sometimes lists:

library(distributional)

density(dist_normal(), c(0,1))                     # returns list
#> [[1]]
#> [1] 0.3989423 0.2419707
density(c(dist_normal(),dist_normal()), c(0,1))    # returns list
#> [[1]]
#> [1] 0.3989423 0.2419707
#> 
#> [[2]]
#> [1] 0.3989423 0.2419707
density(dist_normal(), 0)                          # returns vector
#> [1] 0.3989423
density(c(dist_normal(),dist_normal()), 0)         # returns vector
#> [1] 0.3989423 0.3989423

cdf(dist_normal(), c(0,1))                         # returns list
#> [[1]]
#> [1] 0.5000000 0.8413447
cdf(c(dist_normal(),dist_normal()), c(0,1))        # returns list
#> [[1]]
#> [1] 0.5000000 0.8413447
#> 
#> [[2]]
#> [1] 0.5000000 0.8413447
cdf(dist_normal(), 0)                              # returns vector
#> [1] 0.5
cdf(c(dist_normal(),dist_normal()), 0)             # returns vector
#> [1] 0.5 0.5

quantile(dist_normal(), c(0,1))                    # returns list
#> [[1]]
#> [1] -Inf  Inf
quantile(c(dist_normal(),dist_normal()), c(0,1))   # returns list
#> [[1]]
#> [1] -Inf  Inf
#> 
#> [[2]]
#> [1] -Inf  Inf
quantile(dist_normal(), 0)                         # returns vector
#> [1] -Inf
quantile(c(dist_normal(),dist_normal()), 0)        # returns vector
#> [1] -Inf -Inf

# all return lists
generate(dist_normal(), 2)                         
#> [[1]]
#> [1] 1.7134674 0.4435964
generate(c(dist_normal(),dist_normal()), 2)
#> [[1]]
#> [1] -0.8288996  1.3283409
#> 
#> [[2]]
#> [1]  0.3532802 -0.7265644
generate(c(dist_normal(),dist_normal()), c(5,10))
#> [[1]]
#> [1] -0.24069096 -0.69155357  0.75666937  1.41401325 -0.07069785
#> 
#> [[2]]
#>  [1] -0.5421478 -1.2236593 -0.4537853 -0.3443427  0.2597258 -1.8010175
#>  [7]  0.8553006 -0.9100003 -0.3662697 -0.1896853

Created on 2024-04-01 with reprex v2.1.0

And the expected output type is not documented for these functions. I think it would be good to have consistent output type. For example, I was surprised that density(dist_norma(), 0) returns a scalar, but that density(dist_normal(), c(0,1)) returns a list, while with two distributions and single point we get a vector.

One possibility is to always return a list, where each element corresponds to one distribution, like it is now for some of the outputs. Although I believe this would be desirable, it is a breaking change that might affect other packages. In any case, the return value type should be documented.

Additionally, it could also be nice to have an option simplify2array as an argument.

mitchelloharawild commented 7 months ago

The documentation should definitely be improved to explain how the distributions and arguments are vectorised. Essentially the operation's argument (e.g. density(at=)) is vectorised across the distributions and the result is simplified if possible. While I agree that returning lists with one element is more consistent, it would be less efficient and less usable - it could be a non-default option, like drop = FALSE.

The issue with simplify2array() is that the vectorisation needs to work with multivariate distributions which already produces arrays.