mitchelloharawild / distributional

Vectorised distributions for R
https://pkg.mitchelloharawild.com/distributional
GNU General Public License v3.0
94 stars 15 forks source link

Investigate vec_ptype2 and vec_cast methods for distribution objects #26

Closed mitchelloharawild closed 4 years ago

mitchelloharawild commented 4 years ago

Apparently vctrs::vec_rbind() is slow for many fables, due to the combining of distribution columns.

mitchelloharawild commented 4 years ago

Some benchmarks:

library(distributional)
library(vctrs)
library(tsibble)
bench::mark(
  vec_c(!!!rep(list(dist_normal()), 10000)),
  vec_c(!!!rep(list(new_hilo(1, 2, 3)), 10000)),
  vec_c(!!!rep(list(yearmonth(0)), 10000)),
  vec_c(!!!rep(list(1), 10000)),
  check = FALSE
)$median
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> [1] 739.52ms   1.41s  616.29ms   5.95ms

Created on 2020-06-10 by the reprex package (v0.3.0)

tomjemmett commented 4 years ago

I was investigating an issue a colleague had when trying to use the RStudio viewer with a fbl_ts object - looks like View() converts everything to a character, so I needed to create vec_ptype2/vec_cast functions for the distribution class: I'm happy to write a PR if that is helpful to you! My current implementation is below, it was the bare minimum I needed to get View() to work so may not be suitable (I just handle conversion from distribution -> character, and then rely on your existing format s3 method)

vec_ptype2.distribution <- function(x, y, ...) UseMethod("vec_ptype2.distribution")

vec_ptype2.distribution.character <- function(x, y, ...) character()
vec_ptype2.character.distribution <- function(x, y, ...) character()

vec_cast.character.distribution <- function(x, to, ...) {
  format(x)
}
mitchelloharawild commented 4 years ago

Thanks @tomjemmett, could you check if you're able to user the View() pane using the development version of the package? I added something for this in https://github.com/mitchelloharawild/distributional/commit/8320c449fb5202b3c72c3ef535c435e828233d57.

I only implemented the vec_cast method as it is explicitly coerced in View() using as.character(), and I don't think it is correct for vec_c() to convert a distribution to a character as it would be a lossy cast.

I also think it is reasonable to produce a degenerate distribution from a character, which is what is currently done for combining numerics with distributions. This would give a discrete degenerate distribution. It's not currently implemented, as I'm still thinking about the idea.

Please let me know what you think of handling characters, and if the dev version works for you.

mitchelloharawild commented 4 years ago

Closing as this will be (indirectly) resolved by #25 as it will be unlikely to combine many single distribution vectors together. Further, the changes required to improve this performance most likely need to occur in {vctrs}.