mitchelloharawild / distributional

Vectorised distributions for R
https://pkg.mitchelloharawild.com/distributional
GNU General Public License v3.0

Automatically supply inverses of known functions #78

Closed mjskay closed 2 years ago

mjskay commented 2 years ago

Hi, as I've been mocking up more examples and use cases for {distributional} with {ggdist}, one of the things I kept running into was the error "Inverting transformations for distributions is not yet supported", usually when transforming a distribution and then trying to visualize it. This is because visualizing distributions with ggdist typically requires all of cdf(), quantile(), and density() to be defined, and these do not all work without knowing the inverse transformation.
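To make the dependence on the inverse concrete, here is a minimal base-R sketch (not code from this PR) of the change-of-variables formulas for Y = log2(X) with X ~ Gamma(1, 1). The helper names g, g_inv, g_inv_deriv, q_y, p_y and d_y are made up for illustration, and the transformation is assumed to be monotone increasing.

```r
# Y = g(X), X ~ Gamma(shape = 1, rate = 1), g = log2 (monotone increasing).
g           <- function(x) log2(x)
g_inv       <- function(y) 2^y
g_inv_deriv <- function(y) log(2) * 2^y  # derivative of the inverse

# quantile() only needs the forward transformation ...
q_y <- function(p) g(qgamma(p, shape = 1, rate = 1))

# ... but cdf() and density() both need the inverse, and the density
# additionally needs the absolute value of the inverse's derivative (Jacobian).
p_y <- function(y) pgamma(g_inv(y), shape = 1, rate = 1)
d_y <- function(y) dgamma(g_inv(y), shape = 1, rate = 1) * abs(g_inv_deriv(y))

p_y(q_y(0.3))             # round trip through quantile and cdf
integrate(d_y, -40, 5)    # the density should integrate to roughly 1
```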

While I figured out I could use dist_transformed() directly to work around this (e.g. by replacing things like log2(dist_XXX()) with dist_transformed(dist_XXX(), log2, function(x) 2^x)), this seemed awkward, since we have several known elementary functions with inverses handled by the Math and Ops generics. So, this pull request modifies those generics to automatically supply inverses for several functions.

This allows you to visualize pretty arbitrary chains of transformations without manually specifying the inverse. A very silly example (this uses the unify-dist branch of ggdist; i.e. install_github("mjskay/ggdist@unify-dist")):

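library(distributional)
library(ggdist)
library(ggplot2)
library(magrittr)  # for %>%; any package that exports the pipe works

data.frame(
  y = 1:5,
  x = (log(dist_gamma(1,1)) / 2 + 4) * 1:5
) %>%
  ggplot(aes(y = y, xdist = x)) +
  stat_halfeye()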

Let me know if this seems reasonable. Happy to take feedback and adjust as needed.

mitchelloharawild commented 2 years ago

Thanks. I also have similar functionality to this in {fabletools} that I was considering porting over. It is a bit more extensible, as it allows packages to register functions and their inverses in a lookup table, so that a function like fabletools::box_cox() can be associated with its own inverse, fabletools::inv_box_cox().
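As a rough illustration of the lookup-table idea (this is not the {fabletools} implementation), a registry can be as small as an environment keyed by function name. The names inverse_table, register_inverse_fn(), lookup_inverse_fn() and cube() are all hypothetical.

```r
# Hypothetical registry: maps a function name to its inverse function.
inverse_table <- new.env(parent = emptyenv())

register_inverse_fn <- function(name, inverse) {
  stopifnot(is.character(name), length(name) == 1, is.function(inverse))
  assign(name, inverse, envir = inverse_table)
  invisible(inverse)
}

lookup_inverse_fn <- function(name) {
  if (!exists(name, envir = inverse_table, inherits = FALSE)) {
    stop("No inverse registered for '", name, "'")
  }
  get(name, envir = inverse_table, inherits = FALSE)
}

# Defaults that the host package could ship with ...
register_inverse_fn("exp", log)
register_inverse_fn("log2", function(x) 2^x)

# ... and a pair added by another package (a made-up cube transformation).
cube <- function(x) x^3
register_inverse_fn("cube", function(x) sign(x) * abs(x)^(1 / 3))

lookup_inverse_fn("log2")(log2(8))
lookup_inverse_fn("cube")(cube(-2))
```

Keying by name keeps the sketch short; a real package would probably also need to handle namespaced calls such as pkg::fn.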

The code in {fabletools} is a bit clunky because I wanted to handle it by rewriting the expressions, but I think the idea is fine with some implementation improvements: https://github.com/tidyverts/fabletools/blob/master/R/transform.R

The reason I have not added the inverses yet is that I was debating whether this functionality warrants its own package. Being able to invert a function has applications beyond distributions / forecasting (for example, the scales package provides some inverse functions for common plot transformations), and reasonable defaults/fallbacks can probably be obtained with root finding algorithms.

mjskay commented 2 years ago

Ah yeah, I love the idea of allowing others to register inverses. Probably makes sense to do this in another package as you say.

Having the expressions rewrite themselves into something human-readable would be great too, rather than my solution of just wrapping functions. Especially if the API (from the registration side) could have reduced boilerplate; e.g. something like register_inverse(expr(log(x, base = exp(1))), expr(base ^ x)), where it could assume x is always the target and rewrite from there (more complex functions for more complex transformations would then need to exist as external functions, but this would keep the transformations human-readable). Then you could even allow people to see long-form versions of transformed distributions as strings giving the full transformation in output, rather than "t(...)". While it might be a bit much for compact displays, it could be useful in some cases as an option.
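A minimal sketch of how such an expression-based registration could work, assuming (as suggested above) that unnamed pattern arguments such as x are the target and named arguments such as base are parameters with defaults. The store inverse_rules and the helper invert_call() are hypothetical, base R quote() stands in for expr(), and only a single, non-nested call is handled.

```r
# Hypothetical rule store, keyed by the name of the function being inverted.
inverse_rules <- new.env(parent = emptyenv())

register_inverse <- function(pattern, inverse) {
  stopifnot(is.call(pattern))
  assign(as.character(pattern[[1]]),
         list(pattern = pattern, inverse = inverse),
         envir = inverse_rules)
  invisible(NULL)
}

invert_call <- function(call) {
  rule <- get(as.character(call[[1]]), envir = inverse_rules)
  args <- as.list(rule$pattern)[-1]
  nm <- names(args)
  if (is.null(nm)) nm <- rep("", length(args))
  is_target <- nm == ""
  nm[is_target] <- vapply(args[is_target], as.character, character(1))

  # Build a template function from the pattern so match.call() can line up
  # positional and named arguments of the call being inverted.
  fm <- args
  fm[is_target] <- alist(a = )   # targets get no default value
  names(fm) <- nm
  template <- function() NULL
  formals(template) <- fm
  supplied <- as.list(match.call(template, call))[-1]

  # Parameters: pattern defaults, overridden by whatever the call supplied.
  params <- args[!is_target]
  supplied <- supplied[setdiff(names(supplied), nm[is_target])]
  params[names(supplied)] <- supplied

  # Substitute the parameters into the inverse; x is left as the target.
  do.call(substitute, list(rule$inverse, params))
}

register_inverse(quote(log(x, base = exp(1))), quote(base^x))

invert_call(quote(log(x, 2)))   # base supplied by position
invert_call(quote(log(x)))      # falls back to the default base
```

Because the result is still an expression, it can be deparsed for display, which is what would make the long-form output option possible.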

Speaking of scales, I filed a related issue there asking for the inclusion of derivatives of the transformations to make it possible to apply the Jacobian adjustment to densities without using numerical differentiation: https://github.com/r-lib/scales/issues/322. Would be nice to supply those in this transformation system as well.

Re: this PR, what are you thinking? If you don't take the full PR, there are a few minor fixes in here that are probably worth poaching (e.g., a missing abs() in the Jacobian and the handling of the base parameter when doing log(dist_lognormal(...), base = ...)). At the same time, if you're not planning on rolling out the more comprehensive transformation system soon, it would be nice (from my side) to have something like this as a stopgap. I assume the future functionality would be a superset of what's here so it should be forwards-compatible.

mitchelloharawild commented 2 years ago

The boilerplate should be possible, and something that I'd like to have as an option for users also.

I also don't mind changing the t(<dist>) formatting, but am wary that the distribution strings can become long quickly. Perhaps this can be a non-default option to show transformation expressions instead.

As for the PR, the fixes definitely should be grabbed. I also don't mind merging the inversion functions but if we can determine the scope of the new package it shouldn't take long to put it together and get it on CRAN. The current scope I have in mind is:

  1. Take an expression/function (say log(x+1)) and produce the inverted expression/function (exp(x) - 1).
  2. If the expression/function cannot be inverted, fall back to providing a root finding function.
  3. Provide a lookup table of function inverses that users/packages can add to.

Is there anything you would add to the scope?
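For reference, the first two points of that scope can be sketched in base R roughly as follows. This is an illustration under simplifying assumptions, not the planned package: invert_expr() and invert_fn() are hypothetical names, only a handful of calls are recognised, the variable must be called x and appear in exactly one argument, and the root-finding fallback assumes the function is monotone on the search interval.

```r
# Peel off the outermost call of `expr` and apply its inverse to `acc`,
# the inverse expression built so far (starting from the bare symbol x).
invert_expr <- function(expr, acc = quote(x)) {
  if (identical(expr, quote(x))) return(acc)
  if (!is.call(expr)) stop("expression does not depend on x")
  fn <- as.character(expr[[1]])
  args <- as.list(expr)[-1]
  has_x <- vapply(args, function(a) "x" %in% all.vars(a), logical(1))
  if (sum(has_x) != 1) stop("x must appear in exactly one argument")

  if (length(args) == 1) {
    acc <- switch(fn,
      "(" = acc,
      log = bquote(exp(.(acc))),
      exp = bquote(log(.(acc))),
      stop("no known inverse for ", fn)
    )
    return(invert_expr(args[[1]], acc))
  }
  if (length(args) != 2) stop("unsupported call to ", fn)

  u <- args[[which(has_x)]]    # the argument that contains x
  k <- args[[which(!has_x)]]   # the other (constant) argument
  x_first <- has_x[[1]]
  acc <- switch(fn,
    "+" = bquote(.(acc) - .(k)),
    "*" = bquote(.(acc) / .(k)),
    "-" = if (x_first) bquote(.(acc) + .(k)) else bquote(.(k) - .(acc)),
    "/" = if (x_first) bquote(.(acc) * .(k)) else bquote(.(k) / .(acc)),
    stop("no known inverse for ", fn)
  )
  invert_expr(u, acc)
}

# Return an inverse function, falling back to uniroot() when the
# symbolic inversion above gives up.
invert_fn <- function(expr, interval = c(-1e3, 1e3)) {
  inv <- tryCatch(invert_expr(expr), error = function(e) NULL)
  if (!is.null(inv)) {
    return(function(x) eval(inv, list(x = x)))
  }
  f <- function(x) eval(expr, list(x = x))
  function(y) {
    vapply(y, function(yi) {
      uniroot(function(x) f(x) - yi, interval, tol = 1e-9)$root
    }, numeric(1))
  }
}

invert_expr(quote(log(x + 1)))                           # symbolic inverse
invert_fn(quote(log(x + 1)))(log(4))                     # evaluates that inverse
invert_fn(quote(x + exp(x)), interval = c(-50, 50))(1)   # root-finding fallback
```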

The {fabletools} package provides a transformation class which stores the transformation and its inverse, however I'm not sure how useful/necessary this is for the package. It is used as an interface (via new_transformation()) for users to provide custom/complex transformations with their associated inverse.

I think deferring derivatives to stats::deriv() should be okay in most applications. Perhaps a 'transformation' object (containing the transform and inverse functions) should optionally allow derivative functions to be provided in case the symbolic derivative cannot be found.

mjskay commented 2 years ago

> I also don't mind changing the t(<dist>) formatting, but am wary that the distribution strings can become long quickly. Perhaps this can be a non-default option to show transformation expressions instead.

Agreed.

> The {fabletools} package provides a transformation class which stores the transformation and its inverse, however I'm not sure how useful/necessary this is for the package. It is used as an interface (via new_transformation()) for users to provide custom/complex transformations with their associated inverse.

I like the idea of the main user-facing interface being a simple function where you provide an expression and it returns the inverted expression (or maybe a function). I think having a representation of the pair of (function, inverse) will also be useful, particularly for some package code --- e.g. being able to convert to/from the (function, inverse) representation from scales to the representation in this package may make life easier in ggdist.

> I think deferring derivatives to stats::deriv() should be okay in most applications. Perhaps a 'transformation' object (containing the transform and inverse functions) should optionally allow derivative functions to be provided in case the symbolic derivative cannot be found.

I think being able to optionally provide the derivatives will be helpful. There are some common cases where stats::deriv() doesn't have functions in its table of derivatives (e.g. plogis/qlogis). In fact, I just discovered a package (Deriv) that specifically addresses some of the shortcomings of stats::deriv() (and has very few dependencies and allows updating the derivatives table); perhaps we could use that to provide derivatives easily?
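A small sketch of what "optionally provide the derivative" could look like, using stats::D() as the symbolic default. The constructor new_transformation_sketch() is hypothetical and is not the fabletools new_transformation() interface; it assumes the inverse is a closure whose body is a single expression in its first argument.

```r
new_transformation_sketch <- function(transform, inverse, inverse_deriv = NULL) {
  if (is.null(inverse_deriv)) {
    d_expr <- NULL
    if (!is.primitive(inverse)) {
      # Try a symbolic derivative of the inverse's body with respect to
      # its first argument; D() errors for functions not in its table.
      arg <- names(formals(inverse))[1]
      d_expr <- tryCatch(stats::D(body(inverse), arg), error = function(e) NULL)
    }
    if (is.null(d_expr)) {
      stop("no symbolic derivative found; please supply `inverse_deriv`")
    }
    inverse_deriv <- function(y) {
      eval(d_expr, stats::setNames(list(y), arg), environment(inverse))
    }
  }
  list(transform = transform, inverse = inverse, inverse_deriv = inverse_deriv)
}

# D() can differentiate exp(y) - 1, so no derivative has to be supplied.
log1 <- new_transformation_sketch(function(x) log(x + 1), function(y) exp(y) - 1)
log1$inverse_deriv(0)

# plogis is not in D()'s table, so its derivative (dlogis) is passed explicitly.
logit <- new_transformation_sketch(qlogis, plogis, inverse_deriv = dlogis)
logit$inverse_deriv(0)
```

The stored inverse_deriv is the piece a density method would multiply by, after taking abs(), for the Jacobian adjustment.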

mitchelloharawild commented 2 years ago

> I think being able to optionally provide the derivatives will be helpful. There are some common cases where stats::deriv() doesn't have functions in its table of derivatives (e.g. plogis/qlogis). In fact, I just discovered a package (Deriv) that specifically addresses some of the shortcomings of stats::deriv() (and has very few dependencies and allows updating the derivatives table); perhaps we could use that to provide derivatives easily?

Perhaps we can extend stats::deriv() with a stats::deriv(<transformation>) method that adds a few missing derivatives? I'll look into the {Deriv} package.

mjskay commented 2 years ago

cool thanks!