greta-dev / greta

simple and scalable statistical modelling in R
https://greta-stats.org

Helper functions to establish new TFP / greta distributions #369

Open njtierney opened 3 years ago

njtierney commented 3 years ago

A helper function to establish a new distribution. It could print the template to the screen (as in datapasta::dpasta), write it to the clipboard, or create a new "name.R" file.
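The three output options could share one small dispatcher. This is a minimal sketch, assuming a hypothetical helper `emit_template()` that is not part of greta, and assuming the clipr package for clipboard access (any clipboard package would do):

```r
# Hypothetical helper (not part of greta): send generated template text
# to the console, the clipboard, or a new R file.
emit_template <- function(text, to = c("console", "clipboard", "file"),
                          path = NULL) {
  to <- match.arg(to)
  switch(
    to,
    console = cat(text, sep = "\n"),
    clipboard = {
      # clipr is an assumption here, and is only needed for this branch
      if (!requireNamespace("clipr", quietly = TRUE)) {
        stop("Install the 'clipr' package to write to the clipboard")
      }
      clipr::write_clip(text)
    },
    file = {
      if (is.null(path)) stop("`path` is required when to = \"file\"")
      # refuse to overwrite an existing file
      if (file.exists(path)) stop("File already exists: ", path)
      writeLines(text, path)
    }
  )
  invisible(text)
}

# usage: write a placeholder template to a new temporary .R file
template_text <- c("# gamma distribution template", "gamma_distribution <- NULL")
out_file <- tempfile(fileext = ".R")
emit_template(template_text, to = "file", path = out_file)
```

Refusing to overwrite an existing file is a design choice in this sketch, so that a user's edited distribution file is not replaced by a fresh template.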

The idea is that you pass the details of the distribution to a function like use_new_distribution, providing the name of the distribution, its parameters, args, bounds, and whether it is in TFP.

There might also be some helper functions to help users find their distribution and its arguments in TFP.
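One way such a lookup helper might work is to search a list of TFP distributions and their arguments. A minimal sketch follows. The table `tfp_distribution_args` and the function `find_tfp_distribution()` are hypothetical names, and the table is a small hand-written subset of the main constructor arguments. A real version would need to be generated from, or checked against, the installed TFP version.

```r
# Illustrative, hand-written subset of TFP distributions and their main
# constructor arguments (an assumption, not a complete or generated list)
tfp_distribution_args <- list(
  Normal      = c("loc", "scale"),
  Gamma       = c("concentration", "rate"),
  Beta        = c("concentration1", "concentration0"),
  Exponential = c("rate"),
  Poisson     = c("rate")
)

# Hypothetical helper: case-insensitive search of the table by name
find_tfp_distribution <- function(pattern, table = tfp_distribution_args) {
  hits <- grepl(pattern, names(table), ignore.case = TRUE)
  table[hits]
}

# usage: look up the TFP class and argument names for a gamma distribution
gamma_match <- find_tfp_distribution("gamma")
```

Here `gamma_match` is a named list, so both the TFP class name (`names(gamma_match)`) and its argument names are available for filling in a template.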

Perhaps methods could be written to establish new distributions based on existing functions from packages like distributional, but that might be a bit too much magic.

The templating could be done with whisker.

Below is some pseudo code of how this might work.

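# function to create template of R6 distribution
# (a sketch: returns the template as text, rendered with whisker; `dim` is an
# argument of the generated initialize(), so it is not needed here)

use_new_distribution <- function(name,                  # character, e.g. "gamma"
                                 parameters,            # character vector of parameter names
                                 args = character(0),   # extra arguments, e.g. "truncation"
                                 bounds = c(-Inf, Inf), # two values, e.g., c(0, Inf)
                                 in_tfp = TRUE) {       # logical, is it in TFP?

  template <- '
{{{name}}}_distribution <- R6Class(
  "{{{name}}}_distribution",
  inherit = distribution_node,
  public = list(

    initialize = function({{{init_args}}}) {

{{{as_greta_array}}}

      # add the nodes as parents and parameters
      dim <- check_dims({{{params}}}, target_dim = dim)

      # put other checks here
      # see, for example, check_positive(truncation)

      self$bounds <- c({{{bounds}}})
      super$initialize({{{super_args}}})
{{{add_parameter}}}
    },

    tf_distrib = function(parameters, dag) {
{{{tf_body}}}
    }

  )
)
'

  # this part is tricky if there isn't a tensorflow probability distribution
  # already defined, so the two cases get different bodies. Perhaps also
  # include a list data object in greta of the available TFP
  # distributions/arguments?
  tf_body <- if (in_tfp) {
    tfp_args <- paste0(parameters, " = parameters$", parameters, collapse = ", ")
    paste0("      # TODO: check the TFP class name and argument names\n",
           "      tfp$distributions$", name, "(", tfp_args, ")")
  } else {
    paste0("      # TODO: not in TFP, so define the density by hand\n",
           "      stop(\"tf_distrib is not implemented yet\")")
  }

  # every value is pre-formatted as text; triple braces stop whisker from
  # HTML-escaping the quotes
  data <- list(
    name = name,
    init_args = paste(c(parameters, "dim", args), collapse = ", "),
    as_greta_array = paste0("      ", parameters, " <- as.greta_array(",
                            parameters, ")", collapse = "\n"),
    params = paste(parameters, collapse = ", "),
    bounds = paste(bounds, collapse = ", "),
    super_args = paste(c(paste0("\"", name, "\""), "dim", args), collapse = ", "),
    # this happens for each parameter
    add_parameter = paste0("      self$add_parameter(", parameters, ", \"",
                           parameters, "\")", collapse = "\n"),
    tf_body = tf_body
  )

  whisker::whisker.render(template, data)
}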

For reference, this is an existing distribution definition, gamma_distribution:

gamma_distribution <- R6Class(
  "gamma_distribution",
  inherit = distribution_node,
  public = list(

    initialize = function(shape, rate, dim, truncation) {

      shape <- as.greta_array(shape)
      rate <- as.greta_array(rate)

      # add the nodes as parents and parameters
      dim <- check_dims(shape, rate, target_dim = dim)
      check_positive(truncation)
      self$bounds <- c(0, Inf)
      super$initialize("gamma", dim, truncation)
      self$add_parameter(shape, "shape")
      self$add_parameter(rate, "rate")
    },

    tf_distrib = function(parameters, dag) {
      tfp$distributions$Gamma(concentration = parameters$shape,
                              rate = parameters$rate)
    }

  )
)
njtierney commented 3 years ago

The F_distribution is an example of creating a new TF distribution. It could be broken down in a vignette that discusses the key components the user needs to provide. The vignette could also describe greta's process for comparing distributions against a reference, which users might need if they are contributing to greta or want their own work to be nice and stable.
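To illustrate what comparing against a reference could look like, here is a minimal base R sketch. It checks a hand-written F log density against `stats::df()` on a grid of values. The names `f_log_density()` and `compare_to_reference()` are hypothetical and are not greta's own testing helpers, which may work differently.

```r
# Hand-written log density of the F distribution, standing in for the
# density that a new distribution would define
f_log_density <- function(x, df1, df2) {
  0.5 * (df1 * log(df1 * x) + df2 * log(df2) -
           (df1 + df2) * log(df1 * x + df2)) -
    log(x) - lbeta(df1 / 2, df2 / 2)
}

# Hypothetical checker: TRUE if the candidate and reference log densities
# agree, within tolerance, at every value of x
compare_to_reference <- function(candidate, reference, x, ...,
                                 tolerance = 1e-8) {
  isTRUE(all.equal(candidate(x, ...), reference(x, ...),
                   tolerance = tolerance))
}

# reference implementation from base R
reference <- function(x, df1, df2) stats::df(x, df1, df2, log = TRUE)

# usage: compare on a grid of positive values
x <- seq(0.1, 10, length.out = 50)
matches <- compare_to_reference(f_log_density, reference, x,
                                df1 = 5, df2 = 12)
```

A check like this only covers the density at the chosen grid points and parameter values. Sampling behaviour would need a separate comparison.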