paul-buerkner / brms

brms R package for Bayesian generalized multivariate non-linear multilevel models using Stan
https://paul-buerkner.github.io/brms/
GNU General Public License v2.0

Gaussian Copulas #1317

Open andrjohns opened 2 years ago

andrjohns commented 2 years ago

Hey Paul,

I'm currently working on adding Gaussian copulas to brms using @spinkney's implementations over in https://github.com/spinkney/helpful_stan_functions/blob/main/functions/copula/centered_gaussian_copula.stanfunctions

The Stan code in that repository is licensed under BSD-3, which I believe is compatible with the brms GPLv2 license (according to the FSF), as long as the copyright notice is included.

For the brms integration, I'm taking the approach of including the copyright notice when the copula code is used. For example:

functions {
  /*
    NOTE:
    The Gaussian copula Stan code is distributed under the BSD 3-Clause license

    BSD 3-Clause License

    Copyright (c) 2021, Sean Pinkney
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions are met:

    1. Redistributions of source code must retain the above copyright notice, this
      list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright notice,
      this list of conditions and the following disclaimer in the documentation
      and/or other materials provided with the distribution.

    3. Neither the name of the copyright holder nor the names of its
      contributors may be used to endorse or promote products derived from
      this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
    AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
    IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
    DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
    FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
    DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
    SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
    CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
    OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */
/**
  * Multivariate Normal Copula log density (Cholesky parameterisation)
  *
  * See https://github.com/spinkney/helpful_stan_functions for
  * further documentation
  *
  * @copyright Ethan Alt, Sean Pinkney 2021
  *
  * @param U Matrix of outcomes from marginal calculations
  * @param L Cholesky factor of the correlation/covariance matrix
  * @return log density of distribution
  */
  real multi_normal_cholesky_copula_lpdf(matrix U, matrix L) {
    int N = rows(U);
    int J = cols(U);
    matrix[J, J] Gammainv = chol2inv(L);
    return -N * sum(log(diagonal(L))) - 0.5 * sum(add_diag(Gammainv, -1.0) .* crossprod(U));
  }
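
For reference, with U holding the latent standard-normal scores produced by the marginal functions and Gamma = LL' the correlation matrix, this function evaluates the Gaussian copula log density

$$
\log c(U) = -N \sum_{j=1}^{J} \log L_{jj} \;-\; \frac{1}{2}\,\operatorname{tr}\!\left[ (\Gamma^{-1} - I)\, U^\top U \right],
$$

which is exactly what the -N * sum(log(diagonal(L))) and sum(add_diag(Gammainv, -1.0) .* crossprod(U)) terms compute (crossprod(U) is U'U in Stan).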

Would you have any issues with this approach? You can see it in action on my branch over here: https://github.com/andrjohns/brms/tree/feature/gaussian-copula/inst/chunks/copula

paul-buerkner commented 2 years ago

Thank you for working on such a feature! The license approach looks good to me.

Can you tell me a bit about your envisioned integration of this feature in brms? E.g., how well can it be combined with other features, and what would the user interface look like?

andrjohns commented 2 years ago

Great! My current plan is to add a copula argument to set_rescor, like: set_rescor(rescor = TRUE, copula = "gaussian"). This keeps the interface consistent across all correlated-outcome models and also keeps the option available for other copula types to be added in the future.

As for the actual implementation, the Stan code itself is pretty flexible and essentially just takes the same arguments that would be passed to a given non-glm _lpdf function.

It can be a bit tricky because the parameters/arguments for each type (e.g., gaussian, poisson) have to be packed into matrices before being passed to their respective marginal functions. I'm not sure yet how this will interact with the more complex aspects of the brms language, but it's still early days.

The current rescor infrastructure appears to handle the multiple outcome types with minimal modifications. Here's a quick example of how the syntax and output currently look:

setwd("~/brms")
devtools::load_all()
#> ℹ Loading brms
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.16.11). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').

set.seed(1234)

n_out = mnormt::rmnorm(n = 500, mean = c(0, 0),
                       varcov = cbind(c(1,-0.6),c(-0.6,1)))
bdata = data.frame(
  y1 = n_out[,1],
  y2 = n_out[,2],
  c = rpois(500, 10)
)

mn <- bf(y1 ~ 1, family = gaussian())
mn2 <- bf(y2 ~ 1, family = gaussian())
mp <- bf(c ~ 1, family = poisson())

mod_n <- mn + mp + mn2 + set_rescor(rescor = TRUE, copula = "gaussian")

t_n <- brm(mod_n, data = bdata, cores = 4)
#> Compiling Stan program...
#> Start sampling
summary(t_n)
#>  Family: MV(gaussian, poisson, gaussian) 
#>   Links: mu = identity; sigma = identity
#>          mu = log
#>          mu = identity; sigma = identity 
#> Formula: y1 ~ 1 
#>          c ~ 1 
#>          y2 ~ 1 
#>    Data: bdata (Number of observations: 500) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Population-Level Effects: 
#>              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> y1_Intercept    -0.03      0.05    -0.12     0.06 1.00     4238     3116
#> c_Intercept      2.30      0.01     2.27     2.33 1.00     7766     3061
#> y2_Intercept     0.00      0.04    -0.08     0.09 1.00     4501     3254
#> 
#> Family Specific Parameters: 
#>          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma_y1     1.03      0.03     0.97     1.09 1.00     4913     3228
#> sigma_y2     0.98      0.03     0.93     1.05 1.00     4811     3321
#> 
#> Residual Correlations: 
#>               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> rescor(y1,c)     -0.05      0.05    -0.14     0.04 1.00     7102     2851
#> rescor(y1,y2)    -0.61      0.03    -0.66    -0.55 1.00     4559     3094
#> rescor(c,y2)      0.07      0.04    -0.02     0.15 1.00     6448     2972
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).

Created on 2022-03-14 by the reprex package (v2.0.1)

Where the generated Stan code is:

//> make_stancode(mod_n, data = bdata, backend = "cmdstanr")
// generated with brms 2.16.11
functions {
  /*
    NOTE:
    The Gaussian copula Stan code is distributed under the BSD 3-Clause license

    BSD 3-Clause License

    Copyright (c) 2021, Sean Pinkney
    All rights reserved.

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions are met:

    1. Redistributions of source code must retain the above copyright notice, this
      list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright notice,
      this list of conditions and the following disclaimer in the documentation
      and/or other materials provided with the distribution.

    3. Neither the name of the copyright holder nor the names of its
      contributors may be used to endorse or promote products derived from
      this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
    AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
    IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
    DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
    FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
    DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
    SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
    CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
    OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */
  /* Returns the inverse of the matrix whose Cholesky factor is L
  *
  * Reimplemented for compatibility with Stan < 2.26
  *
  * Args:
  *   L: Matrix that is a Cholesky factor
  * Returns:
  *   The matrix inverse of L * L'
  */
  matrix chol2inv2(matrix L) {
    int K = rows(L);
    matrix[K, K] K_identity = diag_matrix(rep_vector(1.0, K));
    matrix[K, K] L_inv = mdivide_left_tri_low(L, K_identity);
    matrix[K, K] rtn;

    if (K == 0) {
      return L;
    }

    if (K == 1) {
      rtn[1, 1] = inv_square(L[1, 1]);
    } else {
      for (k in 1:K) {
        rtn[k, k] = dot_self(tail(L_inv[ : , k], K + 1 - k));
        for (j in (k + 1):K) {
          int Kmj = K - j;
          rtn[k, j] = dot_product(tail(L_inv[ : , k], Kmj + 1),
                                  tail(L_inv[ : , j], Kmj + 1));
          rtn[j, k] = rtn[k, j];
        }
      }
    }

    return rtn;
  }
  /**
  * Multivariate Normal Copula log density (Cholesky parameterisation)
  *
  * See https://github.com/spinkney/helpful_stan_functions for
  * further documentation
  *
  * @copyright Ethan Alt, Sean Pinkney 2021
  *
  * @param U Matrix of outcomes from marginal calculations
  * @param L Cholesky factor of the correlation/covariance matrix
  * @return log density of distribution
  */
  real multi_normal_cholesky_copula_lpdf(matrix U, matrix L) {
    int N = rows(U);
    int J = cols(U);
    matrix[J, J] Gammainv = chol2inv2(L);
    return -N * sum(log(diagonal(L))) - 0.5 * sum(add_diag(Gammainv, -1.0) .* crossprod(U));
  }
  /**
  * Mixed Copula Log-Probability Function
  *
  * @copyright Andrew Johnson 2022
  *
  * @param marginals Nested arrays of matrices from marginal calculations
  * @param L Cholesky Factor Correlation
  * @return Real log-probability
  */
  real centered_gaussian_copula_cholesky_(matrix[,] marginals, matrix L, int[] order) {
    // Extract dimensions of final outcome matrix
    int N = rows(marginals[1][1]);
    int J = rows(L);
    matrix[N, J] U;

    // Iterate through marginal arrays, concatenating the outcome matrices by column
    // and aggregating the log-likelihoods (from continuous marginals) and jacobian
    // adjustments (from discrete marginals)
    real adj = 0;
    int pos = 1;
    for (m in 1 : size(marginals)) {
      int curr_cols = cols(marginals[m][1]);

      U[ : , pos : (pos + curr_cols - 1)] = marginals[m][1];

      adj += sum(marginals[m][2]);
      pos += curr_cols;
    }

    // Return the sum of the log-probability for copula outcomes and jacobian adjustments
    return multi_normal_cholesky_copula_lpdf(U[ : , order] | L) + adj;
  }
  /**
  * Normal marginal
  *
  * Standardized normal marginal for mixed continuous-discrete Gaussian
  * copula.
  *
  * See https://github.com/spinkney/helpful_stan_functions for
  * further documentation
  *
  * @copyright Ethan Alt, Sean Pinkney 2021
  *
  * @param y matrix of normal outcomes
  * @param mu_glm row_vector of regression means
  * @param sigma vector of outcome SDs
  * @return 2D array of matrices containing the random variables
  *         and jacobian adjustments
  */
  matrix[] normal_marginal(matrix y, matrix mu_glm, vector sigma) {
    int N = rows(mu_glm);
    int J = cols(mu_glm);
    matrix[N, J] rtn[2];
    // Initialise the jacobian adjustments to zero, as vectorised lpdf will be used
    rtn[2] = rep_matrix(0, N, J);

    for (j in 1 : J) {
      rtn[1][ : , j] = (y[ : , j] - mu_glm[ : , j]) / sigma[j];
      rtn[2][1, j] = normal_lpdf(y[ : , j] | mu_glm[ : , j], sigma[j]);
    }

    return rtn;
  }
  /**
  * Poisson marginal
  *
  * Poisson marginal for mixed continuous-discrete Gaussian
  * copula.
  *
  * See https://github.com/spinkney/helpful_stan_functions for
  * further documentation
  *
  * @copyright Ethan Alt, Sean Pinkney 2021
  *
  * @param y int[,] 2D array of integer counts
  * @param mu_glm matrix of regression means
  * @param u_raw matrix of nuisance latent variables
  * @return 2D array of matrices containing the random variables
  *         and jacobian adjustments
  */
  matrix[] poisson_marginal(int[,] y, matrix mu_glm, matrix u_raw) {
    int N = rows(mu_glm);
    int J = cols(mu_glm);
    matrix[N, J] mu_glm_exp = exp(mu_glm);
    matrix[N, J] rtn[2];

    for (j in 1 : J) {
      for (n in 1 : N) {
        real Ubound = poisson_cdf(y[n, j], mu_glm_exp[n, j]);
        real Lbound = 0;
        real UmL;
        if (y[n, j] > 0) {
          Lbound = poisson_cdf(y[n, j] - 1, mu_glm_exp[n, j]);
        }
        UmL = Ubound - Lbound;
        rtn[1][n, j] = inv_Phi(Lbound + UmL * u_raw[n, j]);
        rtn[2][n, j] = log(UmL);
      }
    }

    return rtn;
  }
}
data {
  int<lower=1> N;  // total number of observations
  int<lower=1> N_y1;  // number of observations
  vector[N_y1] Y_y1;  // response variable
  int<lower=1> N_c;  // number of observations
  int Y_c[N_c];  // response variable
  int<lower=1> N_y2;  // number of observations
  vector[N_y2] Y_y2;  // response variable
  int<lower=1> nresp;  // number of responses
  int nrescor;  // number of residual correlations
  int prior_only;  // should the likelihood be ignored?
}
transformed data {
  int Outcome_Order[3];  // response array
  matrix[N, 2] Y_gaussian;  // response array
  int Y_poisson[N, 1];  // response array
  Outcome_Order[1] = 1;
  Y_gaussian[ : , 1] = Y_y1;
  Outcome_Order[2] = 3;
  Y_gaussian[ : , 2] = Y_y2;
  Outcome_Order[3] = 2;
  Y_poisson[ : , 1] = Y_c;
}
parameters {
  real Intercept_y1;  // temporary intercept for centered predictors
  real<lower=0> sigma_y1;  // dispersion parameter
  real Intercept_c;  // temporary intercept for centered predictors
  real Intercept_y2;  // temporary intercept for centered predictors
  real<lower=0> sigma_y2;  // dispersion parameter
  matrix<lower=0, upper=1>[N, 1] URaw_poisson;
  cholesky_factor_corr[nresp] Lrescor;  // parameters for multivariate linear models
}
transformed parameters {
  real lprior = 0;  // prior contributions to the log posterior
  lprior += student_t_lpdf(Intercept_y1 | 3, -0.1, 2.5);
  lprior += student_t_lpdf(sigma_y1 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  lprior += student_t_lpdf(Intercept_c | 3, 2.3, 2.5);
  lprior += student_t_lpdf(Intercept_y2 | 3, 0, 2.5);
  lprior += student_t_lpdf(sigma_y2 | 3, 0, 2.5)
    - 1 * student_t_lccdf(0 | 3, 0, 2.5);
  lprior += lkj_corr_cholesky_lpdf(Lrescor | 1);
}
model {
  // likelihood including constants
  if (!prior_only) {
    // initialize linear predictor term
    vector[N_y1] mu_y1 = Intercept_y1 + rep_vector(0.0, N_y1);
    // initialize linear predictor term
    vector[N_c] mu_c = Intercept_c + rep_vector(0.0, N_c);
    // initialize linear predictor term
    vector[N_y2] mu_y2 = Intercept_y2 + rep_vector(0.0, N_y2);
    // multivariate predictor array
    matrix[N, 2] Mu_gaussian;
    // multivariate predictor array
    matrix[N, 1] Mu_poisson;
    vector[2] sigma = transpose([sigma_y1, sigma_y2]);
    Mu_gaussian[ : , 1] = mu_y1;
    Mu_gaussian[ : , 2] = mu_y2;
    Mu_poisson[ : , 1] = mu_c;
    target += centered_gaussian_copula_cholesky_({normal_marginal(Y_gaussian, Mu_gaussian, sigma), poisson_marginal(Y_poisson, Mu_poisson, URaw_poisson)}, Lrescor, Outcome_Order);
  }
  // priors including constants
  target += lprior;
}
generated quantities {
  // actual population-level intercept
  real b_y1_Intercept = Intercept_y1;
  // actual population-level intercept
  real b_c_Intercept = Intercept_c;
  // actual population-level intercept
  real b_y2_Intercept = Intercept_y2;
  // residual correlations
  corr_matrix[nresp] Rescor = multiply_lower_tri_self_transpose(Lrescor);
  vector<lower=-1,upper=1>[nrescor] rescor;
  // extract upper diagonal of correlation matrix
  for (k in 1:nresp) {
    for (j in 1:(k - 1)) {
      rescor[choose(k - 1, 2) + j] = Rescor[j, k];
    }
  }
}
andrjohns commented 2 years ago

At the moment it's only hard-coded for gaussian, poisson, bernoulli, and binomial marginals, but the approach is compatible with any distribution that has a CDF defined, so these marginal functions could just be generated automatically in the future.
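
To illustrate the pattern, a marginal for another discrete family only needs its CDF evaluated at y and y - 1, with that interval mapped onto the latent normal scale through a nuisance uniform, exactly as in poisson_marginal above. A minimal sketch of a Bernoulli version (my illustration, not code from the branch, assuming mu_glm is on the logit scale):

  /**
  * Bernoulli marginal (illustrative sketch only)
  *
  * @param y int[,] 2D array of 0/1 outcomes
  * @param mu_glm matrix of regression means on the logit scale
  * @param u_raw matrix of nuisance latent variables in (0, 1)
  * @return 2D array of matrices containing the random variables
  *         and jacobian adjustments
  */
  matrix[] bernoulli_marginal(int[,] y, matrix mu_glm, matrix u_raw) {
    int N = rows(mu_glm);
    int J = cols(mu_glm);
    matrix[N, J] rtn[2];

    for (j in 1 : J) {
      for (n in 1 : N) {
        real prob = inv_logit(mu_glm[n, j]);
        // CDF interval [F(y - 1), F(y)]: [0, 1 - p] for y = 0, [1 - p, 1] for y = 1
        real Lbound = y[n, j] == 1 ? 1 - prob : 0.0;
        real Ubound = y[n, j] == 1 ? 1.0 : 1 - prob;
        real UmL = Ubound - Lbound;
        // latent normal variate and jacobian adjustment, as in poisson_marginal
        rtn[1][n, j] = inv_Phi(Lbound + UmL * u_raw[n, j]);
        rtn[2][n, j] = log(UmL);
      }
    }

    return rtn;
  }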

andrjohns commented 2 years ago

But also, this is more of a proof of concept at the moment (I just needed the copulas for a work project), so I'm not committed to any particular implementation.

andrjohns commented 2 years ago

As a better reprex, here's an example of the implementation accurately recovering the correlations and the data-generating parameters for a model with gaussian, poisson, and binomial outcomes:

setwd("~/brms")
devtools::load_all()
#> ℹ Loading brms
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.16.11). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').

set.seed(2022)

cor <- round(randcorr::randcorr(4), 1) # Rounding for clarity

# Generate correlated standard-normal variates
n_out <- mnormt::rmnorm(n = 500, mean = c(0, 0, 0, 0),
                       varcov = cor)

# Put on uniform [0, 1] scale using standard normal CDF (Phi function)
p_out <- pnorm(n_out)

# Generate denominators for binomial
d <- sample(10:100, 500, replace = T)

# Transform uniform variates to desired distributions using
# quantile function with desired parameters
bdata <- data.frame(
  y1 = qnorm(p_out[,1], 10, 4),
  y2 = qnorm(p_out[,2], 3, 6),
  c = qpois(p_out[,3], 15),
  n = qbinom(p_out[,4], prob = 0.3, size = d),
  d = d
)

colnames(cor) <- c("y1","y2","c","n")
rownames(cor) <- c("y1","y2","c","n")

mn <- bf(y1 ~ 1, family = gaussian())
mp <- bf(c ~ 1, family = poisson())
mn2 <- bf(y2 ~ 1, family = gaussian())
mb <- bf(n | trials(d) ~ 1, family = binomial())
mod_n <- mn + mp + mn2 + mb + set_rescor(rescor = TRUE, copula = "gaussian")

t_n <- brm(mod_n, data = bdata, cores = 4)
#> Compiling Stan program...
#> Start sampling
summary(t_n)
#>  Family: MV(gaussian, poisson, gaussian, binomial) 
#>   Links: mu = identity; sigma = identity
#>          mu = log
#>          mu = identity; sigma = identity
#>          mu = logit 
#> Formula: y1 ~ 1 
#>          c ~ 1 
#>          y2 ~ 1 
#>          n | trials(d) ~ 1 
#>    Data: bdata (Number of observations: 500) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Population-Level Effects: 
#>              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> y1_Intercept     9.98      0.17     9.66    10.31 1.00     5207     3102
#> c_Intercept      2.70      0.01     2.67     2.72 1.00     4775     3203
#> y2_Intercept     3.35      0.25     2.86     3.83 1.00     5909     2793
#> n_Intercept     -0.85      0.01    -0.87    -0.82 1.00     8587     3695
#> 
#> Family Specific Parameters: 
#>          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sigma_y1     3.99      0.11     3.78     4.22 1.00     7320     3359
#> sigma_y2     5.65      0.16     5.35     5.97 1.00     5606     2976
#> 
#> Residual Correlations: 
#>               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> rescor(y1,c)      0.67      0.02     0.62     0.71 1.00     6889     2900
#> rescor(y1,y2)    -0.45      0.03    -0.51    -0.40 1.00     5202     3053
#> rescor(c,y2)     -0.70      0.01    -0.72    -0.67 1.00     4789     3717
#> rescor(y1,n)      0.52      0.03     0.46     0.57 1.00     6279     3204
#> rescor(c,n)       0.52      0.03     0.46     0.58 1.00     5366     3266
#> rescor(y2,n)      0.18      0.03     0.12     0.24 1.00     5276     3350
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
cor
#>      y1   y2    c   n
#> y1  1.0 -0.5  0.7 0.5
#> y2 -0.5  1.0 -0.7 0.2
#> c   0.7 -0.7  1.0 0.5
#> n   0.5  0.2  0.5 1.0

Created on 2022-03-14 by the reprex package (v2.0.1)

paul-buerkner commented 2 years ago

This is a really nice and simple interface. I like it a lot! No objections :-)

paul-buerkner commented 2 years ago

Just as a heads-up for you: in brms 3.x, which I will work on this year, my plan is to enable more user control over residual correlations. That is, to allow some (blocks of) variables to be correlated while other (blocks of) variables remain uncorrelated. I still have to figure out the exact interface, but it would tentatively extend set_rescor too. Suggestions for this interface are of course also welcome.

andrjohns commented 2 years ago

That's a great idea! I'll keep that in mind during the changes so I'm not doing anything too dramatic (and I'll chime in if I have any suggestions!)

bachlaw commented 2 years ago

At the moment it's only hard-coded for gaussian, poisson, bernoulli, and binomial marginals, but is compatible with any distribution that has a CDF defined - so these marginal functions could just be generated in the future

Andrew, could this structure be used to create a sort of approximation to correlated multinomial outcomes, along the lines of a multinomial probit, or perhaps something more functional with the same result?

andrjohns commented 2 years ago

could this structure be used to create a sort of approximation to correlated multinomial outcomes

Not an approximation, but exactly that. Gaussian copulas allow you to model the correlation between outcomes separately from the distributions of those outcomes, so you can explicitly model a correlated multinomial.

However, Gaussian copulas require the CDF for the outcome type, and we don't have a multinomial_cdf function. If you have a Stan-language implementation then I could add that to brms and the family would then be compatible.
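
For context, the Gaussian copula with correlation matrix $\Gamma$ couples the marginal CDFs $F_j$ as

$$
F(y_1, \ldots, y_J) = \Phi_\Gamma\!\big( \Phi^{-1}(F_1(y_1)), \ldots, \Phi^{-1}(F_J(y_J)) \big),
$$

so the only family-specific ingredient is the marginal CDF; the correlation structure is handled identically for every outcome type.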

bachlaw commented 2 years ago

Indeed: I noticed that in Stan there generally don't seem to be CDFs for discrete distributions other than binary outcomes. Perhaps there is a comparable implementation in scipy or a similar resource we could borrow? Another option might be to borrow from the Poisson CDF given the connection between the Poisson and the multinomial, although I would need to think that through.

andrjohns commented 2 years ago

It's mostly just a time issue, as development tends to be driven by user demand and developer interest. We have an open issue in the Math library for missing distribution functions: https://github.com/stan-dev/math/issues/2657; I'd recommend commenting there so it gets on the radar.

Additionally, if there is a reference implementation or a summary of the math for the needed function that you can provide, then that also saves a great deal of time

bachlaw commented 2 years ago

Additionally, if there is a reference implementation or a summary of the math for the needed function that you can provide, then that also saves a great deal of time

Found a few and linked to them in that math issue.

andrjohns commented 2 years ago

Quick progress update for anyone who's interested.

I've just about finished configuring the implementation to automatically generate the Stan code for the specified marginal distributions, so that it's compatible with any distribution that has a CDF, rather than hard-coding support for a handful of them.

This has identified some issues with numerical stability, as the CDF functions can underflow/overflow relatively easily (as I discovered with some binomial outcomes). The natural resolution is to instead call the log-CDF (noting that the Stan Math implementation might sometimes just be calling the CDF and taking the log, which is another issue). However, the estimation requires passing the result to the inv_Phi function, so we need an implementation of inv_Phi that takes inputs on the log scale. I've implemented this in pure Stan here, with the aim of having it added to the Math library eventually.
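
As a rough sketch of how that would look for the Poisson marginal from earlier (inv_Phi_log is just a placeholder name here for the log-scale quantile function mentioned above, defined in pure Stan rather than built in):

  matrix[] poisson_marginal_log(int[,] y, matrix mu_glm, matrix u_raw) {
    int N = rows(mu_glm);
    int J = cols(mu_glm);
    matrix[N, J] mu_glm_exp = exp(mu_glm);
    matrix[N, J] rtn[2];

    for (j in 1 : J) {
      for (n in 1 : N) {
        // log-CDF bounds avoid the underflow/overflow of the natural-scale CDF
        real log_Ubound = poisson_lcdf(y[n, j] | mu_glm_exp[n, j]);
        real log_Lbound = y[n, j] > 0
                          ? poisson_lcdf(y[n, j] - 1 | mu_glm_exp[n, j])
                          : negative_infinity();
        // log(Ubound - Lbound), computed stably from the two log-CDFs
        real log_UmL = log_diff_exp(log_Ubound, log_Lbound);
        // log(Lbound + UmL * u_raw) without leaving the log scale
        real log_U = log_sum_exp(log_Lbound, log(u_raw[n, j]) + log_UmL);
        // latent normal variate via the (assumed) log-scale quantile function
        rtn[1][n, j] = inv_Phi_log(log_U);
        rtn[2][n, j] = log_UmL;
      }
    }

    return rtn;
  }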

spinkney commented 2 years ago

Cool! Is there a reference for that implementation?

andrjohns commented 2 years ago

It's actually just the Stan Math implementation but translated to the log scale, and I can see from the history that you've already been involved in that one, so you probably have a better idea of the reference than I do!

Note that this will also be much faster once it's in C++, since the autodiff is currently calculating gradients for each of the intermediate log_sum_exp calls.

jokroese commented 12 months ago

This implementation looks great. Great work, @andrjohns!

Is there anything we can do to help get the branch implementing this merged into brms? I would love to use and build on this feature, even with the issues identified in https://github.com/paul-buerkner/brms/issues/1317#issuecomment-1072986536.