rok-cesnovar / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License
1 stars 0 forks source link

Issue with stan math functions #6

Open spinkney opened 4 years ago

spinkney commented 4 years ago

I was double checking the code and I only found this issue because my local machine has a newer version of this vs the production machine I'm using at work.

I've isolated the issue and it's very perplexing to me as this should be the same. However, the answer is wrong.

// currently we have 
// L is lower-triangular
// J = (L^-1)^T
// x = (J * J^T) * g0 = (L^-1)^T * (L^-1) * g0 = G^-1 * g0
auto L = stan::math::cholesky_decompose(G);
auto J = stan::math::transpose(stan::math::mdivide_left_tri_low(L));
auto x = multiply(stan::math::multiply_lower_tri_self_transpose(J), g0);

// vs just computing the inverse
// auto x = multiply(inverse(G), g0);

I've attached an example where the output should be 0.7, 0.3 and I've confirmed this with the R package quadprogpp and the old version using multiply(inverse(G), g0);. The "new" wrong version gives 1 in variable places for this test.

It's easy for us to revert back to using the inverse solution but I hope this isn't a bigger bug in Stan math so I'm bringing it up here.

K <- 8
J <- 20
true_v <- c(0.2, 0.5, 0.3, rep(0, K - 3))
true_w <- c(0.7, 0.3, rep(0, J - 2))
X0 <- matrix(rnorm(K * J, 0, sd = 1), J, K)
X1 <- apply(X0, 2, function(x) t(true_w) %*% (x ))
beta <- true_v * 10 + rnorm(K, 0, 0.1)
Y0 <- X0 %*% beta + rnorm(20, 0, 0.1)
Y1 <- beta %*% X1 + rnorm(1, 0, 0.1)
data =  list(
  K = K,
  J = J,
  X1 = matrix(X1, K, 1),
  X0 = t(X0),
  Y0 = as.vector(Y0),
  Y1 = c(Y1),
  CE = t(t(rep(1, J))),
  ce0 = array(-1),
  CI =  cbind(diag(J), -diag(J)),
  ci0 =c(rep(0, J), rep(1, J)),
  v_prior = abs(beta)
)
K <- K
J <- J
X1 <- matrix(X1, K, 1)
X_qp <- t(X0)
Y0 <- as.vector(Y0)
Y1 <- c(Y1)
CE <- t(t(rep(1, J)))
ce0 <- array(-1)
CI <-  cbind(diag(J), -diag(J))
ci0 <-c(rep(0, J), rep(1, J))
v_prior <- abs(beta)
G <- t(X_qp) %*% diag(v_prior/sum(v_prior)) %*% X_qp
g0 <- -1 * c(t(X1) %*% diag(v_prior/sum(v_prior)) %*% X_qp )
library(cmdstanr)
set_cmdstan_path("~/cmdstan")
set.seed(12345)
mod <- cmdstan_model("~/cmdstan/test_for_rok.stan")
# mod <- cmdstan_model("~/cmdstan/quadprog_test.stan")

fit <- mod$sample(
  data = data,
  seed = 987083,
  chains = 2,
  parallel_chains = 2,
  iter_warmup = 200,
  iter_sampling = 200,
  max_treedepth = 10,
  adapt_delta = 0.8
)

test_for_rok.stan.zip

rok-cesnovar commented 4 years ago

This would be better to check directly in C++. Can you give me an example of G and g0 for which this does not match? I can then make simple unit tests.

spinkney commented 4 years ago

You can use from the R script above. We don't even need the v thing.

G <- t(X_qp) %*% X_qp
g0 <- -1 * c(t(X1) %*% X_qp )
spinkney commented 3 years ago

I pushed updates that fixed issues but I couldn't do what I originally wanted which should be mathematically correct.

I wanted to use the Cholesky factor L and take the inverse then do multiply lower tri self transpose to get the inverse of G. In the C++ code it is

auto L = stan::math::cholesky_decompose(G);
auto J = stan::math::transpose(stan::math::mdivide_left_tri_low(L));
auto x = multiply(stan::math::multiply_lower_tri_self_transpose(J), g0)

this does not work! Even though it should, see

PD_mat <-  matrix(
              c(1.00, 0.25, 0.10, 0.40,
              0.25, 1.00, 0.80, 0.00, 
              0.10, 0.80, 1.00, -0.05,
              0.40, 0.00, -0.05,  1.00), 
              nrow = 4, ncol = 4, byrow = T)
J <- chol(G_test) # R returns upper tri so don't need transpose like in Eigen
all.equal(solve(G_test), tcrossprod(solve(J)))
[1] TRUE

What I ended up doing in solve_quadprog is to use G in this call

auto x stan::math::mdivide_left_spd(G, g0);

In the chol version, I do the same since earlier I have to construct G to get the trace but at least we don't have to cholesky decompose again.

Here's a nice testing script.

K <- 8
J <- 20
true_w <- c(0.7, 0.3, rep(0, J - 2))
X0 <- matrix(rnorm(K * J, 0, sd = 1), J, K)
X1 <- apply(X0, 2, function(x) t(true_w) %*% (x ))
beta <- true_v * 10 + rnorm(K, 0, 0.1)
Y0 <- X0 %*% beta + rnorm(20, 0, 0.1)
Y1 <- beta %*% X1 + rnorm(1, 0, 0.1)

K <- K
J <- J
X1 <- matrix(X1, K, 1)
X_qp <- t(X0)
Y0 <- as.vector(Y0)
Y1 <- c(Y1)
CE <- t(t(rep(1, J)))
ce0 <- array(-1)
CI <-  cbind(diag(J), -diag(J))
ci0 <-c(rep(0, J), rep(1, J))

G <- t(X_qp) %*% X_qp
g0 <- -1 * c(t(X1) %*% X_qp )

data =  list(
  K = K,
  J = J,
  CE = CE,
  ce0 =  ce0,
  CI =  CI,
  ci0 = ci0,
  G = G + diag(.000001, J),
  g0 = g0
)
library(cmdstanr)
set_cmdstan_path("solve_qp/cmdstan")
set.seed(12345)
mod <- cmdstan_model("solve_qp/test_for_rok.stan", force_recompile = TRUE)
fit <- mod$sample(
  data = data,
  chains = 2,
  parallel_chains = 2,
  iter_warmup = 200,
  iter_sampling = 200,
  max_treedepth = 10,
  adapt_delta = 0.8
)

fit$summary("w_optim")$mean
fit$summary("w_chol")$mean
# should return 0.7 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0

test_for_rok.stan

data {
    int<lower=1> K;             // num predictors
    int<lower=0> J;             // num countries
    matrix[J, 1] CE;            // CE for predictor country
    vector[1] ce0;              // CE for target country
    matrix[J, 2 * J] CI;        // CI for predictor country
    vector[2 * J] ci0;          // CI for target country
    matrix[J, J] G;
    vector[J] g0;
}
parameters {
    real<lower=0> sigma; 
}
transformed parameters {
    vector[J] w_optim = round(100 * solve_quadprog(G, g0, CE, ce0, CI, ci0))/100;
    vector[J] w_chol = round(100 * solve_quadprog_chol(cholesky_decompose(G), g0, CE, ce0, CI, ci0))/100;
}
model {
  sigma ~ std_normal();
}