rok-cesnovar / math

The Stan Math Library is a C++ template library for automatic differentiation of any order using forward, reverse, and mixed modes. It includes a range of built-in functions for probabilistic modeling, linear algebra, and equation solving.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License
1 stars 0 forks source link

Ability to use cholseky factor L in place of G #2

Open spinkney opened 4 years ago

spinkney commented 4 years ago

Hi Rok, I've had some more time to play with the code and one thing that crops up is that G will sometimes get rejected for not being positive definite, even though it is. I believe it's just a numerical precision issue.

I resorted to rewriting the model in terms of cholesky factor corr objects. I have to repack it into a N x N matrix G before calling the solver which then goes ahead and refactorizes it back into LL'. Anyway it would be nice to use L directly in place of G if one already has it to save on the recalcs and numerical issues.

rok-cesnovar commented 4 years ago

Yeah the symmetric check in our cholesky decompose might be too strict. Need to check.

But yes, we can make another signature, thats easy. If you can post me a model I can use to test that would be very helpful. I will take a look tomorrow. Done thinking for tonight :)

spinkney commented 4 years ago

I'll work on getting you a simulated model in a few days. Right now the gist is...

I'm using a construction similar to the multivariate priors for hierarchical models code in the manual. The construction is such that a positive diagonal matrix tau is multiplied by a cholesky factor correlation matrix so the product is also PD.

I think it should be possible to substitute for the G matrix an L as a lower triangular matrix which, in my case, would be diag_pre_multiply(tau, L_Omega). Then the cholesky_decompose step could be omitted within quad_prog.

X is M x N and should be PD it is constructed as X = z * diag_pre_multiply(tau, L_Omega)'

parameters {
matrix[M, N] z;
vector<lower=0>[N] tau;
cholesky_factor_corr[N] L_Omega;
}
transformed parameters {
matrix [M, N] X = z * diag_pre_multiply(tau, L_Omega)';

// I can directly assign G = tcrossprod(X) or
matrix[N, N] G = tcrossprod(diag_pre_multiply(tau, L_Omega));
}
rok-cesnovar commented 4 years ago

Great!

I dont need a full tested model right now, just an example model on how you want to call it so I can get it to compile.

spinkney commented 4 years ago
// named test_for_rok.stan
data {
  int<lower=1> K;             // num predictors
  int<lower=0> J;                 
  matrix[1, K] X1;
  matrix[K, J] X0;
  matrix[J, 1] CE;      
  vector[1] ce0;         
  matrix[J, 2 * J] CI;   
  vector[2 * J] ci0;    
}
transformed data{
  vector[J] epsilon = rep_vector(1e-12, J);
}
parameters {
  cholesky_factor_corr[J] L;
  vector<lower=0>[J] tau;
  matrix[J, K] z;
}
transformed parameters {
  vector[J] weight;
  matrix[K, J] X0_est = (diag_pre_multiply(tau, L) * z)';
  vector[J] g0 = to_vector(-1 * (X1 * X0_est));
  matrix[J, J] G = tcrossprod(diag_pre_multiply(tau, L));
  weight = solve_quadprog(G, g0, CE, ce0, CI, ci0);
}
model {
  L ~ lkj_corr_cholesky(2);
  to_vector(z) ~ std_normal();
  tau ~ std_normal();
  to_vector(X0) ~ normal(to_vector(X0_est), 0.3);
}
generated quantities{
  vector[K] X1_est = X0 * weight;
  vector[K] X1_est_prime = X0_est * weight;
}
set.seed(12345)

K <- 8
J <- 20
true_w <- c(0.7, 0.3, rep(0, J - 2))

X0 <- matrix(rnorm(K * J, 0, sd = 1), J, K)
X1 <- apply(X0, 2, function(x) t(true_w) %*% (x ))
X1 <- X1 + rnorm(K, 0, 0.01)

K = K
J = J
X1 = X1
X0 = X0
CE = t(t(rep(1, J)))
ce0 = array(-1)
CI =  cbind(diag(J), -diag(J))
ci0 =c(rep(0, J), rep(1, J))

# data preparation

stan_rdump(c('K', 'J', 'X1', 'X0', 'CE', 
             'ce0', 'CI', 'ci0'), file = "quadprog_data.R")

file <- file.path(".\\test_for_rok.stan")
mod <- cmdstan_model(file)
fit <- mod$sample(
  data =  list(
    K = K,
    J = J,
    X1 = t(X1),
    X0 = t(X0),
    CE = t(t(rep(1, J))),
    ce0 = array(-1),
    CI =  cbind(diag(J), -diag(J)),
    ci0 =c(rep(0, J), rep(1, J))
  ),
  seed = 987083,
  num_chains = 2,
  num_cores = 2,
  num_warmup = 200,
  num_samples = 200,
  max_depth = 12,
  adapt_delta = 0.8
)
rok-cesnovar commented 4 years ago

Thanks. Will look into this tonight.

rok-cesnovar commented 4 years ago

Sorry, only had time to come to this tonight. While the model compiles, the script fails, because Y0 is undefined.

In the model you are still using G as the first argument? Do I replace it with X0_est, which is (diag_pre_multiply(tau, L) * z)'; Not really sure.

spinkney commented 4 years ago

Oops 'Y0' and 'Y1' are from other tests that aren't needed. You can comment them out.

For the new 'G', sort of but 'tcrossprod(X0_est)' or taking in both 'L' and 'tau' and 'z' if that makes more sense. Just so that the cholesky decomp is skipped. Sorry, wanted to update you quickly but I'm outside with the kids or else I would double check within the solve quadprog code.

spinkney commented 4 years ago

Checking it out, all we need is some valid cholesky L. In the code above a valid L is either diag_pre_multiply(tau, L) or L. In terms of the "correct" L, for the program, I want diag_pre_multiply(tau, L).

matrix G or lower_tri_matrix L (where I've supplied diag_pre_multiply(tau, L))

Since Stan does not have a lower_tri_matrix type maybe we just add a boolean flag for chol_type = 1 if T and 0 if F, default is F? Though eigen does have aisLowerTriangular() argument, maybe you can use that?

I've made changes to the solve_quadprog.hpp to show what I think should be changed. In the code,

265 template <typename T0, typename T1, typename T2, typename T3,
266           typename T4, typename T5>
267 auto
268 solve_quadprog(const Eigen::Matrix<T0, -1, -1> &G,
269                const Eigen::Matrix<T1, -1, 1> &g0,
270                const Eigen::Matrix<T2, -1, -1> &CE,
271                const Eigen::Matrix<T3, -1, 1> &ce0,
272                const Eigen::Matrix<T4, -1, -1> &CI,
273                const Eigen::Matrix<T5, -1, 1> &ci0)
...
302   /* compute the trace of the original matrix G */
303  if (chol_type == 1) {
304    auto chol = L;
305    auto G = stan::math::tcrossprod(L);
306  } else {
307   /* decompose the matrix G in the form LL^T */
308    auto chol = stan::math::cholesky_decompose(G);
309     }
310 
311   auto c1 = G.trace();
...
336  if (chol_type == 1) {
337    /* G^-1 = (L^-1)^T (L^-1) 
338     * https://forum.kde.org/viewtopic.php?f=74&t=127426
339     * */
340     auto Ginv = cross((L.triangularView<Lower>().solve(Identity()));
341     auto x = multiply(Ginv, g0);
342   else {
343     auto x = multiply(inverse(G), g0);
344   }
345     x = -x;
spinkney commented 4 years ago

I did not change the inputs to reflect taking in L. Not sure how you write a new template or method in eigen.

rok-cesnovar commented 4 years ago

Oh, that would just mean another signature solve_quadprog_chol, which takes in L. Would that work for you?

rok-cesnovar commented 4 years ago

We can check if the matrix is lower triangular, but not sure that would cover all our cases.

spinkney commented 4 years ago

That works! Double check anything I wrote in eigen because I'm a total noob with it!

spinkney commented 4 years ago

When this gets added is it also possible to get the stan updates on this branch? I'm testing with reduce_sum and the arguments changed from this branch to the recent release.

rok-cesnovar commented 4 years ago

Yes, definitely.

rok-cesnovar commented 4 years ago

Ok, got this to compile. Updated the cmdstan branch. Also up to date with recent cmdstan.

Make sure to run:

make clean-all make stan-update make build

spinkney commented 4 years ago

Hi Rok, thanks for updating! I think there's a few modifications to make to make this faster.

So you can see the original first and then my updates after.

change this from

 /*
   * Preprocessing phase
   */

  /* compute the trace of the original matrix G */
  auto chol = L;
  auto G = tcrossprod(L);

  auto c1 = G.trace();

  /* initialize the matrix R */
  d.setZero();

  Eigen::Matrix<T0, -1, -1> R(G.rows(), G.cols());
  R.setZero();
  T_return R_norm = 1.0; /* this variable will hold the norm of the matrix R */

  /* compute the inverse of the factorized matrix G^-1, this is the initial
   * value for H */
  //J = L^-T
  auto J = stan::math::transpose(stan::math::mdivide_left_tri_low(chol));
  auto c2 = J.trace();
#ifdef TRACE_SOLVER
  print_matrix("J", J, n);
#endif

  /* c1 * c2 is an estimate for cond(G) */

  /*
   * Find the unconstrained minimizer of the quadratic form 0.5 * x G x + g0 x
   * this is a feasible point in the dual space
   * x = G^-1 * g0
   */

  /* G^-1 = (L^-1)^T (L^-1) 
   * https://forum.kde.org/viewtopic.php?f=74&t=127426
   * */
  auto Ginv = crossprod(mdivide_left_tri_low(L));
  auto x = multiply(Ginv, g0);

to

 /*
   * Preprocessing phase
   */

  /* compute the trace of the original matrix G */
/* can remove this  auto chol = L; 
* can provide the trace already so don't need
* to compute G again 
*  auto G = tcrossprod(L);
*  auto c1 = G.trace(); */

/* tr(AB^T) = sum( A hadamard_prod B) 
* https://en.wikipedia.org/wiki/Trace_(linear_algebra)
*/ 
auto c1 = sum(elt_multiply(L.triangularView<Eigen::Lower>(), L.triangularView<Eigen::Lower>()));

  /* initialize the matrix R */
  d.setZero();

  /*  compute the trace of the original matrix G */
/* **CHANGE HERE** G to L */
  Eigen::Matrix<T0, -1, -1> R(L.rows(), L.cols());
  R.setZero();
  T_return R_norm = 1.0; /* this variable will hold the norm of the matrix R */

  /* compute the inverse of the factorized matrix G^-1, this is the initial
   * value for H */
  //J = L^-T
/* **CHANGE HERE** 
* J is only called once and trace is the same regardless of transpose 
* also change chol to L */
  auto J = stan::math::mdivide_left_tri_low(L);
  auto c2 = J.trace();
#ifdef TRACE_SOLVER
  print_matrix("J", J, n);
#endif

  /* c1 * c2 is an estimate for cond(G) */

  /*
   * Find the unconstrained minimizer of the quadratic form 0.5 * x G x + g0 x
   * this is a feasible point in the dual space
   * x = G^-1 * g0
   */

  /* G^-1 = (L^-1)^T (L^-1) 
   * https://forum.kde.org/viewtopic.php?f=74&t=127426
   * */
/* **CHANGE HERE** 
* can use J here so don't need to compute inverse again */
  auto Ginv = crossprod(J);
  auto x = multiply(Ginv, g0);
spinkney commented 4 years ago

There's a few more optimizations to take knowing that L and J are triangular. See https://stackoverflow.com/questions/44319212/eigen-how-to-only-compute-the-lower-upper-part-in-the-matrix-inner-product

For Ginv we can do

Ginv.setZero();
Ginv.selfadjointView<Lower>().rankUpdate(J.transpose());
rok-cesnovar commented 4 years ago

Thanks. Added those and pushed to the branch.

spinkney commented 4 years ago

Edited, This doesn't work on windows 10 or linux. I get this error

> file <- file.path("~/Sean/test_for_rok_chol.stan")
> mod <- cmdstan_model(file)
Compiling Stan program...
In file included from stan/lib/stan_math/stan/math/prim/fun.hpp:82,
                 from stan/lib/stan_math/stan/math/prim.hpp:10,
                 from stan/lib/stan_math/stan/math/rev.hpp:9,
                 from stan/lib/stan_math/stan/math.hpp:19,
                 from stan/src/stan/model/model_header.hpp:4,
                 from /tmp/RtmpGXibqW/model-ab575a781a46.hpp:3:
stan/lib/stan_math/stan/math/prim/fun/elt_multiply.hpp: In instantiation of ‘auto stan::math::elt_multiply(const Mat1&, const Mat2&) [with Mat1 = Eigen::TriangularView<const Eigen::Matrix<double, -1, -1>, 1>; Mat2 = Eigen::TriangularView<const Eigen::Matrix<double, -1, -1>, 1>; <template-parameter-1-3> = void]’:
stan/lib/stan_math/stan/math/rev/fun/solve_quadprog.hpp:639:29:   required from ‘auto stan::math::solve_quadprog_chol(const Eigen::Matrix<LhsScalar, -1, -1, 0>&, const Eigen::Matrix<RhsScalar, -1, 1>&, const Eigen::Matrix<LhsScalar, -1, -1, 0>&, const Eigen::Matrix<T_job_param, -1, 1>&, const Eigen::Matrix<T_beta_scalar, -1, -1>&, const Eigen::Matrix<T_m0, -1, 1>&) [with T0 = double; T1 = double; T2 = double; T3 = double; T4 = double; T5 = double]’
/tmp/RtmpGXibqW/model-ab575a781a46.hpp:658:52:   required from here
stan/lib/stan_math/stan/math/prim/fun/elt_multiply.hpp:26:13: error: ‘const class Eigen::TriangularView<const Eigen::Matrix<double, -1, -1>, 1>’ has no member named ‘cwiseProduct’
   26 |   return m1.cwiseProduct(m2).eval();
      |          ~~~^~~~~~~~~~~~
In file included from stan/lib/stan_math/stan/math/rev/fun.hpp:126,
                 from stan/lib/stan_math/stan/math/rev.hpp:11,
                 from stan/lib/stan_math/stan/math.hpp:19,
                 from stan/src/stan/model/model_header.hpp:4,
                 from /tmp/RtmpGXibqW/model-ab575a781a46.hpp:3:
stan/lib/stan_math/stan/math/rev/fun/solve_quadprog.hpp: In instantiation of ‘auto stan::math::solve_quadprog_chol(const Eigen::Matrix<LhsScalar, -1, -1, 0>&, const Eigen::Matrix<RhsScalar, -1, 1>&, const Eigen::Matrix<LhsScalar, -1, -1, 0>&, const Eigen::Matrix<T_job_param, -1, 1>&, const Eigen::Matrix<T_beta_scalar, -1, -1>&, const Eigen::Matrix<T_m0, -1, 1>&) [with T0 = double; T1 = double; T2 = double; T3 = double; T4 = double; T5 = double]’:
/tmp/RtmpGXibqW/model-ab575a781a46.hpp:658:52:   required from here
stan/lib/stan_math/stan/math/rev/fun/solve_quadprog.hpp:639:16: error: ‘sum’ cannot be used as a function
  639 |   auto c1 = sum(elt_multiply(L.template triangularView<Eigen::Lower>(), L.template triangularView<Eigen::Lower>()));
      |             ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stan/lib/stan_math/stan/math/rev/fun/solve_quadprog.hpp:676:3: error: passing ‘const Eigen::Matrix<double, -1, -1>’ as ‘this’ argument discards qualifiers [-fpermissive]
  676 |   Ginv.setZero();
      |   ^~~~
In file included from stan/lib/stan_math/lib/eigen_3.3.7/Eigen/Core:463,
                 from stan/lib/stan_math/lib/eigen_3.3.7/Eigen/Dense:1,
                 from stan/lib/stan_math/stan/math/prim/fun/Eigen.hpp:22,
                 from stan/lib/stan_math/stan/math/rev.hpp:4,
                 from stan/lib/stan_math/stan/math.hpp:19,
                 from stan/src/stan/model/model_header.hpp:4,
                 from /tmp/RtmpGXibqW/model-ab575a781a46.hpp:3:
stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/CwiseNullaryOp.h:499:48: note:   in call to ‘Derived& Eigen::DenseBase<Derived>::setZero() [with Derived = Eigen::Matrix<double, -1, -1>]’
  499 | EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& DenseBase<Derived>::setZero()
      |                                                ^~~~~~~~~~~~~~~~~~
In file included from stan/lib/stan_math/stan/math/rev/fun.hpp:126,
                 from stan/lib/stan_math/stan/math/rev.hpp:11,
                 from stan/lib/stan_math/stan/math.hpp:19,
                 from stan/src/stan/model/model_header.hpp:4,
                 from /tmp/RtmpGXibqW/model-ab575a781a46.hpp:3:
stan/lib/stan_math/stan/math/rev/fun/solve_quadprog.hpp:677:3: error: no matching function for call to ‘Eigen::SelfAdjointView<const Eigen::Matrix<double, -1, -1>, 1>::rankUpdate(Eigen::Transpose<Eigen::Matrix<double, -1, -1> >) const’
  677 |   Ginv.template selfadjointView<Eigen::Lower>().rankUpdate(J.transpose());
      |   ^~~~
In file included from stan/lib/stan_math/lib/eigen_3.3.7/Eigen/Core:490,
                 from stan/lib/stan_math/lib/eigen_3.3.7/Eigen/Dense:1,
                 from stan/lib/stan_math/stan/math/prim/fun/Eigen.hpp:22,
                 from stan/lib/stan_math/stan/math/rev.hpp:4,
                 from stan/lib/stan_math/stan/math.hpp:19,
                 from stan/src/stan/model/model_header.hpp:4,
                 from /tmp/RtmpGXibqW/model-ab575a781a46.hpp:3:
stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/SelfAdjointView.h:154:22: note: candidate: ‘template<class DerivedU, class DerivedV> Eigen::SelfAdjointView<MatrixType, UpLo>& Eigen::SelfAdjointView<MatrixType, Mode>::rankUpdate(const Eigen::MatrixBase<OtherDerived>&, const Eigen::MatrixBase<OtherDerived>&, const Scalar&) [with DerivedU = DerivedU; DerivedV = DerivedV; _MatrixType = const Eigen::Matrix<double, -1, -1>; unsigned int UpLo = 1]’
  154 |     SelfAdjointView& rankUpdate(const MatrixBase<DerivedU>& u, const MatrixBase<DerivedV>& v, const Scalar& alpha = Scalar(1));
      |                      ^~~~~~~~~~
stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/SelfAdjointView.h:154:22: note:   template argument deduction/substitution failed:
In file included from stan/lib/stan_math/stan/math/rev/fun.hpp:126,
                 from stan/lib/stan_math/stan/math/rev.hpp:11,
                 from stan/lib/stan_math/stan/math.hpp:19,
                 from stan/src/stan/model/model_header.hpp:4,
                 from /tmp/RtmpGXibqW/model-ab575a781a46.hpp:3:
stan/lib/stan_math/stan/math/rev/fun/solve_quadprog.hpp:677:3: note:   candidate expects 3 arguments, 1 provided
  677 |   Ginv.template selfadjointView<Eigen::Lower>().rankUpdate(J.transpose());
      |   ^~~~
In file included from stan/lib/stan_math/lib/eigen_3.3.7/Eigen/Core:500,
                 from stan/lib/stan_math/lib/eigen_3.3.7/Eigen/Dense:1,
                 from stan/lib/stan_math/stan/math/prim/fun/Eigen.hpp:22,
                 from stan/lib/stan_math/stan/math/rev.hpp:4,
                 from stan/lib/stan_math/stan/math.hpp:19,
                 from stan/src/stan/model/model_header.hpp:4,
                 from /tmp/RtmpGXibqW/model-ab575a781a46.hpp:3:
stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/products/SelfadjointProduct.h:123:35: note: candidate: ‘Eigen::SelfAdjointView<MatrixType, UpLo>& Eigen::SelfAdjointView<MatrixType, Mode>::rankUpdate(const Eigen::MatrixBase<OtherDerived>&, const Scalar&) [with DerivedU = Eigen::Transpose<Eigen::Matrix<double, -1, -1> >; _MatrixType = const Eigen::Matrix<double, -1, -1>; unsigned int UpLo = 1; Eigen::SelfAdjointView<MatrixType, Mode>::Scalar = double]’ <near match>
  123 | SelfAdjointView<MatrixType,UpLo>& SelfAdjointView<MatrixType,UpLo>
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/products/SelfadjointProduct.h:123:35: note:   passing ‘Eigen::MatrixBase<Eigen::Matrix<double, -1, -1> >::ConstSelfAdjointViewReturnType<1>::Type*’ {aka ‘const Eigen::SelfAdjointView<const Eigen::Matrix<double, -1, -1>, 1>*’} as ‘this’ argument discards qualifiers
In file included from stan/lib/stan_math/stan/math/prim/fun.hpp:82,
                 from stan/lib/stan_math/stan/math/prim.hpp:10,
                 from stan/lib/stan_math/stan/math/rev.hpp:9,
                 from stan/lib/stan_math/stan/math.hpp:19,
                 from stan/src/stan/model/model_header.hpp:4,
                 from /tmp/RtmpGXibqW/model-ab575a781a46.hpp:3:
stan/lib/stan_math/stan/math/prim/fun/elt_multiply.hpp: In instantiation of ‘auto stan::math::elt_multiply(const Mat1&, const Mat2&) [with Mat1 = Eigen::TriangularView<const Eigen::Matrix<stan::math::var, -1, -1>, 1>; Mat2 = Eigen::TriangularView<const Eigen::Matrix<stan::math::var, -1, -1>, 1>; <template-parameter-1-3> = void]’:
stan/lib/stan_math/stan/math/rev/fun/solve_quadprog.hpp:639:29:   required from ‘auto stan::math::solve_quadprog_chol(const Eigen::Matrix<LhsScalar, -1, -1, 0>&, const Eigen::Matrix<RhsScalar, -1, 1>&, const Eigen::Matrix<LhsScalar, -1, -1, 0>&, const Eigen::Matrix<T_job_param, -1, 1>&, const Eigen::Matrix<T_beta_scalar, -1, -1>&, const Eigen::Matrix<T_m0, -1, 1>&) [with T0 = stan::math::var; T1 = stan::math::var; T2 = double; T3 = double; T4 = double; T5 = double]’
/tmp/RtmpGXibqW/model-ab575a781a46.hpp:492:28:   required from ‘T__ test_for_rok_chol_model_namespace::test_for_rok_chol_model::log_prob(std::vector<T_l>&, std::vector<int>&, std::ostream*) const [with bool propto__ = false; bool jacobian__ = false; T__ = stan::math::var; std::ostream = std::basic_ostream<char>]’
stan/src/stan/model/model_base_crtp.hpp:153:29:   required from ‘stan::math::var stan::model::model_base_crtp<M>::log_prob(std::vector<stan::math::var>&, std::vector<int>&, std::ostream*) const [with M = test_for_rok_chol_model_namespace::test_for_rok_chol_model; std::ostream = std::basic_ostream<char>]’
stan/src/stan/model/model_base_crtp.hpp:149:20:   required from here
stan/lib/stan_math/stan/math/prim/fun/elt_multiply.hpp:26:13: error: ‘const class Eigen::TriangularView<const Eigen::Matrix<stan::math::var, -1, -1>, 1>’ has no member named ‘cwiseProduct’
   26 |   return m1.cwiseProduct(m2).eval();
      |          ~~~^~~~~~~~~~~~
In file included from stan/lib/stan_math/stan/math/rev/fun.hpp:126,
                 from stan/lib/stan_math/stan/math/rev.hpp:11,
                 from stan/lib/stan_math/stan/math.hpp:19,
                 from stan/src/stan/model/model_header.hpp:4,
                 from /tmp/RtmpGXibqW/model-ab575a781a46.hpp:3:
stan/lib/stan_math/stan/math/rev/fun/solve_quadprog.hpp: In instantiation of ‘auto stan::math::solve_quadprog_chol(const Eigen::Matrix<LhsScalar, -1, -1, 0>&, const Eigen::Matrix<RhsScalar, -1, 1>&, const Eigen::Matrix<LhsScalar, -1, -1, 0>&, const Eigen::Matrix<T_job_param, -1, 1>&, const Eigen::Matrix<T_beta_scalar, -1, -1>&, const Eigen::Matrix<T_m0, -1, 1>&) [with T0 = stan::math::var; T1 = stan::math::var; T2 = double; T3 = double; T4 = double; T5 = double]’:
/tmp/RtmpGXibqW/model-ab575a781a46.hpp:492:28:   required from ‘T__ test_for_rok_chol_model_namespace::test_for_rok_chol_model::log_prob(std::vector<T_l>&, std::vector<int>&, std::ostream*) const [with bool propto__ = false; bool jacobian__ = false; T__ = stan::math::var; std::ostream = std::basic_ostream<char>]’
stan/src/stan/model/model_base_crtp.hpp:153:29:   required from ‘stan::math::var stan::model::model_base_crtp<M>::log_prob(std::vector<stan::math::var>&, std::vector<int>&, std::ostream*) const [with M = test_for_rok_chol_model_namespace::test_for_rok_chol_model; std::ostream = std::basic_ostream<char>]’
stan/src/stan/model/model_base_crtp.hpp:149:20:   required from here
stan/lib/stan_math/stan/math/rev/fun/solve_quadprog.hpp:639:16: error: invalid use of void expression
  639 |   auto c1 = sum(elt_multiply(L.template triangularView<Eigen::Lower>(), L.template triangularView<Eigen::Lower>()));
      |             ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stan/lib/stan_math/stan/math/rev/fun/solve_quadprog.hpp:676:3: error: passing ‘const Eigen::Matrix<stan::math::var, -1, -1>’ as ‘this’ argument discards qualifiers [-fpermissive]
  676 |   Ginv.setZero();
      |   ^~~~
In file included from stan/lib/stan_math/lib/eigen_3.3.7/Eigen/Core:463,
                 from stan/lib/stan_math/lib/eigen_3.3.7/Eigen/Dense:1,
                 from stan/lib/stan_math/stan/math/prim/fun/Eigen.hpp:22,
                 from stan/lib/stan_math/stan/math/rev.hpp:4,
                 from stan/lib/stan_math/stan/math.hpp:19,
                 from stan/src/stan/model/model_header.hpp:4,
                 from /tmp/RtmpGXibqW/model-ab575a781a46.hpp:3:
stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/CwiseNullaryOp.h:499:48: note:   in call to ‘Derived& Eigen::DenseBase<Derived>::setZero() [with Derived = Eigen::Matrix<stan::math::var, -1, -1>]’
  499 | EIGEN_DEVICE_FUNC EIGEN_STRONG_INLINE Derived& DenseBase<Derived>::setZero()
      |                                                ^~~~~~~~~~~~~~~~~~
In file included from stan/lib/stan_math/stan/math/rev/fun.hpp:126,
                 from stan/lib/stan_math/stan/math/rev.hpp:11,
                 from stan/lib/stan_math/stan/math.hpp:19,
                 from stan/src/stan/model/model_header.hpp:4,
                 from /tmp/RtmpGXibqW/model-ab575a781a46.hpp:3:
stan/lib/stan_math/stan/math/rev/fun/solve_quadprog.hpp:677:3: error: no matching function for call to ‘Eigen::SelfAdjointView<const Eigen::Matrix<stan::math::var, -1, -1>, 1>::rankUpdate(Eigen::Transpose<Eigen::Matrix<stan::math::var, -1, -1> >) const’
  677 |   Ginv.template selfadjointView<Eigen::Lower>().rankUpdate(J.transpose());
      |   ^~~~
In file included from stan/lib/stan_math/lib/eigen_3.3.7/Eigen/Core:490,
                 from stan/lib/stan_math/lib/eigen_3.3.7/Eigen/Dense:1,
                 from stan/lib/stan_math/stan/math/prim/fun/Eigen.hpp:22,
                 from stan/lib/stan_math/stan/math/rev.hpp:4,
                 from stan/lib/stan_math/stan/math.hpp:19,
                 from stan/src/stan/model/model_header.hpp:4,
                 from /tmp/RtmpGXibqW/model-ab575a781a46.hpp:3:
stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/SelfAdjointView.h:154:22: note: candidate: ‘template<class DerivedU, class DerivedV> Eigen::SelfAdjointView<MatrixType, UpLo>& Eigen::SelfAdjointView<MatrixType, Mode>::rankUpdate(const Eigen::MatrixBase<OtherDerived>&, const Eigen::MatrixBase<OtherDerived>&, const Scalar&) [with DerivedU = DerivedU; DerivedV = DerivedV; _MatrixType = const Eigen::Matrix<stan::math::var, -1, -1>; unsigned int UpLo = 1]’
  154 |     SelfAdjointView& rankUpdate(const MatrixBase<DerivedU>& u, const MatrixBase<DerivedV>& v, const Scalar& alpha = Scalar(1));
      |                      ^~~~~~~~~~
stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/SelfAdjointView.h:154:22: note:   template argument deduction/substitution failed:
In file included from stan/lib/stan_math/stan/math/rev/fun.hpp:126,
                 from stan/lib/stan_math/stan/math/rev.hpp:11,
                 from stan/lib/stan_math/stan/math.hpp:19,
                 from stan/src/stan/model/model_header.hpp:4,
                 from /tmp/RtmpGXibqW/model-ab575a781a46.hpp:3:
stan/lib/stan_math/stan/math/rev/fun/solve_quadprog.hpp:677:3: note:   candidate expects 3 arguments, 1 provided
  677 |   Ginv.template selfadjointView<Eigen::Lower>().rankUpdate(J.transpose());
      |   ^~~~
In file included from stan/lib/stan_math/lib/eigen_3.3.7/Eigen/Core:500,
                 from stan/lib/stan_math/lib/eigen_3.3.7/Eigen/Dense:1,
                 from stan/lib/stan_math/stan/math/prim/fun/Eigen.hpp:22,
                 from stan/lib/stan_math/stan/math/rev.hpp:4,
                 from stan/lib/stan_math/stan/math.hpp:19,
                 from stan/src/stan/model/model_header.hpp:4,
                 from /tmp/RtmpGXibqW/model-ab575a781a46.hpp:3:
stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/products/SelfadjointProduct.h:123:35: note: candidate: ‘Eigen::SelfAdjointView<MatrixType, UpLo>& Eigen::SelfAdjointView<MatrixType, Mode>::rankUpdate(const Eigen::MatrixBase<OtherDerived>&, const Scalar&) [with DerivedU = Eigen::Transpose<Eigen::Matrix<stan::math::var, -1, -1> >; _MatrixType = const Eigen::Matrix<stan::math::var, -1, -1>; unsigned int UpLo = 1; Eigen::SelfAdjointView<MatrixType, Mode>::Scalar = stan::math::var]’ <near match>
  123 | SelfAdjointView<MatrixType,UpLo>& SelfAdjointView<MatrixType,UpLo>
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/products/SelfadjointProduct.h:123:35: note:   passing ‘Eigen::MatrixBase<Eigen::Matrix<stan::math::var, -1, -1> >::ConstSelfAdjointViewReturnType<1>::Type*’ {aka ‘const Eigen::SelfAdjointView<const Eigen::Matrix<stan::math::var, -1, -1>, 1>*’} as ‘this’ argument discards qualifiers
make: *** [make/program:54: /tmp/RtmpGXibqW/model-ab575a781a46] Error 1
Error in processx::run(command = make_cmd(), args = c(tmp_exe, include_paths,  : 
  System command 'make' failed, exit status: 2, stderr (last 10 lines):
E>                  from stan/lib/stan_math/stan/math/prim/fun/Eigen.hpp:22,
E>                  from stan/lib/stan_math/stan/math/rev.hpp:4,
E>                  from stan/lib/stan_math/stan/math.hpp:19,
E>                  from stan/src/stan/model/model_header.hpp:4,
E>                  from /tmp/RtmpGXibqW/model-ab575a781a46.hpp:3:
E> stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/products/SelfadjointProduct.h:123:35: note: candidate: ‘Eigen::SelfAdjointView<MatrixType, UpLo>& Eigen::SelfAdjointView<MatrixType, Mode>::rankUpdate(const Eigen::MatrixBase<OtherDerived>&, const Scalar&) [with DerivedU = Eigen::Transpose<Eigen::Matrix<stan::math::var, -1, -1> >; _MatrixType = const Eigen::Matrix<stan::math::var, -1, -1>; unsigned int UpLo = 1; Eigen::SelfAdjointView<MatrixType, Mode>::Scalar = stan::math::var]’ <near match>
E>   123 | Self
[...]
Type .Last.error.trace to see where the error occ
spinkney commented 4 years ago

On windows 10 I'm getting

> file <- file.path(".\\test_for_rok_chol.stan")
> mod <- cmdstan_model(file)
Compiling Stan program...
\
Semantic error in 'C:/Users/spinkney/AppData/Local/Temp/RtmpWY87ME/model-f842d27840.stan', line 25, column 11 to column 55:
   -------------------------------------------------
    23:    vector[J] g0 = to_vector(-1 * (X1 * X0_est));
    24:    matrix[J, J] G = diag_pre_multiply(tau, L);
    25:    weight = solve_quadprog_chol(G, g0, CE, ce0, CI, ci0);
                    ^
    26:  }
    27:  model {
   -------------------------------------------------

A returning function was expected but an undeclared identifier 'solve_quadprog_chol' was supplied.

mingw32-make.exe: *** [C:/Users/spinkney/AppData/Local/Temp/RtmpWY87ME/model-f842d27840.hpp] Error 1
Error in processx::run(command = make_cmd(), args = c(tmp_exe, include_paths,  : 
  System command 'mingw32-make.exe' failed, exit status: 2, stderr (last 10 lines):
E>     24:    matrix[J, J] G = diag_pre_multiply(tau, L);
E>     25:    weight = solve_quadprog_chol(G, g0, CE, ce0, CI, ci0);
E>                     ^
E>     26:  }
E>     27:  model {
E>    -------------------------------------------------
E> 
E> A returning function was expected but an undeclared identifier 'solve_quadprog_chol' was supplied.
E> 
E> mingw32-make.exe: *** [C:/Users/spinkney/AppData/Local/Temp/RtmpWY87ME/model-f842d27840.hpp] Error 1
Type .Last.error.trace to see where the error occured
rok-cesnovar commented 4 years ago

Did you remove bin/stanc on windows? The latter looks like the stan compiler is not updated.

I tested on Ubuntu. Let me check I pushed all the submodules correctly.

spinkney commented 4 years ago

No...and the other model with the call to solve_quadprog() (not chol) works image

rok-cesnovar commented 4 years ago

try removing the stanc.exe and calling mingw32-make build again

spinkney commented 4 years ago

FYI, on my linux machine I'm running pop_os 20.04 (based on ubuntu 20.04)

rok-cesnovar commented 4 years ago

And you ran:

make clean-all
make stan-update
make build
spinkney commented 4 years ago

Yes on both...image

rok-cesnovar commented 4 years ago

Can you share the model you test this with so i make sure we are both compiling the same thing.

spinkney commented 4 years ago

saved as test_for_rok_chol.stan

data {
  int<lower=1> K;             // num predictors
  int<lower=0> J;                 
  matrix[1, K] X1;
  matrix[K, J] X0;
  matrix[J, 1] CE;      
  vector[1] ce0;         
  matrix[J, 2 * J] CI;   
  vector[2 * J] ci0;    
}
transformed data{
  vector[J] epsilon = rep_vector(1e-12, J);
}
parameters {
  cholesky_factor_corr[J] L;
  vector<lower=0>[J] tau;
  matrix[J, K] z;
}
transformed parameters {
  vector[J] weight;
  matrix[K, J] X0_est = (diag_pre_multiply(tau, L) * z)';
  vector[J] g0 = to_vector(-1 * (X1 * X0_est));
  matrix[J, J] G = diag_pre_multiply(tau, L);
  weight = solve_quadprog_chol(G, g0, CE, ce0, CI, ci0);
}
model {
  L ~ lkj_corr_cholesky(2);
  to_vector(z) ~ std_normal();
  tau ~ std_normal();
  to_vector(X0) ~ normal(to_vector(X0_est), 0.3);
}
generated quantities{
  vector[K] X1_est = X0 * weight;
  vector[K] X1_est_prime = X0_est * weight;
}
rok-cesnovar commented 4 years ago

Ok, that fails for me too. Will ping once fixed.

rok-cesnovar commented 4 years ago

Fixed now, tho the elt_multiply with lower triangular didn't want to work. That is not going to be a significant optimization so I removed it. The rest is in.

spinkney commented 4 years ago

On linux it compiles but I'm getting some error while running the model

Chain 2 test_for_rok_chol: stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/DenseCoeffsBase.h:425: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator()(Eigen::Index) [with Derived = Eigen::Matrix<int, -1, 1>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = int; Eigen::Index = long int]: Assertion `index >= 0 && index < size()' failed.
Warning: Chain 2 finished unexpectedly!
spinkney commented 4 years ago

I'm still getting an error on windows even though I've done

make clean-all
git pull
make stan-update
mingw32-make build
> mod <- cmdstan_model(file)
Compiling Stan program...
\
Semantic error in 'C:/Users/spinkney/AppData/Local/Temp/RtmpsRyQSw/model-4ccc3fdb3083.stan', line 24, column 11 to column 55:
   -------------------------------------------------
    22:    vector[J] g0 = to_vector(-1 * (X1 * X0_est));
    23:    matrix[J, J] G = diag_pre_multiply(tau, L);
    24:    weight = solve_quadprog_chol(G, g0, CE, ce0, CI, ci0);
                    ^
    25:  }
    26:  model {
   -------------------------------------------------

A returning function was expected but an undeclared identifier 'solve_quadprog_chol' was supplied.

mingw32-make.exe: *** [C:/Users/spinkney/AppData/Local/Temp/RtmpsRyQSw/model-4ccc3fdb3083.hpp] Error 1
Error in processx::run(command = make_cmd(), args = c(tmp_exe, include_paths,  : 
  System command 'mingw32-make.exe' failed, exit status: 2, stderr (last 10 lines):
E>     23:    matrix[J, J] G = diag_pre_multiply(tau, L);
E>     24:    weight = solve_quadprog_chol(G, g0, CE, ce0, CI, ci0);
E>                     ^
E>     25:  }
E>     26:  model {
E>    -------------------------------------------------
E> 
E> A returning function was expected but an undeclared identifier 'solve_quadprog_chol' was supplied.
E> 
E> mingw32-make.exe: *** [C:/Users/spinkney/AppData/Local/Temp/RtmpsRyQSw/model-4ccc3fdb3083.hpp] Error 1
Type .Last.error.trace to see where the error occured
spinkney commented 4 years ago

Which is weird because I can compile the earlier stan file (test_for_rok.stan) without a hitch (after deleting the previous .exe).

spinkney commented 4 years ago

Lastly, I'm wondering about

Ginv.setZero();
Ginv.selfadjointView<Lower>().rankUpdate(J.transpose());

I think this would be awesome if it returned the full matrix, though I think this only returns the lower half. So we need to put back auto Ginv = crossprod(J)

Also maybe this works instead of elt_multiply?

auto L_tri = L.triangularView<Eigen::Lower>();
auto c1 = sum(L_tri.cwiseProduct(L_tri));
spinkney commented 4 years ago

I'll see if just modifying the hpp using above for Ginv on my local allows compilation. And test this:

auto L_tri = L.selfadjointView<Lower>();
auto c1 = L_tri.cwiseProduct(L_tri).sum();
rok-cesnovar commented 4 years ago

Ah sorry, I was busy. I will also try this shortly.

spinkney commented 4 years ago

I also cannot get the triangular/selfadjointview things to work.

Now, what's weird is that the only things changed from what you committed is to revert Ginv back to

  auto Ginv = crossprod(J);
  auto x = multiply(Ginv, g0);
  x = -x;

I can run 1 chain successfully but when I try 2 I get this error in chain 2

Chain 2 test_for_rok_chol: stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/DenseCoeffsBase.h:425: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator()(Eigen::Index) [with Derived = Eigen::Matrix<int, -1, 1>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = int; Eigen::Index = long int]: Assertion `index >= 0 && index < size()' failed.
Warning: Chain 2 finished unexpectedly!
spinkney commented 4 years ago

If I leave in

* can use J here so don't need to compute inverse again */
  Eigen::Matrix<T0, -1, -1> Ginv = Eigen::Matrix<T0, -1, -1>::Constant(L.rows(), L.cols(), 0.0);
  Ginv.template selfadjointView<Eigen::Lower>().rankUpdate(J.transpose());
  Eigen::Matrix<T_return, -1, 1> x = multiply(Ginv, g0);
  x = -x;

The single chain result finishes. (Results are better too :+1: ). But running 2 chains results in the same error (fyi, I put in line breaks below in the error to read better)

Chain 2 test_for_rok_chol: stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core
/DenseCoeffsBase.h:425: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& 
Eigen::DenseCoeffsBase<Derived, 1>::operator()(Eigen::Index) [with
 Derived = Eigen::Matrix<int, -1, 1>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = int; Eigen::Index = long int]: Assertion `index >= 0 && index < size()' failed.
Warning: Chain 2 finished unexpectedly!
spinkney commented 4 years ago

No worries about taking your time, I know this is a side project for you!

Is there a way to debug this stuff or a Stan developer document on how to do it?

rok-cesnovar commented 4 years ago

I am not sure what would change with the second chain. This error one looks like an out of bound issue but I dont see how the second chain would have a different access pattern.

I will update the stanc3 code to the latest master that has an added boundary check with a nicer message. That might help us. Building the binaries might take awhile as some of our Continuous integration workers are down and no one currently has access to the machines so the limited amount of workers left take care of the rest.

Is there a way to debug this stuff or a Stan developer document on how to do it?

The best way would be to write a test in C++ and test the function directly in Math. The general procedure is to write math functions, add a bunch of tests for them for various scenarios and then proceed with the language stuff. We are working on both sides here at the same time with limited testing on the C++ side.

Can you post the what you use to test the model with? cmdstanr code and data. I could test locally with stanc3 binary built from source.

spinkney commented 4 years ago
library(cmdstanr)
set_cmdstan_path("~/Sean/cmdstan")
set.seed(12345)

K <- 8
J <- 20
true_w <- c(0.7, 0.3, rep(0, J - 2))

X0 <- matrix(rnorm(K * J, 0, sd = 1), J, K)
X1 <- apply(X0, 2, function(x) t(true_w) %*% (x ))
X1 <- X1 + rnorm(K, 0, 0.01)

K = K
J = J
X1 = X1
X0 = X0
CE = t(t(rep(1, J)))
ce0 = array(-1)
CI =  cbind(diag(J), -diag(J))
ci0 =c(rep(0, J), rep(1, J))

# data preparation

# stan_rdump(c('K', 'J', 'X1', 'X0', 'CE', 
#             'ce0', 'CI', 'ci0'), file = "quadprog_data.R")

file <- file.path("~/Sean/test_for_rok_chol.stan")
mod <- cmdstan_model(file)
fit <- mod$sample(
  data =  list(
    K = K,
    J = J,
    X1 = t(X1),
    X0 = t(X0),
    CE = t(t(rep(1, J))),
    ce0 = array(-1),
    CI =  cbind(diag(J), -diag(J)),
    ci0 =c(rep(0, J), rep(1, J))
  ),
  seed = 987083,
  num_chains = 1,
  num_cores = 2,
  init = 0.05,
  num_warmup = 200,
  num_samples = 200,
  max_depth = 12,
  adapt_delta = 0.8
)
library(data.table)

output <- as.data.table(fit$summary())
output[variable %like% "weight", ]
spinkney commented 4 years ago

The above code just has 1 chain and it should fail. Dialing down inits to 0.01 seems to work for 1 chain but still fails on 2.

spinkney commented 4 years ago

Ok 2 chains work when inits = 0.001...

spinkney commented 4 years ago

This only works on Linux though. The signature using _chol doesn't appear to get picked up on windows. I just pulled the updated cmdstan version from the branch with the stanc updates and rebuilt. I can see that the function signature is in the file too.

Is this a stanc issue on windows?

Semantic error in 'C:/Users/spinkney/AppData/Local/Temp/RtmpAFbnu6/model-2cf81e52684c.stan', line 24, column 11 to column 55:
   -------------------------------------------------
    22:    vector[J] g0 = to_vector(-1 * (X1 * X0_est));
    23:    matrix[J, J] G = diag_pre_multiply(tau, L);
    24:    weight = solve_quadprog_chol(G, g0, CE, ce0, CI, ci0);
                    ^
    25:  }
    26:  model {
   -------------------------------------------------

A returning function was expected but an undeclared identifier 'solve_quadprog_chol' was supplied.

mingw32-make.exe: *** [C:/Users/spinkney/AppData/Local/Temp/RtmpAFbnu6/model-2cf81e52684c.hpp] Error 1
Error in processx::run(command = make_cmd(), args = c(tmp_exe, include_paths,  : 
  System command 'mingw32-make.exe' failed, exit status: 2, stderr (last 10 lines):
E>     23:    matrix[J, J] G = diag_pre_multiply(tau, L);
E>     24:    weight = solve_quadprog_chol(G, g0, CE, ce0, CI, ci0);
E>                     ^
E>     25:  }
E>     26:  model {
E>    -------------------------------------------------
E> 
E> A returning function was expected but an undeclared identifier 'solve_quadprog_chol' was supplied.
E> 
E> mingw32-make.exe: *** [C:/Users/spinkney/AppData/Local/Temp/RtmpAFbnu6/model-2cf81e52684c.hpp] Error 1
rok-cesnovar commented 4 years ago

I was able to replicate the windows issue and fixed it (the build for Windows was buggy). I updated the branch now and verified that it works on windows.

I can't seem to get it to run though, even with the above fixes. Mind pushing your changes to the c++ code? Or post the updated file here. Whenever you have time.

spinkney commented 4 years ago

I'm going to test on windows momentarily. The code below is able to compile but 1 of the 2 chains fails with the following error:

Chain 1 test_for_rok_chol: stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/DenseCoeffsBase.h:425: Eigen::DenseCoeffsBase<Derived, 1>::Scalar& Eigen::DenseCoeffsBase<Derived, 1>::operator()(Eigen::Index) [with Derived = Eigen::Matrix<int, -1, 1>; Eigen::DenseCoeffsBase<Derived, 1>::Scalar = int; Eigen::Index = long int]: Assertion `index >= 0 && index < size()' failed.
Warning: Chain 1 finished unexpectedly!

The relevant c++ changes are

solve_quadprog_chol(const Eigen::Matrix<T0, -1, -1> &L,
               const Eigen::Matrix<T1, -1, 1> &g0,
               const Eigen::Matrix<T2, -1, -1> &CE,
               const Eigen::Matrix<T3, -1, 1> &ce0,
               const Eigen::Matrix<T4, -1, -1> &CI,
               const Eigen::Matrix<T5, -1, 1> &ci0) {
  L.eval();
  using std::abs;
  int i = 0, j = 0, k = 0, l = 0; /* indices */
  int ip, me, mi;
  int n = g0.size();
  int p = ce0.size();
  int m = ci0.size();

  using T_return = return_type_t<T0, T1, T2, T3, T4, T5>;
  Eigen::Matrix<T_return, -1, 1> np(n), u(m + p), d(n), z(n), r(m + p), s(m + p), u_old(m + p);
  Eigen::Matrix<T_return, -1, 1> x_old(n);
  const double inf = std::numeric_limits<double>::infinity();

  T_return sum, psi, ss, t1;

  Eigen::VectorXi A(m + p), A_old(m + p), iai(m + p);
  int q;
  int iq, iter = 0;
  bool iaexcl[m + p];

  me = p; /* number of equality constraints */
  mi = m; /* number of inequality constraints */
  q = 0;  /* size of the active set A (containing the indices of the active
             constraints) */

  /*
   * Preprocessing phase
   */

  /* compute the trace of the original matrix G */
/* can remove this  auto chol = L; 
* can provide the trace already so don't need
* to compute G again 
*  auto G = tcrossprod(L);
*  auto c1 = G.trace(); */

/* tr(AB^T) = sum( A hadamard_prod B) 
* https://en.wikipedia.org/wiki/Trace_(linear_algebra)
*/ 
 auto c1 = elt_multiply(L, L).sum();

  /* initialize the matrix R */
  d.setZero();

  /*  compute the trace of the original matrix G */
/* **CHANGE HERE** G to L */
  Eigen::Matrix<T0, -1, -1> R(L.rows(), L.cols());
  R.setZero();
  T_return R_norm = 1.0; /* this variable will hold the norm of the matrix R */

  /* compute the inverse of the factorized matrix G^-1, this is the initial
   * value for H */
  //J = L^-T
/* **CHANGE HERE** 
* J is only called once and trace is the same regardless of transpose 
* also change chol to L */
  auto J = stan::math::mdivide_left_tri_low(L);
  auto c2 = J.trace();
#ifdef TRACE_SOLVER
  print_matrix("J", J, n);
#endif

  /* c1 * c2 is an estimate for cond(G) */

  /*
   * Find the unconstrained minimizer of the quadratic form 0.5 * x G x + g0 x
   * this is a feasible point in the dual space
   * x = G^-1 * g0
   */

  /* G^-1 = (L^-1)^T (L^-1) 
   * https://forum.kde.org/viewtopic.php?f=74&t=127426
   * */
/* **CHANGE HERE** 
* can use J here so don't need to compute inverse again */
  auto Ginv = stan::math::multiply_lower_tri_self_transpose(stan::math::transpose(J));
  auto x = multiply(Ginv, g0);
  x = -x;

with call in R of

fit <- mod$sample(
  data =  list(
    K = K,
    J = J,
    X1 = t(X1),
    X0 = t(X0),
    CE = t(t(rep(1, J))),
    ce0 = array(-1),
    CI =  cbind(diag(J), -diag(J)),
    ci0 =c(rep(0, J), rep(1, J))
  ),
  seed = 987083,
  num_chains = 2,
  num_cores = 2,
  init = 0.1,
  num_warmup = 200,
  num_samples = 200,
  max_depth = 12,
  adapt_delta = 0.8
)

and Stan code of

data {
  int<lower=1> K;             // num predictors
  int<lower=0> J;                 
  matrix[1, K] X1;
  matrix[K, J] X0;
  matrix[J, 1] CE;      
  vector[1] ce0;         
  matrix[J, 2 * J] CI;   
  vector[2 * J] ci0;    
}
transformed data{
  vector[J] epsilon = rep_vector(1e-12, J);
}
parameters {
  cholesky_factor_corr[J] L;
  vector<lower=0>[J] tau;
  matrix[J, K] z;
}
transformed parameters {
  vector[J] weight;
  matrix[K, J] X0_est = (diag_pre_multiply(tau, L) * z)';
  vector[J] g0 = to_vector(-1 * (X1 * X0_est));
  matrix[J, J] G = diag_pre_multiply(tau, L);
  weight = solve_quadprog_chol(G, g0, CE, ce0, CI, ci0);
}
model {
  L ~ lkj_corr_cholesky(2);
  to_vector(z) ~ std_normal();
  tau ~ std_normal();
  to_vector(X0) ~ normal(to_vector(X0_est), 0.3);
}
generated quantities{
  vector[K] X1_est = X0 * weight;
  vector[K] X1_est_prime = X0_est * weight;
}
spinkney commented 4 years ago

It compiles on windows now :thumbsup:

If we can get this to work I'm getting approximately a 33-36% speedup on both windows and linux (barring the 1 chain failing...)

rok-cesnovar commented 4 years ago

Nice! Will take a look later this afternoon if I can see where exactly this is failing. If you have time, just throw in some prints between the suspects to narrow down which line is throwing this.

spinkney commented 4 years ago

I think I know the issue. When we compute J as the inverse of L there's sometimes really small values of L and they explode up. Is there a way to allow huge numbers in J and c2 or should we round down to 0 elements in L that are really small?

You can see this if I add this to the c++

 auto J = stan::math::mdivide_left_tri_low(L);
  auto c2 = J.trace();
  std::cout << c2 << "\n";

the output right before failing is

Chain 1 29926 
Chain 1 37053.9 
Chain 1 51421.2 
Chain 1 78903.3 
Chain 1 126840 
Chain 1 203725 
Chain 1 317655 
Chain 1 470577 
Chain 1 648384 
Chain 1 818326 
Chain 1 4.76166e+011 
Chain 1 3.05026e+012 
Chain 1 1.57354e+013 
Chain 1 6.54164e+013 
Chain 1 2.29223e+014 
Chain 1 7.16679e+014 
Chain 1 2.1286e+015 
Chain 1 6.45698e+015 
Chain 1 Assertion failed!
Chain 1 
Chain 1 Program: C:\Users\spinkney\Code\custom_latam\test_for_rok_chol.exe
Chain 1 File: stan/lib/stan_math/lib/eigen_3.3.7/Eigen/src/Core/DenseCoeffsBase.h, Line 425
Chain 1 
Chain 1 Expression: index >= 0 && index < size()
Warning: Chain 1 finished unexpectedly!
spinkney commented 4 years ago

This works full solve_quadprod.hpp

#ifndef STAN_MATH_REV_FUN_SOLVE_QUADPROG_HPP
#define STAN_MATH_REV_FUN_SOLVE_QUADPROG_HPP
// minor modification by charlesm93 for implementation in Stan.

/*
 FILE eiquadprog.hh

 NOTE: this is a modified of uQuadProg++ package, working with Eigen data
 structures. uQuadProg++ is itself a port made by Angelo Furfaro of QuadProg++
 originally developed by Luca Di Gaspero, working with ublas data structures.

 The quadprog_solve() function implements the algorithm of Goldfarb and Idnani
 for the solution of a (convex) Quadratic Programming problem
 by means of a dual method.

 The problem is in the form:

 min 0.5 * x G x + g0 x
 s.t.
 CE^T x + ce0 = 0
 CI^T x + ci0 >= 0

 The matrix and vectors dimensions are as follows:
 G: n * n
 g0: n

 CE: n * p
 ce0: p

 CI: n * m
 ci0: m

 x: n

 The function will return the cost of the solution written in the x vector or
 std::numeric_limits::infinity() if the problem is infeasible. In the latter
 case the value of the x vector is not correct.

 References: D. Goldfarb, A. Idnani. A numerically stable dual method for
 solving strictly convex quadratic programs. Mathematical Programming 27 (1983)
 pp. 1-33.

 Notes:
 1. pay attention in setting up the vectors ce0 and ci0.
 If the constraints of your problem are specified in the form
 A^T x = b and C^T x >= d, then you should set ce0 = -b and ci0 = -d.
 2. The matrix G is modified within the function since it is used to compute
 the G = L^T L cholesky factorization for further computations inside the
 function. If you need the original matrix G you should make a copy of it and
 pass the copy to the function.

 The author will be grateful if the researchers using this software will
 acknowledge the contribution of this modified function and of Di Gaspero's
 original version in their research papers.

 LICENSE

 Copyright (2010) Gael Guennebaud
 Copyright (2008) Angelo Furfaro
 Copyright (2006) Luca Di Gaspero

 This file is a porting of QuadProg++ routine, originally developed
 by Luca Di Gaspero, exploiting uBlas data structures for vectors and
 matrices instead of native C++ array.

 uquadprog is free software; you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
 the Free Software Foundation; either version 2 of the License, or
 (at your option) any later version.

 uquadprog is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 GNU General Public License for more details.

 You should have received a copy of the GNU General Public License
 along with uquadprog; if not, write to the Free Software
 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

 */
#include <boost/math/tools/promotion.hpp>
#include <stan/math/prim/fun/Eigen.hpp>
#include <stan/math/prim/fun/fabs.hpp>
#include <stan/math/prim/fun/sqrt.hpp>
#include <stan/math.hpp>
#include <cmath>
#include <iostream>

namespace stan {
namespace math {

template <typename Scalar> 
inline Scalar distance(Scalar a, Scalar b) {
  Scalar a1, b1, t;
  a1 = fabs(a);
  b1 = fabs(b);
  if (a1 > b1) {
    t = (b1 / a1);
    return a1 * sqrt(1.0 + t * t);
  } else if (b1 > a1) {
    t = (a1 / b1);
    return b1 * sqrt(1.0 + t * t);
  }
  return a1 * sqrt(2.0);
}

template <typename T1, typename T2, typename T3>
inline void compute_d(Eigen::Matrix<T1, -1, 1> &d, const Eigen::Matrix<T2, -1, -1> &J, const Eigen::Matrix<T3, -1, 1> &np) {
  d = J.adjoint() * np;
}

template <typename T1, typename T2, typename T3>
inline void update_z(Eigen::Matrix<T1, -1, 1> &z, const Eigen::Matrix<T2, -1, -1> &J, const Eigen::Matrix<T3, -1, 1> &d,
                     int iq) {
  z = J.rightCols(z.size() - iq) * d.tail(d.size() - iq);
}

template <typename T1, typename T2, typename T3>
inline void update_r(const Eigen::Matrix<T1, -1, -1> &R, Eigen::Matrix<T2, -1, 1> &r, const Eigen::Matrix<T3, -1, 1> &d,
                     int iq) {
  r.head(iq) = mdivide_left_tri<Eigen::Upper>(R.block(0, 0, iq, iq), d.head(iq));
}

template <typename T1, typename T2, typename T3, typename T4>
inline bool add_constraint(Eigen::Matrix<T1, -1, -1> &R, Eigen::Matrix<T2, -1, -1> &J, Eigen::Matrix<T3, -1, 1> &d, int &iq,
                           T4 &R_norm) {
  using std::abs;
  int n = J.rows();
#ifdef TRACE_SOLVER
  std::cerr << "Add constraint " << iq << '/';
#endif
  int i = 0, j = 0, k = 0;
  T3 cc, ss, h, xny;
  T2 t1, t2;
  /* we have to find the Givens rotation which will reduce the element
   d(j) to zero.
   if it is already zero we don't have to do anything, except of
   decreasing j */
  for (j = n - 1; j >= iq + 1; j--) {
    /* The Givens rotation is done with the matrix (cc cs, cs -cc).
     If cc is one, then element (j) of d is zero compared with element
     (j - 1). Hence we don't have to do anything.
     If cc is zero, then we just have to switch column (j) and column (j - 1)
     of J. Since we only switch columns in J, we have to be careful how we
     update d depending on the sign of gs.
     Otherwise we have to apply the Givens rotation to these columns.
     The i - 1 element of d has to be updated to h. */
    cc = d(j - 1);
    ss = d(j);
    h = distance(cc, ss);
    if (h == 0.0)
      continue;
    d(j) = 0.0;
    ss = ss / h;
    cc = cc / h;
    if (cc < 0.0) {
      cc = -cc;
      ss = -ss;
      d(j - 1) = -h;
    } else
      d(j - 1) = h;
    xny = ss / (1.0 + cc);
    for (k = 0; k < n; k++) {
      t1 = J(k, j - 1);
      t2 = J(k, j);
      J(k, j - 1) = t1 * cc + t2 * ss;
      J(k, j) = xny * (t1 + J(k, j - 1)) - t2;
    }
  }
  /* update the number of constraints added*/
  iq++;
  /* To update R we have to put the iq components of the d vector
   into column iq - 1 of R
   */
  R.col(iq - 1).head(iq) = d.head(iq);
#ifdef TRACE_SOLVER
  std::cerr << iq << std::endl;
#endif

  if (fabs(d(iq - 1)) <= std::numeric_limits<double>::epsilon() * R_norm)
    // problem degenerate
    return false;
  if (R_norm >= (fabs(d(iq - 1)))) {
    R_norm = R_norm;
  } else {
    R_norm = fabs(d(iq - 1));
  }
  return true;
}

template <typename T1, typename T2, typename T3>
inline void delete_constraint(Eigen::Matrix<T1, -1, -1> &R, Eigen::Matrix<T2, -1, -1> &J, Eigen::VectorXi &A,
                              Eigen::Matrix<T3, -1, 1> &u, int p, int &iq, int l) {

  int n = R.rows();
#ifdef TRACE_SOLVER
  std::cerr << "Delete constraint " << l << ' ' << iq;
#endif
  int i, j, k, qq;
  T1 cc, ss, h, xny;
  /* Find the index qq for active constraint l to be removed */
  for (i = p; i < iq; i++)
    if (A(i) == l) {
      qq = i;
      break;
    }

  /* remove the constraint from the active set and the duals */
  for (i = qq; i < iq - 1; i++) {
    A(i) = A(i + 1);
    u(i) = u(i + 1);
    R.col(i) = R.col(i + 1);
  }

  A(iq - 1) = A(iq);
  u(iq - 1) = u(iq);
  A(iq) = 0;
  u(iq) = 0.0;
  for (j = 0; j < iq; j++)
    R(j, iq - 1) = 0.0;
  /* constraint has been fully removed */
  iq--;
#ifdef TRACE_SOLVER
  std::cerr << '/' << iq << std::endl;
#endif

  if (iq == 0)
    return;

  for (j = qq; j < iq; j++) {
    cc = R(j, j);
    ss = R(j + 1, j);
    h = distance(cc, ss);
    if (h == 0.0)
      continue;
    cc = cc / h;
    ss = ss / h;
    R(j + 1, j) = 0.0;
    if (cc < 0.0) {
      R(j, j) = -h;
      cc = -cc;
      ss = -ss;
    } else
      R(j, j) = h;

    xny = ss / (1.0 + cc);
    for (k = j + 1; k < iq; k++) {
      T1 t1 = R(j, k);
      T1 t2 = R(j + 1, k);
      R(j, k) = t1 * cc + t2 * ss;
      R(j + 1, k) = xny * (t1 + R(j, k)) - t2;
    }
    for (k = 0; k < n; k++) {
      T2 t1 = J(k, j);
      T2 t2 = J(k, j + 1);
      J(k, j) = t1 * cc + t2 * ss;
      J(k, j + 1) = xny * (J(k, j) + t1) - t2;
    }
  }
}

template <typename T0, typename T1, typename T2, typename T3,
          typename T4, typename T5>
auto
solve_quadprog(const Eigen::Matrix<T0, -1, -1> &G,
               const Eigen::Matrix<T1, -1, 1> &g0,
               const Eigen::Matrix<T2, -1, -1> &CE,
               const Eigen::Matrix<T3, -1, 1> &ce0,
               const Eigen::Matrix<T4, -1, -1> &CI,
               const Eigen::Matrix<T5, -1, 1> &ci0) {
  using std::abs;
  int i = 0, j = 0, k = 0, l = 0; /* indices */
  int ip, me, mi;
  int n = g0.size();
  int p = ce0.size();
  int m = ci0.size();

  using T_return = return_type_t<T0, T1, T2, T3, T4, T5>;
  Eigen::Matrix<T_return, -1, 1> np(n), u(m + p), d(n), z(n), r(m + p), s(m + p), u_old(m + p);
  Eigen::Matrix<T_return, -1, 1> x_old(n);
  const double inf = std::numeric_limits<double>::infinity();

  T_return sum, psi, ss, t1;

  Eigen::VectorXi A(m + p), A_old(m + p), iai(m + p);
  int q;
  int iq, iter = 0;
  bool iaexcl[m + p];

  me = p; /* number of equality constraints */
  mi = m; /* number of inequality constraints */
  q = 0;  /* size of the active set A (containing the indices of the active
             constraints) */

  /*
   * Preprocessing phase
   */

  /* compute the trace of the original matrix G */
  auto c1 = G.trace();

  /* decompose the matrix G in the form LL^T */
  auto L = stan::math::cholesky_decompose(G);

  /* initialize the matrix R */
  d.setZero();

  Eigen::Matrix<T0, -1, -1> R(G.rows(), G.cols());
  R.setZero();
  T_return R_norm = 1.0; /* this variable will hold the norm of the matrix R */

  /* compute the inverse of the factorized matrix G^-1, this is the initial
   * value for H */
  //J = L^-T
  Eigen::Matrix<T0, -1, -1> Li = (1e-9 < L.array().abs()).select(L, 0.0f);
  auto J = stan::math::transpose(stan::math::mdivide_left_tri_low(Li));
  auto c2 = J.trace();
#ifdef TRACE_SOLVER
  print_matrix("J", J, n);
#endif

  /* c1 * c2 is an estimate for cond(G) */

  /*
   * Find the unconstrained minimizer of the quadratic form 0.5 * x G x + g0 x
   * this is a feasible point in the dual space
   * x = G^-1 * g0
   */

  auto x = multiply(stan::math::multiply_lower_tri_self_transpose(J), g0);
  x = -x;

  /* and compute the current solution value */
  auto f_value = 0.5 * g0.dot(x);
#ifdef TRACE_SOLVER
  std::cerr << "Unconstrained solution: " << f_value << std::endl;
  print_vector("x", x, n);
#endif
  /* Add equality constraints to the working set A */
  iq = 0;
  T_return t2 = 0.0;
  T_return t = 0.0;
  for (i = 0; i < me; i++) {
    np = CE.col(i);
    compute_d(d, J, np);
    update_z(z, J, d, iq);
    update_r(R, r, d, iq);
#ifdef TRACE_SOLVER
    print_matrix("R", R, iq);
    print_vector("z", z, n);
    print_vector("r", r, iq);
    print_vector("d", d, n);
#endif

    /* compute full step length t2: i.e., the minimum step in primal space s.t.
       the contraint becomes feasible */
    t2 = 0.0;
    if (abs(z.dot(z)) > std::numeric_limits<double>::epsilon()) {
      // i.e. z != 0
      t2 = (-np.dot(x) - ce0(i)) / z.dot(np);
    }

    x += t2 * z;

    /* set u = u+ */
    u(iq) = t2;
    u.head(iq) -= t2 * r.head(iq);

    /* compute the new solution value */
    f_value += 0.5 * (t2 * t2) * z.dot(np);
    A(i) = -i - 1;

    if (!add_constraint(R, J, d, iq, R_norm)) {
      // FIXME: it should raise an error
      // Equality constraints are linearly dependent
      return x;
    }
  }

  /* set iai = K \ A */
  for (i = 0; i < mi; i++)
    iai(i) = i;

l1:
  iter++;
#ifdef TRACE_SOLVER
  print_vector("x", x, n);
#endif
  /* step 1: choose a violated constraint */
  for (i = me; i < iq; i++) {
    ip = A(i);
    iai(ip) = -1;
  }

  /* compute s(x) = ci^T * x + ci0 for all elements of K \ A */
  ss = 0.0;
  psi = 0.0; /* this value will contain the sum of all infeasibilities */
  ip = 0;    /* ip will be the index of the chosen violated constraint */
  for (i = 0; i < mi; i++) {
    iaexcl[i] = true;
    sum = CI.col(i).dot(x) + ci0(i);
    s(i) = sum;
    if (sum < 0.0) {
      psi += sum;
    }
  }
#ifdef TRACE_SOLVER
  print_vector("s", s, mi);
#endif

  if (fabs(psi) <=
      mi * std::numeric_limits<double>::epsilon() * c1 * c2 * 100.0) {
    /* numerically there are not infeasibilities anymore */
    q = iq;
    return x;
  }

  /* save old values for u, x and A */
  u_old.head(iq) = u.head(iq);
  A_old.head(iq) = A.head(iq);
  x_old = x;

l2: /* Step 2: check for feasibility and determine a new S-pair */
  for (i = 0; i < mi; i++) {
    if (s(i) < ss && iai(i) != -1 && iaexcl[i]) {
      ss = s(i);
      ip = i;
    }
  }
  if (ss >= 0.0) {
    q = iq;
    return x;
  }

  /* set np = n(ip) */
  np = CI.col(ip);
  /* set u = (u 0)^T */
  u(iq) = 0.0;
  /* add ip to the active set A */
  A(iq) = ip;

#ifdef TRACE_SOLVER
  std::cerr << "Trying with constraint " << ip << std::endl;
  print_vector("np", np, n);
#endif

l2a: /* Step 2a: determine step direction */
  /* compute z = H np: the step direction in the primal space (through J, see
   * the paper) */
  compute_d(d, J, np);
  update_z(z, J, d, iq);
  /* compute N* np (if q > 0): the negative of the step direction in the dual
   * space */
  update_r(R, r, d, iq);
#ifdef TRACE_SOLVER
  std::cerr << "Step direction z" << std::endl;
  print_vector("z", z, n);
  print_vector("r", r, iq + 1);
  print_vector("u", u, iq + 1);
  print_vector("d", d, n);
  print_ivector("A", A, iq + 1);
#endif

  /* Step 2b: compute step length */
  l = 0;
  /* Compute t1: partial step length (maximum step in dual space without
   * violating dual feasibility */
  t1 = inf; /* +inf */
  /* find the index l s.t. it reaches the minimum of u+(x) / r */
  for (k = me; k < iq; k++) {
    T_return tmp = u(k) / r(k);
    if (r(k) > 0.0 && (tmp < t1)) {
      t1 = tmp;
      l = A(k);
    }
  }
  /* Compute t2: full step length (minimum step in primal space such that the
   * constraint ip becomes feasible */
  if (abs(z.dot(z)) > std::numeric_limits<double>::epsilon()) // i.e. z != 0
    t2 = -s(ip) / z.dot(np);
  else
    t2 = inf; /* +inf */

  /* the step is chosen as the minimum of t1 and t2 */
  if(t1<=t2) {
    t = t1;
  } else {
    t = t2;
  }
#ifdef TRACE_SOLVER
  std::cerr << "Step sizes: " << t << " (t1 = " << t1 << ", t2 = " << t2
            << ") ";
#endif

  /* Step 2c: determine new S-pair and take step: */

  /* case (i): no step in primal or dual space */
  if (t >= inf) {
    /* QPP is infeasible */
    // FIXME: unbounded to raise
    q = iq;
    return x;
  }
  /* case (ii): step in dual space */
  if (t2 >= inf) {
    /* set u = u +  t * [-r 1) and drop constraint l from the active set A */
    u.head(iq) -= t * r.head(iq);
    u(iq) += t;
    iai(l) = l;
    delete_constraint(R, J, A, u, p, iq, l);
#ifdef TRACE_SOLVER
    std::cerr << " in dual space: " << f_value << std::endl;
    print_vector("x", x, n);
    print_vector("z", z, n);
    print_ivector("A", A, iq + 1);
#endif
    goto l2a;
  }

  /* case (iii): step in primal and dual space */

  x += t * z;
  /* update the solution value */
  f_value += t * z.dot(np) * (0.5 * t + u(iq));

  u.head(iq) -= t * r.head(iq);
  u(iq) += t;
#ifdef TRACE_SOLVER
  std::cerr << " in both spaces: " << f_value << std::endl;
  print_vector("x", x, n);
  print_vector("u", u, iq + 1);
  print_vector("r", r, iq + 1);
  print_ivector("A", A, iq + 1);
#endif

  if (t == t2) {
#ifdef TRACE_SOLVER
    std::cerr << "Full step has taken " << t << std::endl;
    print_vector("x", x, n);
#endif
    /* full step has taken */
    /* add constraint ip to the active set*/
    if (!add_constraint(R, J, d, iq, R_norm)) {
      iaexcl[ip] = false;
      delete_constraint(R, J, A, u, p, iq, ip);
#ifdef TRACE_SOLVER
      print_matrix("R", R, n);
      print_ivector("A", A, iq);
#endif
      for (i = 0; i < m; i++)
        iai(i) = i;
      for (i = 0; i < iq; i++) {
        A(i) = A_old(i);
        iai(A(i)) = -1;
        u(i) = u_old(i);
      }
      x = x_old;
      goto l2; /* go to step 2 */
    } else
      iai(ip) = -1;
#ifdef TRACE_SOLVER
    print_matrix("R", R, n);
    print_ivector("A", A, iq);
#endif
    goto l1;
  }

/* a patial step has taken */
#ifdef TRACE_SOLVER
  std::cerr << "Partial step has taken " << t << std::endl;
  print_vector("x", x, n);
#endif
  /* drop constraint l */
  iai(l) = l;
  delete_constraint(R, J, A, u, p, iq, l);
#ifdef TRACE_SOLVER
  print_matrix("R", R, n);
  print_ivector("A", A, iq);
#endif

  s(ip) = CI.col(ip).dot(x) + ci0(ip);

#ifdef TRACE_SOLVER
  print_vector("s", s, mi);
#endif
  goto l2a;
}

template <typename T0, typename T1, typename T2, typename T3,
          typename T4, typename T5>
auto
  solve_quadprog_chol(const Eigen::Matrix<T0, -1, -1> &L,
                      const Eigen::Matrix<T1, -1, 1> &g0,
                      const Eigen::Matrix<T2, -1, -1> &CE,
                      const Eigen::Matrix<T3, -1, 1> &ce0,
                      const Eigen::Matrix<T4, -1, -1> &CI,
                      const Eigen::Matrix<T5, -1, 1> &ci0) {
    L.eval();
    using std::abs;
    int i = 0, j = 0, k = 0, l = 0; /* indices */
  int ip, me, mi;
  int n = g0.size();
  int p = ce0.size();
  int m = ci0.size();

  using T_return = return_type_t<T0, T1, T2, T3, T4, T5>;
  Eigen::Matrix<T_return, -1, 1> np(n), u(m + p), d(n), z(n), r(m + p), s(m + p), u_old(m + p);
  Eigen::Matrix<T_return, -1, 1> x_old(n);
  const double inf = std::numeric_limits<double>::infinity();

  T_return sum, psi, ss, t1;

  Eigen::VectorXi A(m + p), A_old(m + p), iai(m + p);
  int q;
  int iq, iter = 0;
  bool iaexcl[m + p];

  me = p; /* number of equality constraints */
  mi = m; /* number of inequality constraints */
  q = 0;  /* size of the active set A (containing the indices of the active
   constraints) */

  /*
   * Preprocessing phase
   */

  /* compute the trace of the original matrix G */
  /* can remove this  auto chol = L; 
   * can provide the trace already so don't need
   * to compute G again 
   *  auto G = tcrossprod(L);
   *  auto c1 = G.trace(); */

  /* tr(AB^T) = sum( A hadamard_prod B) 
   * https://en.wikipedia.org/wiki/Trace_(linear_algebra)
   */ 
 // auto c1 = elt_multiply(L, L).sum();
   auto c1 = tcrossprod(L).trace(); 
  /* initialize the matrix R */
  d.setZero();

  /*  compute the trace of the original matrix G */
  /* **CHANGE HERE** G to L */
  Eigen::Matrix<T0, -1, -1> R(L.rows(), L.cols());
  R.setZero();
  T_return R_norm = 1.0; /* this variable will hold the norm of the matrix R */

  /* compute the inverse of the factorized matrix G^-1, this is the initial
   * value for H */
  //J = L^-T
  /* **CHANGE HERE** 
   * J is only called once and trace is the same regardless of transpose 
   * also change chol to L */
 Eigen::Matrix<T0, -1, -1> Li = (1e-9 < L.array().abs()).select(L, 0.0f);
  auto J = stan::math::transpose(stan::math::mdivide_left_tri_low(Li));
  auto c2 = J.trace();
 // std::cout << c2 << "\n";
#ifdef TRACE_SOLVER
  print_matrix("J", J, n);
#endif

  /* c1 * c2 is an estimate for cond(G) */

  /*
   * Find the unconstrained minimizer of the quadratic form 0.5 * x G x + g0 x
   * this is a feasible point in the dual space
   * x = G^-1 * g0
   */

  /* G^-1 = (L^-1)^T (L^-1) 
   * https://forum.kde.org/viewtopic.php?f=74&t=127426
   * */
  /* **CHANGE HERE** 
   * can use J here so don't need to compute inverse again */
  auto x = multiply(stan::math::multiply_lower_tri_self_transpose(J), g0);
  x = -x;

  /* and compute the current solution value */
  auto f_value = 0.5 * g0.dot(x);
#ifdef TRACE_SOLVER
  std::cerr << "Unconstrained solution: " << f_value << std::endl;
  print_vector("x", x, n);
#endif
  /* Add equality constraints to the working set A */
  iq = 0;
  T_return t2 = 0.0;
  T_return t = 0.0;
  for (i = 0; i < me; i++) {
    np = CE.col(i);
    compute_d(d, J, np);
    update_z(z, J, d, iq);
    update_r(R, r, d, iq);
#ifdef TRACE_SOLVER
    print_matrix("R", R, iq);
    print_vector("z", z, n);
    print_vector("r", r, iq);
    print_vector("d", d, n);
#endif

    /* compute full step length t2: i.e., the minimum step in primal space s.t.
       the contraint becomes feasible */
    t2 = 0.0;
    if (abs(z.dot(z)) > std::numeric_limits<double>::epsilon()) {
      // i.e. z != 0
      t2 = (-np.dot(x) - ce0(i)) / z.dot(np);
    }

    x += t2 * z;

    /* set u = u+ */
    u(iq) = t2;
    u.head(iq) -= t2 * r.head(iq);

    /* compute the new solution value */
    f_value += 0.5 * (t2 * t2) * z.dot(np);
    A(i) = -i - 1;

    if (!add_constraint(R, J, d, iq, R_norm)) {
      // FIXME: it should raise an error
      // Equality constraints are linearly dependent
      return x;
    }
  }

  /* set iai = K \ A */
  for (i = 0; i < mi; i++)
    iai(i) = i;

l1:
  iter++;
#ifdef TRACE_SOLVER
  print_vector("x", x, n);
#endif
  /* step 1: choose a violated constraint */
  for (i = me; i < iq; i++) {
    ip = A(i);
    iai(ip) = -1;
  }

  /* compute s(x) = ci^T * x + ci0 for all elements of K \ A */
  ss = 0.0;
  psi = 0.0; /* this value will contain the sum of all infeasibilities */
  ip = 0;    /* ip will be the index of the chosen violated constraint */
  for (i = 0; i < mi; i++) {
    iaexcl[i] = true;
    sum = CI.col(i).dot(x) + ci0(i);
    s(i) = sum;
    if (sum < 0.0) {
      psi += sum;
    }
  }
#ifdef TRACE_SOLVER
  print_vector("s", s, mi);
#endif

  if (fabs(psi) <=
      mi * std::numeric_limits<double>::epsilon() * c1 * c2 * 100.0) {
    /* numerically there are not infeasibilities anymore */
    q = iq;
    return x;
  }

  /* save old values for u, x and A */
  u_old.head(iq) = u.head(iq);
  A_old.head(iq) = A.head(iq);
  x_old = x;

l2: /* Step 2: check for feasibility and determine a new S-pair */
  for (i = 0; i < mi; i++) {
    if (s(i) < ss && iai(i) != -1 && iaexcl[i]) {
      ss = s(i);
      ip = i;
    }
  }
  if (ss >= 0.0) {
    q = iq;
    return x;
  }

  /* set np = n(ip) */
  np = CI.col(ip);
  /* set u = (u 0)^T */
  u(iq) = 0.0;
  /* add ip to the active set A */
  A(iq) = ip;

#ifdef TRACE_SOLVER
  std::cerr << "Trying with constraint " << ip << std::endl;
  print_vector("np", np, n);
#endif

l2a: /* Step 2a: determine step direction */
  /* compute z = H np: the step direction in the primal space (through J, see
   * the paper) */
  compute_d(d, J, np);
  update_z(z, J, d, iq);
  /* compute N* np (if q > 0): the negative of the step direction in the dual
   * space */
  update_r(R, r, d, iq);
#ifdef TRACE_SOLVER
  std::cerr << "Step direction z" << std::endl;
  print_vector("z", z, n);
  print_vector("r", r, iq + 1);
  print_vector("u", u, iq + 1);
  print_vector("d", d, n);
  print_ivector("A", A, iq + 1);
#endif

  /* Step 2b: compute step length */
  l = 0;
  /* Compute t1: partial step length (maximum step in dual space without
   * violating dual feasibility */
  t1 = inf; /* +inf */
  /* find the index l s.t. it reaches the minimum of u+(x) / r */
  for (k = me; k < iq; k++) {
    T_return tmp = u(k) / r(k);
    if (r(k) > 0.0 && (tmp < t1)) {
      t1 = tmp;
      l = A(k);
    }
  }
  /* Compute t2: full step length (minimum step in primal space such that the
   * constraint ip becomes feasible */
  if (abs(z.dot(z)) > std::numeric_limits<double>::epsilon()) // i.e. z != 0
    t2 = -s(ip) / z.dot(np);
  else
    t2 = inf; /* +inf */

  /* the step is chosen as the minimum of t1 and t2 */
  if(t1<=t2) {
    t = t1;
  } else {
    t = t2;
  }
#ifdef TRACE_SOLVER
  std::cerr << "Step sizes: " << t << " (t1 = " << t1 << ", t2 = " << t2
            << ") ";
#endif

  /* Step 2c: determine new S-pair and take step: */

  /* case (i): no step in primal or dual space */
  if (t >= inf) {
    /* QPP is infeasible */
    // FIXME: unbounded to raise
    q = iq;
    return x;
  }
  /* case (ii): step in dual space */
  if (t2 >= inf) {
    /* set u = u +  t * [-r 1) and drop constraint l from the active set A */
    u.head(iq) -= t * r.head(iq);
    u(iq) += t;
    iai(l) = l;
    delete_constraint(R, J, A, u, p, iq, l);
#ifdef TRACE_SOLVER
    std::cerr << " in dual space: " << f_value << std::endl;
    print_vector("x", x, n);
    print_vector("z", z, n);
    print_ivector("A", A, iq + 1);
#endif
    goto l2a;
  }

  /* case (iii): step in primal and dual space */

  x += t * z;
  /* update the solution value */
  f_value += t * z.dot(np) * (0.5 * t + u(iq));

  u.head(iq) -= t * r.head(iq);
  u(iq) += t;
#ifdef TRACE_SOLVER
  std::cerr << " in both spaces: " << f_value << std::endl;
  print_vector("x", x, n);
  print_vector("u", u, iq + 1);
  print_vector("r", r, iq + 1);
  print_ivector("A", A, iq + 1);
#endif

  if (t == t2) {
#ifdef TRACE_SOLVER
    std::cerr << "Full step has taken " << t << std::endl;
    print_vector("x", x, n);
#endif
    /* full step has taken */
    /* add constraint ip to the active set*/
    if (!add_constraint(R, J, d, iq, R_norm)) {
      iaexcl[ip] = false;
      delete_constraint(R, J, A, u, p, iq, ip);
#ifdef TRACE_SOLVER
      print_matrix("R", R, n);
      print_ivector("A", A, iq);
#endif
      for (i = 0; i < m; i++)
        iai(i) = i;
      for (i = 0; i < iq; i++) {
        A(i) = A_old(i);
        iai(A(i)) = -1;
        u(i) = u_old(i);
      }
      x = x_old;
      goto l2; /* go to step 2 */
    } else
      iai(ip) = -1;
#ifdef TRACE_SOLVER
    print_matrix("R", R, n);
    print_ivector("A", A, iq);
#endif
    goto l1;
  }

/* a patial step has taken */
#ifdef TRACE_SOLVER
  std::cerr << "Partial step has taken " << t << std::endl;
  print_vector("x", x, n);
#endif
  /* drop constraint l */
  iai(l) = l;
  delete_constraint(R, J, A, u, p, iq, l);
#ifdef TRACE_SOLVER
  print_matrix("R", R, n);
  print_ivector("A", A, iq);
#endif

  s(ip) = CI.col(ip).dot(x) + ci0(ip);

#ifdef TRACE_SOLVER
  print_vector("s", s, mi);
#endif
  goto l2a;
}
}
}
#endif