Adding ability for quadratic optimizer to use Stan parameters

spinkney commented 4 years ago

Continuing this from the discussion on the Stan forums at https://discourse.mc-stan.org/t/quadratic-optimizier/4405/47.

We want to merge the quadprog code from the test quadprog repo and update the solve_quadprog C++ template to work with stan::math::var. It currently uses Eigen::MatrixXd everywhere and we need to make sure those accept vars.

rok-cesnovar commented 4 years ago

Ok, let me start a branch later today and set everything up and we can work on it. I will give you push permissions.

rok-cesnovar commented 4 years ago

Had some other stan release matters to deal with yesterday, will get on this today.

rok-cesnovar commented 4 years ago

So, I have a version that compiles now at https://github.com/rok-cesnovar/math/tree/solve_quadprog Its ultimately wrong most likely as I just brute-forced all vars to doubles, so the gradient w.r.t. to G will be wrong. But its a start.

rok-cesnovar commented 4 years ago

I will make a cmdstan branch with the added signature for the function tomorrow morning CET time. If you have some (Fake) data we could test this with that would also be great.

spinkney commented 4 years ago

Great, thanks! I'm up early EST and I can test with some fake data.

rok-cesnovar commented 4 years ago

Clone with git clone --branch solve_quadprog https://github.com/stan-dev/cmdstan.git --recursive

spinkney commented 4 years ago

One minor typo that caused me some consternation, haha, in solve_quadprog you have solve_quadpORg.

rok-cesnovar commented 4 years ago

Oh, sorry :)

rok-cesnovar commented 4 years ago

@spinkney Oh, nevermind, I forgot to append the correct stan compiler version. Just a second.

rok-cesnovar commented 4 years ago

If you already cloned, just do a git pull on cmdstan. No need to update the submodules. Your model should then compile but I am expecting wrong values for at least the G parameter matrix.

spinkney commented 4 years ago

If I remove the functions block, assuming that it's in the stan/math folder, I get

spinkney commented 4 years ago

scratch that...I did a make clean-all and rebuild. The quadprog_example_two compiles

spinkney commented 4 years ago

well, it worked!

rok-cesnovar commented 4 years ago

As in "doesnt crash" or "actually gives seemingly correct result"? I would be surprised if the latter.

spinkney commented 4 years ago

Yes, it doesn't crash but it doesn't give the right result because it has to be an unconstrained variable.

I want a lower=0 constraint on the variable that gets passed to G as in.

parameters {
    real<lower=0> sigma; 
    vector<lower=0>[K] v_weights;
}
transformed parameters {
    row_vector[J] weight;
    vector[K] X_weight;
    matrix[K, K] G = diag_matrix(2 * v_weights/sum(v_weights) );

    X_weight = solve_quadprog(G, g0, CE, ce0, CI, ci0);
}

But I get the following error:

rok-cesnovar commented 4 years ago

I am not sure I completely follow. You want the constraing on v_weights?

spinkney commented 4 years ago

Yes, I didn't have the constraint in the post above--updated. Also do the updates allow g0 to have parameters too?

rok-cesnovar commented 4 years ago

Yes, I didn't have the constraint in the post above--updated.

Ok, let me see why that would be problem. Do you have any data (or subsets) you can share?

Also do the updates allow g0 to have parameters too?

Not currently. I am now trying to rework this so that any of the arguments can be parameters. Will push the changes once I finish.

spinkney commented 4 years ago

So far I'm not convinced this simulation creates my situation but it gives you an idea

set.seed(12345)

K <- 8
J <- 3

true_v <- c(0.2, 0.5, 0.3, rep(0, K - 3))
true_w <- c(0.7, 0.3, rep(0, J - 2))

X0 <- matrix(rnorm(K * J, 0, sd = 1), J, K)
X1 <- apply(X0, 2, function(x) t(true_w) %*% (x ))
X1 <- true_v * X1 + rnorm(K, 0, 0.001)

Y0 <- rnorm(J, 10, sd = 3)
Y1 <- true_w %*% Y0 + rnorm(1)

stan_data <- list(K = K,
                  J = J,
                  X1 = X1,
                  X0 = X0,
                  CE = matrix(1, nrow = K, ncol = 1),
                  ce0 = sum(X1),
                  CI =  cbind(diag(K), -diag(K)),
                  ci0 = c(rep(0.0, K), -X1))

K = K
J = J
X1 = X1
X0 = X0
Y0 = Y0
Y1 = as.vector(Y1)
CE = matrix(1, nrow = K, ncol = 1)
ce0 = array(sum(X1))
CI =  cbind(diag(K), -diag(K))
ci0 = c(rep(0.0, K), -X1)

stan_rdump(c('K', 'J', 'X1', 'X0','Y0', 'Y1',  'CE', 
             'ce0', 'CI', 'ci0'), file = "quadprog_data.R")

spinkney commented 4 years ago

data {
    int<lower=1> K;             // num predictors
    int<lower=0> J;                 
    vector[K] X1;
    matrix[J, K] X0;
    vector[J] Y0;
    real Y1;
    matrix[K, 1] CE;      
    vector[1] ce0;         
    matrix[K, 2 * K] CI;   
    vector[2 * K] ci0;    
}
transformed data {
  vector[K] g0 = -2 * X1;
}
parameters {
    real<lower=0> sigma; 
    vector[K] v_weights;
}
transformed parameters {
    row_vector[J] weight;
    vector[K] X_weight;
    matrix[K, K] G = diag_matrix(2 * v_weights/sum(v_weights) );

    X_weight = solve_quadprog(G, g0, CE, ce0, CI, ci0);
}
model {
  sigma ~ normal(0.01, 0.05);
  X1 ~ normal(X_weight, sigma);
}

rok-cesnovar commented 4 years ago

Thanks! Will update once I dig through.

spinkney commented 4 years ago

You can use this to test to get parameter values working and what not. But it's not going to return the simulated values even if you get that to work. The issue is this program will just set X0W = X1 since it doesn't know anything about X0 yet and there's no constraint on the weights. I'll work on updating that.

rok-cesnovar commented 4 years ago

Ok, the function now accepts any of the 6 args as a parameter. And I am fairly certain the gradients are now working fine.

I ran the model with

vector<lower=0>[K] v_weights;

and the sampling finished. There were however a lot of divergences:

975 of 1000 (98%) transitions ended with a divergence.

Run git pull on cmdstan and make stan-update to update the submodules.

spinkney commented 4 years ago

Great, I will try to test this weekend.

I did find the issue with the code and I have it partially rewritten. I've also tested with quadprogpp in R and I'm able to recover the weights when I set v_weights to a uniform simplex (though this is the simple case that is not needed for Stan).

Will test on the simple case with the stan model and then try the harder case with estimating v_weights. In the rewritten case, both G and g0 will be parameters.

Thanks for your quick work on this! I must apologize that I may need a few days to test since I have two small kids and this shelter-at-home has made me a part-time preschool teacher and stay-at-home dad.

rok-cesnovar commented 4 years ago

I completely understand. No rush on my side.

Let me know if it works when you test or if we need to dig a bit deeper. I will add some more tests on the C++ level. Make sure to pull the latest version and update the submodules.

spinkney commented 4 years ago

My rewritten model works and the tests I've done all show this to work out.

G and g0 are parameters and I tested v_weights as a simplex and positive_ordered vector, both work. It's looking pretty good too

name             Mean   MCSE    StdDev  5%  50% 95% N_Eff   N_Eff/s R_hat
lp__           -19.00   0.13    2.18    -22.92  -18.66  -16.00  282.54  499.19  1.00
accept_stat__   0.88    0.00    0.13    0.58    0.92    1.00    791.45  1398.33 1.00
stepsize__  0.40    nan 0.00    0.40    0.40    0.40    nan nan nan
treedepth__ 3.06    0.01    0.35    3.00    3.00    4.00    850.11  1501.96 1.00
n_leapfrog__    8.95    0.12    3.53    7.00    7.00    15.00   926.40  1636.76 1.00
divergent__ 0.00    nan 0.00    0.00    0.00    0.00    nan nan nan
energy__    23.43   0.18    2.92    19.32   23.06   29.03   273.05  482.43  1.00
sigma           0.85    0.02    0.45    0.32    0.75    1.72    836.39  1477.71 1.00
v_weights[1]    0.16    0.01    0.17    0.01    0.11    0.48    1005.77 1776.99 1.00
v_weights[2]    0.35    0.01    0.25    0.06    0.29    0.85    1020.09 1802.28 1.00
v_weights[3]    0.59    0.01    0.34    0.15    0.54    1.25    962.97  1701.36 1.00
v_weights[4]    0.90    0.01    0.43    0.34    0.83    1.71    1159.94 2049.37 1.00
v_weights[5]    1.32    0.02    0.60    0.53    1.23    2.39    1076.27 1901.54 1.01
v_weights[6]    2.09    0.03    0.88    0.95    1.96    3.64    1162.22 2053.39 1.00
v_weights[7]    3.44    0.05    1.48    1.53    3.15    6.36    980.75  1732.78 1.00
v_weights[8]    15.12   0.12    3.86    9.35    14.73   21.70   1116.63 1972.85 1.00
w_optim[1]  0.61    0.00    0.02    0.57    0.62    0.64    597.79  1056.16 1.00
w_optim[2]  0.38    0.00    0.02    0.35    0.38    0.43    645.98  1141.30 1.00
w_optim[3]  0.00    0.00    0.01    0.00    0.00    0.02    943.81  1667.50 1.00

rok-cesnovar commented 4 years ago

That is great news!

spinkney commented 4 years ago

Do you think this is something that can be pushed into Stan for the next release?

rok-cesnovar commented 4 years ago

That is going to be difficult for two reasons. The largest being the licensing. The quadprog++ code from which this work was derived is GPL licensed and that license (as far as I know) does not play well with the BSD license of cmdstan/stan math.

The other reason is that the solve_quadprog function as it's written in the original and in our modification may be performant but the style is pretty "dangerous", especially due to the goto statements. But this isn't as large of an issue, or at least one we can work around by throwing out the gotos.

The solution for the first issue would be to write the solve_quadprog C++ code from scratch (or derived from BSD licensed R/Python/... code) or use a BSD licensed C++ code. Does any of that exist?

Now that this seems to work ok, I am going to add that to the original example (using external C++ file). That should now also work and we can use this without having to clone a fork of math and use a special version of the compiler. So much more accessible. We can then make a post if you agree.

spinkney commented 4 years ago

Cool sounds good. I would love for the work you and Charles have done to be included as well as thinking that this opens up a lot modeling possibilities. Not just the algebra solver stuff that Charles had in mind. Like the possibility of adding generalized method of moments to Stan (something like https://web.hku.hk/~gyin/materials/2009YinBA.pdf) and the nested optimization stuff I'm doing.

What about Eclipse Public License? I found Ipopt C++ code at https://github.com/ampl/coin/tree/master/Ipopt/Ipopt

spinkney commented 4 years ago

I don't think the Eclipse license works with maintaining BSD. However, I did find a python library under BSD license that has a number of solvers including quadratic. These could probably be ported to C++ pretty easily.

https://github.com/Pyomo/pyomo

rok-cesnovar commented 4 years ago

Nice! That seems promising.

rok-cesnovar / math

Adding ability for quadratic optimizer to use Stan parameters #1