jump-dev / Convex.jl

A Julia package for disciplined convex programming
https://jump.dev/Convex.jl/stable/

Logistic Regression #84

Closed ahwillia closed 9 years ago

ahwillia commented 9 years ago

I've been having trouble with optimization problems involving logistic_loss(). I tried to create a stripped-down example (univariate logistic regression) to illustrate the issue. The result I get is both slow and wrong.

Hopefully this isn't an error on my end... I followed the example here pretty closely (but with synthetic data): https://github.com/JuliaOpt/Convex.jl/blob/master/examples/logistic_regression.ipynb

using Convex
using SCS
using PyPlot

# logistic function
lgc(x) = 1 ./ (1+exp(-x));

# data
n = 300
X = 3*randn(n)
beta_true = 1.5
Y = map(x -> x > rand() ? 1 : -1, lgc(beta_true*X))

# fit
beta = Variable(1)
problem = minimize(logistic_loss(-Y.*(X*beta)))
tic()
solve!(problem, SCSSolver(verbose=false))
toc()

This takes roughly 60-80 seconds on my machine (slower on the first run), which is crazy slow. Additionally, the estimate for beta isn't even correct:

figure(figsize=(4,4))
beta_vals = linspace(-1,10,1000)
loss = Float64[]
for b in beta_vals
    push!(loss,sum(log(1+exp(-Y.*(X*b)))))
end
plot(beta_vals,loss,"-b")
plot(beta_vals[indmin(loss)],loss[indmin(loss)],"ok")
plot(beta.value,sum(log(1+exp(-Y.*(X*beta.value)))),"or")
legend(("loss","optimal","returned by SCS"))
xlabel("Beta"), ylabel("Loss")

[Figure: loss as a function of beta from the brute-force sweep, with the minimizer ("optimal") and the beta returned by SCS marked; the returned value sits well away from the minimum.]

mlubin commented 9 years ago

This is an interesting test case. Not sure what the bottleneck is in SCS, but it could be much faster to write the model in nonlinear form and hand it to Ipopt through JuMP. The benefit of going through SCS is that it lets you use many different classes of constraints in the same problem. If your problem is just logistic regression, there will be more efficient approaches available (not claiming that Ipopt is the best either).
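
For example, an untested rough sketch of that nonlinear formulation in the JuMP syntax of this era (illustrative only, reusing X, Y, and n from the example above):

using JuMP, Ipopt

m = Model(solver=IpoptSolver(print_level=0))
@defVar(m, beta)
# smooth objective: sum_i log(1 + exp(-y_i * x_i * beta))
@setNLObjective(m, Min, sum{log(1 + exp(-Y[i]*X[i]*beta)), i = 1:n})
solve(m)
println(getValue(beta))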

ahwillia commented 9 years ago

I'm actually interested in re-implementing and then tweaking the methods in this paper on binary matrix completion, which involves minimizing a logistic loss subject to a constraint on the nuclear norm of the matrix. I was thinking the logistic regression example would be easier to understand/reproduce.
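
Roughly, the kind of problem I have in mind would look something like this in Convex.jl (just a sketch with placeholder names, sizes, and nuclear-norm budget, and it assumes logistic_loss handles vector arguments correctly):

# placeholder dimensions, synthetic ±1 observations, and a nuclear-norm budget t
m, k, t = 50, 40, 10.0
Yobs = sign(randn(m, k))
M = Variable(m, k)                     # matrix of log-odds to recover
problem = minimize(logistic_loss(-vec(Yobs) .* vec(M)), nuclearnorm(M) <= t)
solve!(problem, SCSSolver(verbose=false))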

ahwillia commented 9 years ago

I just had a look at the code, and have a couple of thoughts/questions that might help identify the problem... We want to represent the following function:

$f(x) = \sum_{i=1}^{n} \log\bigl(1 + e^{x_i}\bigr)$

This line tries to express the argument of each logarithm as a sum of exponentials:

logistic_loss(e::AbstractExpr) = logsumexp([e, Constant(zeros(size(e)))])

$\log\bigl(1 + e^{x_i}\bigr) = \log\bigl(e^{0} + e^{x_i}\bigr)$

The problem is that the evaluate function sums over everything, so passing a vector into logistic_loss instead results in

$\log\Bigl(n + \sum_{i=1}^{n} e^{x_i}\Bigr)$ rather than $\sum_{i=1}^{n} \log\bigl(1 + e^{x_i}\bigr)$.

So I tried the naive fix, which was first to change the line above to concatenate e and the zeros horizontally:

logistic_loss(e::AbstractExpr) = logsumexp([e Constant(zeros(size(e)))])

And then change the evaluate() function to sum over the second dimension first (resulting in a column vector), then take the log, and then sum the logs:

return sum(log(sum(exp(evaluate(x.children[1])),2)))

This didn't seem to fix the problems I reported above, and I'm sure I also broke the logsumexp() operation in the process. But it seems to me that something is not working as intended.

mlubin commented 9 years ago

I was thinking along these lines. The issue is a bit deeper than evaluate, which has no bearing on how fast the optimization runs.

It looks like Convex.jl introduces a new variable and linear constraint for each Constant value. There's nothing wrong with this per se, since many solvers detect these structures during preprocessing and avoid the extra work of treating constants as optimization variables. However, SCS, afaik, does not do any preprocessing, so this transformation into exponential cones with constant variables ends up being inefficient.

CC @bodono

madeleineudell commented 9 years ago

Note that conic form is an extremely inefficient representation for logistic regression, even if the presolve step that @mlubin suggests were employed. Conic form introduces a separate exponential cone constraint for every observation a_i, when in fact the function we're optimizing is extremely well behaved: smooth, with bounded derivative. Consider the difficulty of enforcing all those conic constraints, compared to just taking gradient/Newton steps on such a nice objective, and you'll see why conic form is not the usual choice for the ML crowd.
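
To make that concrete, the standard conic reformulation of one logistic term uses the exponential cone $K_{\exp} = \operatorname{cl}\{(u, v, w) : v\,e^{u/v} \le w,\ v > 0\}$, roughly as

$$\log\bigl(e^{z_i} + e^{0}\bigr) \le t_i \quad\Longleftrightarrow\quad \exists\, p_i, q_i:\;\; p_i + q_i \le 1,\;\; (z_i - t_i,\ 1,\ p_i) \in K_{\exp},\;\; (-t_i,\ 1,\ q_i) \in K_{\exp},$$

so with $z_i = -y_i\,\beta^\top a_i$, every observation carries its own epigraph variable $t_i$ and two exponential-cone memberships, rather than one smooth scalar term.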

But more worrisome: @ahwillia, you're right that logistic_loss does the wrong thing for vector arguments. I'll fix that immediately.

mlubin commented 9 years ago

Logistic regression + nuclear norm seems like a good problem to test on the nonexistent NLP+conic solver.

madeleineudell commented 9 years ago

Definitely. I think it's conventional to solve such problems with first-order operator-splitting methods, but I imagine there are regimes in which an NLP+conic solver would do much better: particularly when high accuracy is required.

bodono commented 9 years ago

I ran SCS until it got 1e-9 accuracy, and it still has a persistent error between the solution it returns and the optimum in the curve. It could be a bug in the formulation of the problem, or in SCS. I have run lots of exponential programs through cvx though, and it generally doesn't have this error.

The slow-down is a consequence of the fact that representing this as a cone program introduces a bunch of extra variables. You can speed it up by enabling parallel projections onto the exponential cone (recompiling SCS from source with USE_OPENMP=1) and by setting scale=0.1 in the params. Between these two changes I get it down from about 50 secs to about 15 secs.
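
For what it's worth, those parameters can be passed straight through the solver object in Convex.jl, along these lines (exact keyword handling depends on the SCS.jl version):

solve!(problem, SCSSolver(verbose=false, scale=0.1, max_iters=10000))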

mlubin commented 9 years ago

@bodono, could the projection onto the exponential cone be made faster if it's known that some variables are fixed?

bodono commented 9 years ago

I wrote up basically the same example in MATLAB + CVX:

clear all;close all;
n = 300;
x = 3*randn(n,1);
beta_true = 1.5;

y = 2*(1./(1+exp(-beta_true*x)) > randn(n,1)) - 1;

xy = x.*y;

cvx_begin
cvx_solver scs
cvx_solver_settings('eps',1e-8, 'max_iters', 2500, 'scale', 1)
variable b
minimize(sum(log_sum_exp([zeros(1,n);-b*xy'])))
cvx_end

beta_vals = linspace(-1,2,10000);
loss=[];
for i=1:length(beta_vals)
    loss(i) = sum(log_sum_exp([zeros(1,n);-beta_vals(i)*xy']));
end
[b_s, ix] = min(loss);
plot(beta_vals, loss); hold on
plot(beta_vals(ix),loss(ix),'rx')
plot(b,sum(log(1+exp(-b*xy))), 'bo')
legend('loss','optimal','returned by SCS')

Here is the output:

----------------------------------------------------------------------------
    SCS v1.1.2 - Splitting Conic Solver
    (c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-direct, nnz in A = 2100
eps = 1.00e-08, alpha = 1.50, max_iters = 2500, normalize = 1, scale = 1.00
Variables n = 1200, constraints m = 1801
Cones:  primal zero / dual free vars: 1
    exp vars: 0, dual exp vars: 1800
Setup time: 1.32e-01s
----------------------------------------------------------------------------
 Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
     0|      inf       inf       nan      -inf       inf       inf  2.30e-02 
    80| 4.68e-09  7.07e-10  3.55e-11 -1.92e+02 -1.92e+02  4.18e-14  3.94e-01 
----------------------------------------------------------------------------
Status: Solved
Timing: Total solve time: 3.96e-01s
    Lin-sys: nnz in L factor: 5101, avg solve time: 2.35e-05s
    Cones: avg projection time: 4.62e-03s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 1.4448e-14, dist(y, K*) = 0.0000e+00, s'y/m = 1.7833e-11
|Ax + s - b|_2 / (1 + |b|_2) = 4.6762e-09
|A'y + c|_2 / (1 + |c|_2) = 7.0677e-10
|c'x + b'y| / (1 + |c'x| + |b'y|) = 3.5516e-11
----------------------------------------------------------------------------
c'x = -191.6723, -b'y = -191.6723
============================================================================
------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): +191.672

With plot:

[Figure: loss as a function of beta, with the brute-force optimum and the beta returned by SCS via CVX marked.]

SCS solved it (without parallelization and all that) in about 0.5 secs. My guess is that there is a slight bug in the way Convex.jl is converting the problem into a cone format (the number of variables is about the same in both cases).

madeleineudell commented 9 years ago

I've now updated the definition of logistic_loss to compute what you'd expect for vector arguments, i.e.,

logistic_loss(x) == sum([logsumexp([x[i], 0]) for i=1:n])

and the problem is much faster to solve: about 3.5 seconds, after initial compilation. That's within spitting distance of the CVX version, and the difference is (probably?) attributable to the fact that CVX does a presolve to eliminate redundant variables (e.g., variables that are really just constants).
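
In terms of the original example, the objective is now built up per observation, roughly as if you had written (a sketch reusing X, Y, n, and beta from the first comment):

z = -Y .* (X*beta)                     # one scalar expression per observation
problem = minimize(sum([logsumexp([z[i]; Constant(0)]) for i = 1:n]))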

madeleineudell commented 9 years ago

And it does indeed now give the right answer: [figure: the loss curve with the beta returned by SCS now at the minimum.]

bodono commented 9 years ago

Nice!