jump-dev / Convex.jl

A Julia package for disciplined convex programming
https://jump.dev/Convex.jl/stable/

Matrix exp/log/entropy #138

Closed bschreck closed 6 months ago

bschreck commented 8 years ago

Do the exp or log functions support the matrix versions, i.e. exp or log of the singular values of the matrix?

I'm looking for a way to implement a Bregman divergence term like Tr(X log(X/A) - X + A), where A is a constant.

madeleineudell commented 8 years ago

Interesting. It's pretty easy to give access to any function of the eigenvalues; we don't currently expose the functionality, because it's a bit difficult to certify convexity. But if you're ready for adventure, check out the schatten branch, and try the eig function: for example,

using Convex

n = 5

# generate a random PSD matrix M
A = randn(n, n)
U, S, V = svd(A)
d = rand(n)
M = U * diagm(d) * U'

X = Semidefinite(n)
p = maximize(sum(log(eig(X))), M - X in :SDP)
solve!(p)

@show p.optval - sum(log(d))

Any convex or concave function of the eigenvalues should work, including KL divergence, which I think is what you want. But exercise caution: I'm not yet willing to certify correctness of the formulation. Under the hood, if you try to minimize f(eig(A)), it will solve the problem

minimize    f(d)
subject to  U upper triangular
            d == diag(U)
            A in :PSD
            [diagm(d) U; U' A] in :PSD
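As a sanity check of this formulation in plain Julia (an editor's addition, not Convex.jl code; the particular choice of U below, built from a Cholesky factor, is mine), a positive definite A admits a feasible point whose d recovers logdet(A):

```julia
using LinearAlgebra

n = 4
G = randn(n, n)
A = Symmetric(G * G' + I)    # random positive definite A

R = cholesky(A).U            # A = R' * R, with R upper triangular
U = Diagonal(diag(R)) * R    # upper triangular; diag(U) = diag(R).^2
d = diag(U)

# The Schur complement of the block is A - U' * inv(Diagonal(d)) * U = A - R' * R = 0,
# so [diagm(d) U; U' A] is PSD and (d, U) is feasible.
M = [Matrix(Diagonal(d)) Matrix(U); Matrix(U') Matrix(A)]
@show eigmin(Symmetric(M))          # ~0 up to roundoff

# At this feasible point, sum(log(d)) equals logdet(A).
@show sum(log.(d)) - logdet(A)      # ~0
```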


Madeleine Udell
Postdoctoral Fellow at the Center for the Mathematics of Information
California Institute of Technology
https://courses2.cit.cornell.edu/mru8
(415) 729-4115

bschreck commented 8 years ago

Thanks I'll check it out! I've actually never used Julia before (I came here because CVXPY doesn't have support for this) so this will be fun.

Where does that second optimization problem come from? Is this just a standard way of converting convex optimization problems into these "exponential cone" problems I've been reading about? I'm trying to wrap my head around these conversions but I'm a little fuzzy on the linear algebra involved. In particular, I keep seeing block matrices in the constraints that seem to come from a matrix factorization. Any chance you can point me to somewhere that explains how they come about?

bschreck commented 8 years ago

Also if you're interested I'm trying to implement an online matrix completion algorithm from this paper: http://arxiv.org/pdf/1204.0136.pdf

madeleineudell commented 8 years ago

That's a cool problem! If you don't mind, it would be great if you could contribute it as an example problem to Convex.jl when you've got a nice formulation.

To understand why these block matrix constraints emerge, the critical idea is that of Schur complements:

Assuming C is positive definite, [A B; B^T C] is PSD iff the Schur complement A - B C^{-1} B^T is PSD.

You can use this fact to get your hands on factors of the matrix A, as long as the problem forces B and C into the proper form so that they really do factorize A. Boyd and Vandenberghe's Convex Optimization has a good overview of Schur complements; for their use in matrix factorization problems, I like this paper http://arxiv.org/abs/0706.4138 by Recht, Fazel, and Parrilo.
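For a concrete feel for the equivalence, here is an illustrative plain-Julia check (an editor's sketch; it constructs C positive definite, as the statement requires, and a block matrix whose Schur complement is PSD by construction):

```julia
using LinearAlgebra

n = 4
G = randn(n, n)
C = Symmetric(G * G' + I)             # C positive definite
B = randn(n, n)
H = randn(n, n)
S = Symmetric(H * H')                 # an arbitrary PSD Schur complement
A = Symmetric(S + B * inv(C) * B')    # so that A - B C^{-1} B' = S ⪰ 0

M = [Matrix(A) B; B' Matrix(C)]

# The block matrix and the Schur complement are PSD together.
@show eigmin(Symmetric(M))                      # ≥ 0 up to roundoff
@show eigmin(Symmetric(A - B * inv(C) * B'))    # ≥ 0 up to roundoff
```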


bschreck commented 8 years ago

Awesome, thanks, I'll check out that paper. That's also too funny: I literally just emailed Pablo Parrilo about this problem asking if he has suggestions for solvers/resources. He's been sitting in on my online learning class.

bschreck commented 8 years ago

Is there also a way to get the eigenvectors of X? For log(X) I think I would need Q * log(eigs(X)) * Q^T, right? And then for Tr(X log X) I would do sum(elementwise_mul(X, logX))

madeleineudell commented 8 years ago

I'm not entirely sure that I understand the meaning of your expression above, so I'm going to be very careful. If X = Q diagm(s) Q^T, then log(X) = Q diagm(log(s)) Q^T (i.e., it has the same eigenvectors as X), so Tr(X log(X)) = sum(s .* log(s)) = -entropy(eigs(X)).
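An editor's numeric check of that identity in plain Julia (eigendecomposition-based matrix log, no Convex.jl involved):

```julia
using LinearAlgebra

n = 4
G = randn(n, n)
X = Symmetric(G * G' + I)   # positive definite, so log(X) is well defined

s = eigvals(X)
logX = log(X)               # matrix logarithm; shares eigenvectors with X

lhs = tr(X * logX)
rhs = sum(s .* log.(s))     # = -entropy(eigvals(X))
@show lhs - rhs             # ~0
```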

It's important that you take a closer look at the equivalent problem I sent, to verify that solving the SDP that Convex.jl will form actually finds the solution to your problem. Can you prove that

minimize    -entropy(d) - trace(X)
subject to  U upper triangular
            d == diag(U)
            X in :PSD
            [diagm(d) U; U' X] in :PSD

gives the right answer to your problem?
bschreck commented 8 years ago

That seems right; you have the expression right. Tr(X log(X)) should be equal to -entropy(eigs(X)).

If that minimization problem holds for the eigenvalues, then I don't see why it wouldn't work here; everything is still convex. The actual expression is a bit more complicated than -entropy(X) - trace(X). I wrote it above with A as a constant, but in the actual algorithm A is a constant matrix, a function of X_(t-1), the X from the previous time step.

The full formula is Tr(X log X - X log A - X + A), where A = exp(log(X_(t-1)) - eta*Lt); eta is the step size and Lt is a loss matrix. I'm pretty sure this works out to:

-entropy(eigs(X)) - 2*sum(X .* log(X_(t-1))) + 2*eta*sum(X .* Lt) + Tr(X)

However, I tried to run this for a couple of iterations on some random data, and Convex.jl runs for the maximum number of iterations on the second iteration and returns a potentially inaccurate answer, so maybe I expanded that function wrong, or maybe something is wrong with the formulation in terms of the eigenvalue minimization problem. The authors of the paper do claim that this can be solved by a standard convex optimization solver, and name the "ellipsoid" method as a possibility.
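One way to debug an expansion like this is to compare it numerically against the original trace expression using dense matrix functions. Here is an editor's sketch with made-up test data for X, X_(t-1), Lt, and eta; the coefficients in the `expanded` line are what the algebra gives me, so comparing against the expression above is a quick way to catch factor-of-two slips:

```julia
using LinearAlgebra

psd(n) = (G = randn(n, n); Symmetric(G * G' + I))

n = 4
X, Xprev = psd(n), psd(n)
Lt  = Symmetric(randn(n, n))   # loss matrix
eta = 0.1                      # step size

A = exp(Symmetric(log(Xprev) - eta * Lt))   # A = exp(log(X_(t-1)) - eta*Lt)

# Direct evaluation of Tr(X log X - X log A - X + A):
direct = tr(X * log(X)) - tr(X * log(A)) - tr(X) + tr(A)

# Expansion using Tr(X log X) = -entropy(eigvals(X)) and, for symmetric M,
# Tr(X M) = sum(X .* M), together with log(A) = log(X_(t-1)) - eta * Lt:
s = eigvals(X)
expanded = sum(s .* log.(s)) - sum(X .* log(Xprev)) + eta * sum(X .* Lt) -
           tr(X) + tr(A)

@show direct - expanded   # ~0
```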

madeleineudell commented 8 years ago

Is X_(t-1) a constant (matrix) or a variable when solving this problem? If it's a variable, then sum(X .* log(X_(t-1))) is not convex.

The underlying solver called by Convex.jl (if you don't specify anything else) is SCS, which is not great for solving ill-conditioned problems, which is why you're seeing it run for the maximum number of iterations. Unfortunately there are no other solvers available yet for problems using exps/logs/entropy and semidefinite matrix variables.

The ellipsoid method is a favorite method of theorists, but it is very slow: when papers say something "can be solved by the ellipsoid method", they mean the problem is polynomial-time solvable, not that this is the right way to solve it.

Madeleine


bschreck commented 8 years ago

It's a constant matrix. It's basically the output of the previous step. You can think of this as a form of dual coordinate ascent, or of projected gradient ascent (at each time step, project into a different space, subtract the gradient in that space, project back into the original space). This paper describes the divergence function well (they call it the von Neumann divergence): http://arxiv.org/pdf/0706.4138v1.pdf.

One thing I realized that may be the problem is that the divergence relies on 0*log(0) being zero. Do you know if that's the case in Julia/Convex.jl? I'm going to look that up.

Thanks for sticking with me, Madeleine! I'm going to figure this out, and when I do I'll definitely post it as an example.

madeleineudell commented 8 years ago

Yes, Convex.jl knows that lim_{x→0} x log(x) = 0:

x = Variable()
p = minimize(-entropy(x), x == 0)
solve!(p)
p.optval

is about 0.


dstahlke commented 4 years ago

I think this might not work. I cherry-picked the "schatten" commit onto the master branch and cleaned it up a bit to get it running: renamed atom -> struct, used LinearAlgebra.eigvals in evaluate, and changed diagm(d) to diagm(0 => d). The modified file is attached. Caveat: I have literally one day of experience with Julia, so I could have messed it up.

The result is that maximizing tr(w) gives a different result than maximizing sum(eig(w)), for w = Semidefinite(n).

Can you explain a bit more about the math? I understand Schur complements but don't know anything about triangular matrices. This looks like a useful technique though (if it works).

Also, do you see any way to maximize Tr(A log X) where X is HermitianSemidefinite and A is a constant? I guess this eigenvalues trick wouldn't work here because there is no way to rotate them back to the original basis.

schatten.jl.gz

dstahlke commented 4 years ago

I did track down a description of this Schur trick for logdet: section 6.2.3 of https://docs.mosek.com/modeling-cookbook/sdo.html

One could also consider [I U; U' X] in :SDP, which is feasible when U' * U is a Cholesky factorization of X. It looks like the singular values of U will be the eigenvalues of sqrt(M). However, the eigenvalues of U will be something different.
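Both observations check out numerically in plain Julia (an editor's sketch, with U taken from the Cholesky factorization X = U' * U):

```julia
using LinearAlgebra

n = 4
G = randn(n, n)
X = Symmetric(G * G' + I)   # positive definite

U  = cholesky(X).U          # X = U' * U, with U upper triangular
Um = Matrix(U)

# [I U; U' X] ⪰ 0 iff X - U' * U ⪰ 0 (Schur complement w.r.t. the identity
# block); here X - U' * U = 0, so the block is PSD.
M = [Matrix(1.0I, n, n) Um; Um' Matrix(X)]
@show eigmin(Symmetric(M))                             # ~0 up to roundoff

# The singular values of U are the eigenvalues of sqrt(X):
@show svdvals(Um) ≈ sort(sqrt.(eigvals(X)), rev = true)   # true
```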

ericphanson commented 4 years ago

@dstahlke welcome to Julia :). I have not tried out your code but I will probably have time to do so next week. What kinds of problems are you interested in? Minimizing Schatten norms subject to SDP constraints? Maybe this post by @madeleineudell on the CVX forums is helpful: http://ask.cvxr.com/t/represent-schatten-p-norm-in-cvx/649/6.

If you are interested in the matrix relative entropy, https://github.com/hfawzi/cvxquad provides a CVX implementation; I am interested in porting it to Julia at some point, ideally via MathOptInterface bridges.

dstahlke commented 4 years ago

I'm interested in relative entropy. That Fawzi paper and module look promising. It's only an approximation, but it looks like they are getting to within 1e-5 or 1e-6, which is not bad.

I think I could manage to implement this directly as an atom: the Convex.jl atoms seem pretty simple. If you would rather it use bridges, do you mind telling me a bit about that? I wasn't able to gain much insight from your second link and I see only a single reference in Convex.jl, to MOIB.full_bridge_optimizer.

ericphanson commented 4 years ago

MathOptInterface (MOI) is an intermediary layer that sits between Convex.jl and the underlying solvers (SCS, Mosek, etc.). If a solver does not support a particular cone (e.g. SDPA does not natively handle the SOCP cone) but that cone can be simulated inside a cone it does handle, MathOptInterface "bridges" the constraint by performing that simulation. In this language, cvxquad is an (approximate) bridge from the operator relative entropy cone to the SDP cone.

Right now, Convex.jl itself also plays the role of some bridges, e.g. from the complex positive semi-definite matrix (PSD) cone to the real PSD cone. However, Convex.jl doesn't have any idea of when to apply a "bridge"; it just always does so. Moreover, Convex.jl's "bridges" are not always the most efficient in that they can involve unnecessary copying of the data. MathOptInterface has a well-thought-out design for layering these bridges in a performant way, and only applying them when necessary. MathOptInterface also has other "frontends", i.e. user-facing modelling languages, than Convex.jl. Actually, Convex.jl is a rather recent user of MOI, and the main user is JuMP.jl (before this, Convex.jl used a different intermediary layer, MathProgBase, but that software isn't being developed anymore because the devs created MOI instead, as the next-generation library). So one other advantage of doing the bridging in MOI rather than Convex is that other frontends can be used.

However, MOI is also pretty complicated. I learned enough about it to switch Convex.jl over to using it, but not a ton more than that. One resource is the JuMP-dev chat channel: https://gitter.im/JuliaOpt/JuMP-dev. I got some help from @blegat there with understanding MOI to switch Convex over, but I understand he's pretty busy now. One way to get a better idea of what's involved is to look at this PR implementing a bridge from the Schatten 1-norm cone to the PSD cone: https://github.com/JuliaOpt/MathOptInterface.jl/pull/976. This is an example where Convex.jl also implements the same bridge. One of my plans for Convex.jl is to change the internals to use MOI bridges directly rather than doing it ourselves, with the expectation that it will improve performance.

Since you are a new Julia user, it might be better to implement it in Convex.jl first since it should be simpler as you said. That should definitely work and be useful, and if at a later point we port it to MOI then it will help to have a Julia reference to look at.

One last consideration is that cvxquad does not have a code license. Before we add it to Convex.jl, then, we should get permission from the authors. I'm actually already in contact with Hamza Fawzi, so I can email him and see if it's okay.

dstahlke commented 4 years ago

Thank you for that detailed description. I'm coming to learn that one of the big strengths of Julia is its interoperability and well designed APIs and ecosystem. I'll do some more learning, and will take a closer look at Fawzi's paper. It would be really cool if arbitrary functions could be automatically adapted using some sort of symbolic analysis. That might not be feasible though, since it seems in each case specialized theorems come into play to make an efficient implementation. It's not just a matter of doing a Taylor expansion (although maybe that's feasible as a fallback).

I said I was interested in relative entropy but (for now) one of the parameters is constant so more accurately I'm interested in what he calls trace_logm. Implementing that as an atom would be a good starting point for me, and an excuse to learn more Julia.

My background is mostly C++ (day job) and some Python (during PhD research). Julia seems to combine much of what is good from both, and has an excellent package ecosystem. I was amazed actually to see that the selection of scientific packages seems to even beat Python.

ericphanson commented 4 years ago

Thank you for that detailed description.

No problem, I'm glad you're interested!

It would be really cool if arbitrary functions could be automatically adapted using some sort of symbolic analysis. That might not be feasible though, since it seems in each case specialized theorems come into play to make an efficient implementation. It's not just a matter of doing a Taylor expansion (although maybe that's feasible as a fallback).

You might already know some or all of the following, but maybe it will be helpful:

I think this kind of approach is sort of what's done in local optimization routines like some provided by Optim.jl, or, if there are non-trivial constraints, through something like Ipopt, which can be accessed via JuMP.jl. There are actually a lot more packages too; there's a whole world of smooth optimization. These packages either require a gradient and/or Hessian, which might be computed via an autodifferentiation package like ForwardDiff, ForwardDiff2, Zygote, or ReverseDiff (and there are more, all in Julia!), or they themselves use autodifferentiation to compute the gradients, and then use that information to help find the minimum.
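For intuition, the forward-mode idea behind packages like ForwardDiff can be sketched in a few lines of plain Julia with a toy dual-number type (an editor's illustration, not ForwardDiff's actual implementation):

```julia
# Toy forward-mode AD: carry (value, derivative) pairs through arithmetic.
struct Dual
    val::Float64
    der::Float64
end

Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.der * b.val + a.val * b.der)
Base.:*(c::Real, a::Dual) = Dual(c * a.val, c * a.der)

# Derivative of f at x: seed der = 1 and read off the derivative component.
derivative(f, x) = f(Dual(x, 1.0)).der

f(x) = x * x + 3x
@show derivative(f, 2.0)   # 2*2 + 3 = 7.0
```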

Convex.jl and similar packages (e.g. conic solvers accessed via JuMP, or CVX in MATLAB, or others) instead use conic solvers which are only applicable to certain types of problems, but use the structure of the problem in a significant way to aid in solving it. This is where your remark "since it seems in each case specialized theorems come into play to make an efficient implementation" comes in; that's exactly the structure that these solvers are trying to take advantage of. So a more generic approach can definitely work, but that's not what Convex in particular is going for. Besides possible efficiency gains, I know of two other advantages of using the structure of the problem: ease of modelling (I think Convex's API is relatively simple for what it is, and the same for CVX), and it can often be easy to check that a problem lives in some class under which theoretical guarantees are known (e.g. a semi-definite program can be solved in polynomial time under certain general assumptions). My reference for conic programming is usually the book by Boyd and Vandenberghe.

I said I was interested in relative entropy but (for now) one of the parameters is constant so more accurately I'm interested in what he calls trace_logm. Implementing that as an atom would be a good starting point for me, and an excuse to learn more Julia.

Sounds great! Please feel free to open a pull request to Convex for review, or open an issue, or private message me on the JuMP-dev gitter I linked earlier if you want me to take a look at anything. I'd be happy to help get something like that into Convex. (But I might not have much time this week).

I'm coming to learn that one of the big strengths of Julia is its interoperability and well designed APIs and ecosystem.

My background is mostly C++ (day job) and some Python (during PhD research). Julia seems to combine much of what is good from both, and has an excellent package ecosystem. I was amazed actually to see that the selection of scientific packages seems to even beat Python.

I'm a big Julia fan :). Glad to hear you're liking it so far! If you haven't seen it, there's a great talk by one of Julia's co-creators that looks at a key feature (multiple dispatch) and how it enables interoperability: https://www.youtube.com/watch?v=kc9HwsxE1OY.

ericphanson commented 4 years ago

Just a brief update: Hamza says it would be great if we port it to Julia and he’ll look into adding a license too. So that aspect should be fine.

dstahlke commented 4 years ago

That's great news. I was prepared to reproduce it from the paper, but of course looking at code makes it easier and more likely to be correct. I have so far experimented with creating an atom for the Lovasz theta body, which I will at some point submit as a separate pull request.

dstahlke commented 4 years ago

I've started porting matrix_geo_mean_hypo_cone because all the other functions depend on this one. Question though: is it possible for an atom to return a matrix variable rather than a scalar? This atom is of the form A #_t B ⪰ T; for example, when t = 1/2 this corresponds to [A T; T' B] ⪰ 0. I'd like matrix_geo_mean_hypo_cone to take arguments A and B and return T. But all the existing atoms seem to only return scalars, and I couldn't figure out what to pass to cache_conic_form in this case.

One could also imagine matrixfrac(X, P) returning a matrix when X is a matrix rather than a vector.

ericphanson commented 4 years ago

Atoms can return matrices, although Convex sadly does not support higher-dimensional arrays, which means workarounds might be needed at some parts because I think cvxquad does use them (well, higher dimensional objects are fine for internal use within Convex’s code, but can’t be returned from atoms, so I’m not sure if this will be a problem or not). Actually, matrices are really all Convex knows; all atoms have a size parameter with 2 dimensions, so a scalar is really like a 1 by 1 matrix in Convex, in contrast to regular Julia. At some point I hope to get Convex to support general arrays and scalars but that’s a task for another time.

Have a look at some of the affine atoms for atoms which return matrices, eg addition, multiplication, transpose etc.

However, I think the analog to the cone functions in cvxquad might not be atoms themselves. I think the atom would be like eg quant_rel_entr and then inside its conic_form function you would use those functions to get the appropriately constrained variables. So I think the cone stuff would just be regular functions, not atoms.

ericphanson commented 4 years ago

Also, in his email Hamza mentioned this:

I am planning to update the cvxquad code some time soon to fix a known issue with the matrix_geomean functions -- the issue does not affect the quantum(rel_)entr functions though, and is a relatively benign one -- I've been postponing this but I hope to do it soon.

Hopefully that doesn’t affect what you’ve been doing, since if the issue doesn’t affect quantum_(rel_)entr then it is maybe not in the cone bit, but in the function itself?

dstahlke commented 4 years ago

Thanks. This helps me get started. I've implemented a couple of the atoms, but haven't tested much. It would be nice if the cones can be atoms because it leads to a nice syntax where f(X,Y) returns Z which can then be used in other expressions (the cone being f(X, Y) ⪰ Z). In a sense even quantum_rel_entr is a cone, although it has a minimize call in it. Actually, that brings up a question: what does minimize do in conic_form!? Is it just for verifying DCP compliance? If so, do I need something like maximize when returning Z constrained to the cone f(X, Y) ⪰ Z?

If these cones are implemented as functions rather than atoms, is there a good way to add their constraints to the problem? I could pass unique_conic_forms as a parameter to the functions, but it seems this would prevent using them from outside the Convex.jl internals.

As for the arrays, I don't think it's a problem (for now). Arrays don't seem to be used anywhere, at least that I've seen so far. I think they just exist as an option in case the user wants several variables constrained to the cone. As compared to, e.g., calling matrix_geo_mean_hypo_cone several times, the arrays allow sharing some of the intermediate variables (and so create a smaller problem).

ericphanson commented 4 years ago

Sorry for the delay; just submitted my PhD thesis on Friday so I should be a bit more free now :).

Thanks. This helps me get started. I've implemented a couple of the atoms, but haven't tested much. It would be nice if the cones can be atoms because it leads to a nice syntax where f(X,Y) returns Z which can then be used in other expressions (the cone being f(X, Y) ⪰ Z). In a sense even quantum_rel_entr is a cone, although it has a minimize call in it. Actually, that brings up a question: what does minimize do in conic_form!? Is it just for verifying DCP compliance? If so, do I need something like maximize when returning Z constrained to the cone f(X, Y) ⪰ Z?

I'm not 100% sure what you mean. One thing that might be related is that early in the process Convex converts all problems to minimizations (by adding a minus sign in the objective) and goes from there. If you put some code up somewhere I'd be happy to take a look.

If these cones are implemented as functions rather than atoms, is there a good way to add their constraints to the problem? I could pass unique_conic_forms as a parameter to the functions, but it seems this would prevent using them from outside the Convex.jl internals.

Ah ok, I think if that is an issue then atoms are indeed the way to go. I think I spoke too soon when suggesting functions instead.

As for the arrays, I don't think it's a problem (for now). Arrays don't seem to be used anywhere, at least that I've seen so far. I think they just exist as an option in case the user wants several variables constrained to the cone. As compared to, e.g., calling matrix_geo_mean_hypo_cone several times, the arrays allow sharing some of the intermediate variables (and so create a smaller problem).

Ok, that's good!

dstahlke commented 4 years ago

Wow, congrats on the thesis submission! Bet it feels good to be done with that. I've been saying "then I'll have some free time" for a long time now, and it never happens. Seems we always find a way to fill it. Thanks for spending a bit of your time helping me on this project. No hurry - this is not at all urgent.

I made a branch with the two functions I've added: https://github.com/dstahlke/Convex.jl/tree/logm

Regarding constraints vs functions, now that I look at src/constraints I think maybe it would be feasible. It would allow writing e.g. (A,B,T) in :geom_mean_hypocone which is an okay syntax. I'll experiment with that at some point.

Let me clarify my question on minimization. An example is in src/atoms/sdp_cone/matrixfrac.jl which has this:

p = minimize(t, [t x'; x P] ⪰ 0)
cache_conic_form!(unique_conic_forms, m, p)

What does minimize do in that context? Is the point of it just so there will be a DCP warning if I do something like this?

problem = maximize(matrixfrac([1;2;3], X))
solve!(problem, () -> SCS.Optimizer(verbose=0))

Or is it needed in order to warn Convex.jl that matrixfrac built a problem where t is unbounded from above? I'm in a similar situation with geom_mean_hypocone, which returns a matrix variable T which is unbounded from below. So it seems like I should do something like this:

cache_conic_form!(unique_conic_forms, atom, maximize(T))

But maximize doesn't allow a ComplexVariable argument (my intention would be to maximize in the operator ordering, aka the Loewner ordering). So instead I'm doing this:

cache_conic_form!(unique_conic_forms, atom, conic_form!(T, unique_conic_forms))

Is that okay?

Another question: I'm getting DCP compliance errors from geom_mean_hypocone. As an example, the unit test sdp_geom_mean_hypocone_real_1_2 produces this:

subject to
└─ sdp constraint (Convex.NotDcp)
   └─ + (concave; real)
      ├─ geom_mean_hypocone (concave; real)
      │  ├─ 4×4 Array{Float64,2}
      │  └─ 4×4 real variable (id: 524…219)
      └─ - (constant; negative)
         └─ 4×4 Array{Float64,2}

It seems to me that the sdp constraint should be okay. I notice in the code that vexity(c::SDPConstraint) is more restrictive than vexity(c::GtConstraint), the former requiring the argument to be affine. I think "concave in sdp" should be okay, since it would then restrict the arguments to be in a convex set. But I'm new to the DCP rules and haven't seen anything spelled out for the semidefinite case. Maybe DCP rules are only for scalars? It seems for non-scalars the definitions of concave and convex would have to be relative to a cone: convex means f(tx + (1-t)y) <= t f(x) + (1-t) f(y), but for non-scalars the <= operator only has meaning relative to some cone.

ericphanson commented 4 years ago

Wow, congrats on the thesis submission! Bet it feels good to be done with that.

Thanks! It sure does :).

I made a branch with the two functions I've added: https://github.com/dstahlke/Convex.jl/tree/logm

Glad to see you got the test syntax worked out; the problem depot bit adds some boilerplate to the tests, but it has been useful.

Regarding constraints vs functions, now that I look at src/constraints I think maybe it would be feasible. It would allow writing e.g. (A,B,T) in :geom_mean_hypocone which is an okay syntax. I'll experiment with that at some point.

I think conceptually, membership in a cone does belong as a constraint rather than an atom. The PSD cone gets the special syntax as well as other ways of constructing it; for these more specialized ones that doesn't make as much sense, but I think something like T in GeomMeanHypoCone(A, B, t) would work well. The way that would work is that GeomMeanHypoCone(A, B, t) could just be something like

struct GeomMeanHypoCone
    A
    B
    t
end

as well as a

struct GeomMeanHypoConeConstraint <: Constraint
 ...
end

and then have a method

in(T::AbstractExpr, C::GeomMeanHypoCone) = GeomMeanHypoConeConstraint(T, C)

So the GeomMeanHypoCone would just hold onto A, B, and t, and be used for dispatch, while the GeomMeanHypoConeConstraint would be the actual constraint constructed from those (which thus has access to the variable T as well as A, B, and t) that implements a conic_form! etc.
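As a standalone illustration of that dispatch pattern (an editor's toy sketch with placeholder types, not Convex.jl's actual AbstractExpr and Constraint machinery):

```julia
# Hypothetical sketch: the cone object just holds the data, and overloading
# Base.in produces the constraint object.
struct GeomMeanHypoCone
    A
    B
    t
end

struct GeomMeanHypoConeConstraint
    T
    cone::GeomMeanHypoCone
end

Base.in(T::AbstractMatrix, C::GeomMeanHypoCone) = GeomMeanHypoConeConstraint(T, C)

A, B, T = rand(2, 2), rand(2, 2), rand(2, 2)
c = T in GeomMeanHypoCone(A, B, 1//2)
@show c isa GeomMeanHypoConeConstraint   # true
```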

Looking at the atom MatrixGeoMeanHypoCone, I'm not sure it's doing what we want it to do. The T that gets constructed inside the conic_form! never gets passed out, so I'm not sure if e.g. the line geom_mean_hypocone(A, B, 1//2) ⪰ eye(n) in the tests actually imposes the right constraint? I think the conic_form! for your MatrixGeoMeanHypoCone should become most of the conic form for GeomMeanHypoConeConstraint, with the crucial distinction that T is an input for GeomMeanHypoConeConstraint instead of being created inside its conic_form!.

Let me clarify my question on minimization. An example is in src/atoms/sdp_cone/matrixfrac.jl which has this:

p = minimize(t, [t x'; x P] ⪰ 0)
cache_conic_form!(unique_conic_forms, m, p)

What does minimize do in that context? Is the point of it just so there will be a DCP warning if I do something like this?

problem = maximize(matrixfrac([1;2;3], X))
solve!(problem, () -> SCS.Optimizer(verbose=0))

Or is it needed in order to warn Convex.jl that matrixfrac built a problem where t is unbounded from above?

I think this doesn't do either; it actually isn't related to the DCP checking. What this is doing is essentially http://cvxr.com/cvx/doc/advanced.html#new-functions-via-partially-specified-problems but without a nice user interface. In other words, we are reusing the fact that elsewhere we've already programmed how to encode SDP constraints, and we are saying that the way to do a matrixfrac is to minimize a single scalar variable t subject to an SDP constraint.
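For reference, the reason this particular encoding computes matrixfrac is the Schur complement; this is a standard fact about SDP epigraph formulations, not anything specific to Convex.jl's code:

```latex
% Epigraph formulation of matrixfrac via the Schur complement: for P ≻ 0,
\begin{bmatrix} t & x^\top \\ x & P \end{bmatrix} \succeq 0
\iff t \ge x^\top P^{-1} x,
% so minimizing t subject to the SDP constraint recovers
x^\top P^{-1} x \;=\; \min\Bigl\{\, t :
\begin{bmatrix} t & x^\top \\ x & P \end{bmatrix} \succeq 0 \,\Bigr\}.
```

So the "minimize" in the cached problem is what turns the feasible region of the constraint into the value of the function.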

I'm in a similar situation with geom_mean_hypocone, which returns a matrix variable T which is unbounded from below. So it seems like I should do something like this:

cache_conic_form!(unique_conic_forms, atom, maximize(T))

But maximize doesn't allow a ComplexVariable argument (my intention would be to maximize in the operator ordering, aka the Loewner ordering). So instead I'm doing this:

cache_conic_form!(unique_conic_forms, atom, conic_form!(T, unique_conic_forms))

Is that okay?

I'm not quite sure what you mean but maybe my previous answer helps?

Another question: I'm getting DCP compliance errors from geom_mean_hypocone. As an example, the unit test sdp_geom_mean_hypocone_real_1_2 produces this:

subject to
└─ sdp constraint (Convex.NotDcp)
   └─ + (concave; real)
      ├─ geom_mean_hypocone (concave; real)
      │  ├─ 4×4 Array{Float64,2}
      │  └─ 4×4 real variable (id: 524…219)
      └─ - (constant; negative)
         └─ 4×4 Array{Float64,2}

It seems to me that the sdp constraint should be okay. I notice in the code that vexity(c::SDPConstraint) is more restrictive than vexity(c::GtConstraint), the former requiring the argument to be affine. I think "concave in sdp" should be okay, since it would then restrict the arguments to be in a convex set. But I'm new to the DCP rules and haven't seen anything spelled out for the semidefinite case. Maybe DCP rules are only for scalars? It seems that for non-scalars the definitions of concave and convex would have to be relative to a cone: convex means f(tx+(1-t)y) <= tf(x) + (1-t)f(y), but for non-scalars the <= operator only has meaning relative to some cone.

I also haven't seen the DCP rules spelled out in this case, but I think you are right that "concave in SDP" should be okay, and that the definition of convex and concave is with respect to the cone. Section 4.6 of Boyd and Vandenberghe agrees with that, although that's not about DCP specifically, but convex programming in general.
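Spelled out (following the K-convexity notion in Boyd & Vandenberghe, Section 4.6, with K the PSD cone here):

```latex
% f is K-concave iff for all x, y and θ ∈ [0,1]:
f(\theta x + (1-\theta) y) \;\succeq_K\; \theta f(x) + (1-\theta) f(y).
% If f is K-concave, then the set {x : f(x) ⪰_K 0} is convex,
% which is why a "concave ⪰ 0" SDP constraint should be acceptable
% in principle, by the same argument as for scalar concave >= 0.
```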

It's possible that the DCP warnings need tweaking for this case. However as I mentioned above I'm not sure the geom_mean_hypocone function is doing the right thing here, so it could be that.

dstahlke commented 4 years ago

So, I experimented a bit with making GeomMeanHypoCone a constraint rather than a function: https://github.com/dstahlke/Convex.jl/tree/logm-constraint ... and I'm stuck right in the middle of indecisiveness about whether this is better.

On the one hand, geometric mean is a function, not a cone. It just happens to be implemented in this case as a hypograph. But actually that's true of every single thing in src/atoms/sdp_cone: they are all epigraphs or hypographs. Really the only different thing about my function is that it returns a matrix rather than a scalar. So for that reason it should probably be a function.

On the other hand, the DCP system seems unequipped to deal with operators. I guess the point of a concave function, as far as DCP is concerned, is that if you constrain it >= 0 then the feasible region is convex. This is a problem in my case because my function is operator concave rather than entrywise concave. So it needs to be ⪰ 0 rather than >= 0.

One solution would be to add OperatorConcave and OperatorConvex vexity types. I guess this would be a major change. But maybe not too bad - it looks like function vexity is only used in about a dozen places. For example, x ⪰ 0 would require x to be AffineVexity or OperatorConcave, analogous to how >= works.
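To make the idea concrete, here is a purely hypothetical sketch of what such vexity types could look like; none of these names exist in Convex.jl (its actual vexities are ConstVexity, AffineVexity, ConvexVexity, ConcaveVexity, and NotDcp), and the stand-in types are just so the snippet is self-contained:

```julia
# Hypothetical sketch: operator-vexity types for matrix-valued atoms.
abstract type Vexity end
struct AffineVexity <: Vexity end
struct OperatorConcaveVexity <: Vexity end
struct NotDcp <: Vexity end

# The check an SDP constraint `X ⪰ 0` might perform: accept affine or
# operator-concave arguments, analogous to how `x >= 0` accepts affine or
# (scalar-)concave `x`, since either keeps the feasible set convex.
sdp_constraint_vexity(v::AffineVexity) = v
sdp_constraint_vexity(v::OperatorConcaveVexity) = v
sdp_constraint_vexity(::Vexity) = NotDcp()
```

The main cost is that all the composition rules (and operator-monotone analogs of increasing/decreasing) would also need to be encoded, as discussed below.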

If GeomMeanHypoCone were to be implemented as a constraint, I think the right syntax might be T ⪯ GeomMeanHypograph(A, B, t). This would remind the user which direction the constraint points. And similarly T ⪰ GeomMeanEpigraph(A, B, t). The only issue I can see with this is that it might tempt the user to try writing something like T ⪰ GeomMeanEpigraph(A, B, t) + U, which would give a type error on the + function.

When I tried implementing this as a constraint I couldn't really figure out what should be passed into cache_conic_form!. For constraints this function appears to need a ConicConstr object, and in all existing examples that's fed into the guts of the underlying solver. I managed to bypass this by using

cache_conic_form!(unique_conic_forms, constraint, Array{Convex.ConicConstr,1}())

and it seems to work. Don't know if that's kosher though.

ericphanson commented 4 years ago

So, I experimented a bit with making GeomMeanHypoCone a constraint rather than a function: https://github.com/dstahlke/Convex.jl/tree/logm-constraint ... and I'm stuck right in the middle of indecisiveness about whether this is better.

On the one hand, geometric mean is a function, not a cone. It just happens to be implemented in this case as a hypograph. But actually that's true of every single thing in src/atoms/sdp_cone: they are all epigraphs or hypographs. Really the only different thing about my function is that it returns a matrix rather than a scalar. So for that reason it should probably be a function.

I see what you mean, and I agree it seems like it might be more useful to have it as a function than a constraint. However, what I'm not sure about is how to actually get the function version. It wasn't clear to me that the atom version actually gives the right thing, but I am a bit confused.

I looked at the Fawzi Saunderson paper (https://arxiv.org/abs/1512.03401) and they only discuss the epi- and hypo- graph formulations for the matrix geometric mean and then use them for scalar-valued functions.

On the other hand, the DCP system seems unequipped to deal with operators. I guess the point of a concave function, as far as DCP is concerned, is that if you constrain it >= 0 then the feasible region is convex. This is a problem in my case because my function is operator concave rather than entrywise concave. So it needs to be ⪰ 0 rather than >= 0.

One solution would be to add OperatorConcave and OperatorConvex vexity types. I guess this would be a major change. But maybe not too bad - it looks like function vexity is only used in about a dozen places. For example, x ⪰ 0 would require x to be AffineVexity or OperatorConcave, analogous to how >= works.

Yes, we could add these. We would need to encode all the rules for compositions, and have analogs for increasing / decreasing (i.e. operator monotone), etc. And as far as I know, this hasn't been done in other systems (cvxpy, CVX, etc). That's not a reason to not do it, but it makes me wonder if it ends up not being useful. Worth thinking about for sure. If you haven't seen it, it's worth looking at the original CVX paper (https://stanford.edu/~boyd/papers/disc_cvx_prog.html) where they discuss all the rules for the usual notions of convexity and concavity.

If GeomMeanHypoCone were to be implemented as a constraint, I think the right syntax might be T ⪯ GeomMeanHypograph(A, B, t). This would remind the user which direction the constraint points. And similarly T ⪰ GeomMeanEpigraph(A, B, t). The only issue I can see with this is it might tempt the user to try writing something like T ⪰ GeomMeanEpigraph(A, B, t) + U which would give a type error on the + function.

Hmm, that could be a suggestive syntax. I imagine these functions would tend to be used in implementations for other functions or by advanced users, so to me the T in cone syntax seems okay and is also still quite clear. We could add a docstring to e.g. GeomMeanEpigraph that spells it out, so one doesn't have to remember which way it goes.

When I tried implementing this as a constraint I couldn't really figure out what should be passed into cache_conic_form!. For constraints this function appears to need a ConicConstr object, and in all existing examples that's fed into the guts of the underlying solver. I managed to bypass this by using

cache_conic_form!(unique_conic_forms, constraint, Array{Convex.ConicConstr,1}())

and it seems to work. Don't know if that's kosher though.

Uhh, very good question. I haven't worked on the constraint stuff too much, but I think I see how this works, and that what you have is fine.

How this works is as follows. ConicConstr objects represent what I'll call "primitive constraints": constraints that we pass along to MathOptInterface, such as x >= 0, for which we do no further "bridging" ourselves. For example, if you look at our SDP constraints, it's a bit complicated because we support complex numbers and MOI doesn't. But we ultimately generate real SDP constraints that we ask MOI to implement, via a ConicConstr object with the symbol SDP. In this case, we could do the same, and pass out ConicConstr objects with the primitive constraints we end up using (again, SDP constraints), but then we would have to reimplement handling complex numbers etc. Instead, by generating SDPConstraint objects, those do the final creation of the ConicConstrs, and we have no primitive constraints to pass along ourselves.

In other words, it's the mechanism that Convex uses to pass primitive constraints to MOI, but here we are purely bridging so we don't need any, and an empty array is fine. Looking at it now, this fact is actually why it's possible at all to implement the cone constraint as an atom (atoms, I believe, don't get to pass on ConicConstr objects so they can't implement primitive constraints, but we don't have any, so it's ok).

dstahlke commented 3 years ago

Just a quick update: I have this mostly complete, but keep getting sidetracked with higher priority tasks. In case anybody coming across this thread needs this functionality: let me know and I'll send you my latest.

dstahlke commented 3 years ago

First attempt at porting cvxquad: #418

ericphanson commented 6 months ago

I think we can close this; the original question,

Tr(Xlog(X/A) - X + A)

assuming that means

Tr(Xlog(X) - Xlog(A) - X + A)

can be implemented as

quantum_relative_entropy(X, A) - tr(X) + tr(A)

thanks to #418
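For anyone landing here from search, a rough usage sketch of that closing suggestion (assuming a Convex.jl version that includes the CVXQUAD port from #418, which provides quantum_relative_entropy; the solver choice and trace-one constraint are illustrative, not from the issue):

```julia
# Sketch only: minimizes the Bregman divergence Tr(X log X - X log A - X + A)
# over density matrices X, for a fixed positive definite A.
using Convex, SCS, LinearAlgebra

n = 3
B = randn(n, n)
A = B * B' + I          # constant positive definite matrix

X = Semidefinite(n)
# quantum_relative_entropy(X, A) represents Tr(X (log X - log A))
objective = quantum_relative_entropy(X, A) - tr(X) + tr(A)
p = minimize(objective, [tr(X) == 1.0])
solve!(p, SCS.Optimizer)
```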