rchen234 / SIP_LAG


Is $Q^*_s$ lower semicontinuous? Or is it continuous actually? #2

Closed WalterMadelim closed 9 months ago

WalterMadelim commented 9 months ago

At the end of the Proof of Proposition 1, you mention that $Q^*_s$ is lower semicontinuous by Theorem 7.1 of Rockafellar (1970). I don't see how this follows.

According to eq. (12), the definition of $Q^*_s$, it should be a concave function (is it?). Then it should be upper semicontinuous (is it?). Then, combined with your conclusion (lsc), $Q^*_s$ would be continuous (is it?).

Could you explain this a bit more explicitly? Thanks.

WalterMadelim commented 9 months ago

[screenshot attached]

WalterMadelim commented 9 months ago

Actually, you use this assumption to complete your proof.

But is this point a relative interior point of dom($Q$)? To my knowledge, I only know of a proposition like the following (e.g., Proposition 1.3.11 in Convex Optimization Theory, Bertsekas):

P.S. It's not a pleasant experience to write LaTeX in Markdown, so I hope I've made myself clear.

rchen234 commented 9 months ago

Hi @WalterMadelim

Thanks for the question! Yes, I guess you are right: it is concave and upper semicontinuous. I probably made a mistake when first writing that. But you can still argue that it is lower semicontinuous (and indeed continuous) using Theorem 7.1 of Rockafellar (1970), as follows:

  1. conv($K^s$) is polyhedral, so in equation (12) you are only taking the min over a finite number of points $(x,y)$ (the extreme points of $K^s$).
  2. Therefore, the epigraph is a finite union of closed sets, which is closed (see the sketch below).
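
To spell out the closedness step (a rough sketch in generic notation, not the paper's exact wording): if equation (12) is a pointwise minimum of finitely many affine functions $\ell_1, \dots, \ell_m$ (one per extreme point), then

$$\operatorname{epi}\Big(\min_{1 \le i \le m} \ell_i\Big) = \{(x, t) : t \ge \min_i \ell_i(x)\} = \bigcup_{i=1}^{m} \operatorname{epi}(\ell_i),$$

a finite union of closed half-spaces, hence closed; and by Theorem 7.1 of Rockafellar (1970), a function with a closed epigraph is lower semicontinuous.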

Let me know if it doesn't make sense to you.

Thanks, Rui

WalterMadelim commented 9 months ago

Yes, that's what I was confused about: an arbitrary union of closed sets might not be closed, but a finite union is. This explanation is reasonable.

WalterMadelim commented 9 months ago

I think the last "≥" in the Proof of Proposition 1 is actually an "=". Also, Rockafellar uses "=" to define lsc (lower semicontinuity) in his book Variational Analysis, although in other books, e.g., on real analysis, lsc is defined with a "≥", as in your proof here. Do you have a viewpoint on this?
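
My own guess at reconciling the two (please correct me if I'm off): if the lower limit as $x \to \bar x$ is taken over balls that contain $\bar x$ itself, as in Variational Analysis, then

$$\liminf_{x \to \bar x} f(x) = \lim_{\delta \downarrow 0} \inf_{x \in \mathbb{B}(\bar x, \delta)} f(x) \le f(\bar x),$$

so requiring "≥" automatically forces "=".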

And the penultimate "≥" in the same place seems to depend on the condition "(x, $\theta_s$) ∈ $\bar P_s$ $\implies$ (x, $\theta_s$) ∈ $E^s$". Am I right?

By the way, I have a question: did you intend to drop the " $\in \mathbb{R}_+$ " condition on $u, v$ in the Lagrangian feasibility cut problem, i.e., the equation between (10) and (11)? Is there any benefit to doing this? If you drop it, I think there is a circumstance where, e.g., the "≥" sign closest to (11) (that is, (11) itself, on page 2335 of the published paper) would be a strict ">", with both sides in $\mathbb{R}$.

I think this is a very logical and constructive paper. It's a pleasure to read it and learn new ideas. Thanks for your contributions to this field and for your response. (Although I'm not a reviewer 🥲)

(But this may not be the end; I may still have more questions.)

Best wishes.

rchen234 commented 9 months ago

Hi @WalterMadelim

I am not sure what you mean by $E^s$, but I think we were basically proving what is necessary.

I don't remember why I dropped the nonnegativity conditions on $u$ and $v$ for the feasibility cuts. But I still think it is better to do so, because we want to show that our normalized cuts are not weaker than those cuts (even when $u$ and $v$ are not restricted to be nonnegative).

Thanks for the kind words. I am glad that you like it!

Best, Rui

WalterMadelim commented 9 months ago

> I am not sure what you mean by $E^s$, but I think we were basically proving what is necessary.

This was my fault, I misunderstood your proof.

> for the feasibility cuts

I agree that it's okay to drop the ≥ 0 restrictions for now.

I have a new question here.

Take the SSLP problem for instance, which has a pure-binary first stage and a mixed-binary second stage: I find that adding all the Lagrangian cuts to the root node is already enough to give the final optimal solution. To be more specific, after training the $\hat Q_s(x)$'s, we build a master problem with an objective like `c[1]x[1] + c[2]x[2] + c[3]x[3] + p[1]θ[1] + ... + p[5]θ[5]` (first-stage decision x ∈ {0,1}^3, 5 scenarios in total) and the trained cuts added; we then directly get an integer solution $x$ whose $\theta_s$ values already match the true recourse function values (within an allowed tolerance), e.g.,

Q: -472.0, Q̂: -471.99999999999994, Q̂-Q: 5.684341886080802e-14, Allowed Error: 5.190674941232628e-6
... (48 similar lines omitted)
Q: -597.0, Q̂: -596.9999999999999, Q̂-Q: 1.1368683772161603e-13, Allowed Error: 5.190674941232628e-6

Thus we do not need to add any lazy cuts anymore.

The result above is from my experiment https://github.com/WalterMadelim/p8f.jl/blob/main/src1/s.jl (built on your theory), using the data from https://github.com/rchen234/SIP_LAG/blob/main/Instances/sslp/sslp1_30_70_50.txt. The training process takes roughly 2 hours with Gurobi, which I find very pleasant.
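
For concreteness, here is a minimal sketch of the master problem I describe above (not my actual script; the costs, probabilities, and cut coefficients below are placeholders, and any MILP solver could stand in for Gurobi):

```julia
using JuMP, Gurobi

# Placeholder data: 3 first-stage binaries, 5 scenarios.
c = [1.0, 2.0, 3.0]                     # first-stage cost vector (made up)
p = fill(0.2, 5)                        # scenario probabilities
# One trained Lagrangian cut per scenario, stored as (pi0, pi, rhs),
# encoding pi0 * θ[s] + pi' * x ≥ rhs, in the spirit of cut (13).
cuts = [[(1.0, [0.5, -1.0, 0.0], -100.0)] for s in 1:5]   # made-up coefficients

m = Model(Gurobi.Optimizer)
@variable(m, x[1:3], Bin)               # first-stage decision
@variable(m, θ[1:5] >= -1e4)            # epigraph variables for the recourse
@objective(m, Min, c' * x + p' * θ)
for s in 1:5, (pi0, piv, rhs) in cuts[s]
    @constraint(m, pi0 * θ[s] + piv' * x >= rhs)   # trained cuts at the root node
end
optimize!(m)
value.(x), value.(θ)                    # compare θ[s] against the true Q_s(x)
```

If the θ[s] returned here already match the true recourse values within tolerance, as in the log above, no lazy cuts are needed.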

rchen234 commented 9 months ago

Hi @WalterMadelim

In theory, $z_D$ can be quite a bit weaker than $z_{IP}$ in the worst case; see the paper by Dey et al. (since $z_D$ can be obtained by adding all scenario cuts). But what we observed (and what they reported in their paper) is that in general the gap is very small. I think it is an interesting research question why this is the case for most stochastic MILPs. I believe the structure of the problem (e.g., submodularity of the recourse functions) matters a lot. In general, I believe the more scenarios and first-stage variables you have, the more likely the recourse functions of different scenarios differ from each other, and therefore the more $z_D$ can be weaker than $z_{IP}$. So if you try larger SSLP instances, maybe you will observe a nonzero (but still small) gap.

Here is an instance with 2 scenarios and 2 first-stage binary variables that has a nonzero gap:

Jim and I also thought about learning the cuts somehow, but we didn't actually try it. It is good to see some work in that direction! Let me know if you have other questions.

Best, Rui

WalterMadelim commented 9 months ago

> the paper by Dey et al.

I’ll check that.

> try larger SSLP instances

Yes, we need to deal with real-world, large-scale practical problems.

Your example is ingenious:

$$\min\; 0 + 0.5z + 0.5w \quad \text{s.t.} \quad z \ge y - x,\;\; z \ge x - y,\;\; w \ge x + y - 1,\;\; w \ge 1 - x - y,\;\; 0 \le x \le 1,\;\; 0 \le y \le 1,$$

with minimum objective value 0 at x = y = 0.5.
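
A quick script (my own check, not from your message) confirming the gap on this instance: forcing x and y to be binary gives 0.5, while allowing x, y ∈ [0, 1] gives 0 at x = y = 0.5.

```julia
using JuMP, Gurobi

# Solve the toy instance, either with binary x, y or with the [0, 1] box relaxation.
function solve_toy(binary::Bool)
    m = Model(Gurobi.Optimizer)
    set_silent(m)
    if binary
        @variable(m, x, Bin); @variable(m, y, Bin)
    else
        @variable(m, 0 <= x <= 1); @variable(m, 0 <= y <= 1)
    end
    @variable(m, z); @variable(m, w)
    @constraint(m, z >= y - x)
    @constraint(m, z >= x - y)
    @constraint(m, w >= x + y - 1)
    @constraint(m, w >= 1 - x - y)
    @objective(m, Min, 0.5 * z + 0.5 * w)
    optimize!(m)
    return objective_value(m)
end

println("binary x, y:  ", solve_toy(true))    # expected 0.5
println("relaxed x, y: ", solve_toy(false))   # expected 0.0
```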

Thanks for your superb replies. I'm looking forward to seeing more advanced research results from you :)

WalterMadelim commented 6 months ago

Hi @rchen234, long time no see. I have an interesting discovery and would like to share it with you.

The impressive normalized Lagrangian cut (13) and Algorithm 1 in your paper prove to be pretty robust in applications. You mentioned in your paper that you would like to integrate your method into SDDiP.

But I would like to comment, from my perspective, that I somewhat object to the binarization preprocessing required by SDDiP, i.e., having to make sure that all state variables linking the stages are 0/1 bits. I think it is too complicated and error-prone if the original problem does not have binary state variables.

The non-convex SLDP method (ref. [1]) can circumvent this issue. It uses cuts with a reverse-norm term so that the cuts can penetrate non-convex dents in the value functions.

Inspired by your " $\pi, \pi_0$ " idea (see Lagrangian cut (13) in your paper), I propose a cut of the form $\pi_0 \theta + \pi_0 \rho \|x - \bar x\| + \pi^\top x \ge \mathrm{rhs}$, where $(x, \theta)$ has the same meaning as in (13), $\mathrm{rhs}$ is a variant of the right-hand-side term in (13), $\rho$ is an augmenting coefficient, and $\bar x$ is the center point at which the reverse-norm cut is generated. The standard form of this cut is eq. (12) in reference [2].
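
To make the intuition explicit (my own rearrangement, assuming $\pi_0 > 0$): dividing by $\pi_0$, the cut reads

$$\theta \ge \frac{\mathrm{rhs} - \pi^\top x}{\pi_0} - \rho \, \|x - \bar x\|,$$

i.e., a lower bound on the value function that is tightest at $\bar x$ and is relaxed at the Lipschitz rate $\rho$ as $x$ moves away from $\bar x$, which is what lets the cut wrap around non-convex dents.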

Here is the experiment: I set up a 3-stage, 1-scenario nonconvex MIP, derived from the discrete control problem in reference [1], and in the first phase I use Lagrangian cuts to train the value functions; the gap is reduced from 100% to 28%, at which point no more Lagrangian cuts can be added. Then I use the nonconvex cut proposed above to continue training, and this time the gap can be reduced to <= 0.01%, which indicates the practicality of this method (although it's a toy example). code_is_here

Here is a sketch of what I'm doing: the SE subplot is $Q_3(x_2)$; the olive curve in the NE subplot is $Q_2(x_1)$, and the blue curve in the NE subplot is the first-stage immediate cost as a function of $x_1$. I devised this deliberately so that Lagrangian cuts cannot close the gap. The SW subplot is precisely the minimization problem $v^{prim}(x_1)$. The NW subplot is merely a local magnification of the SW one, with the optimal point marked by a small cyan diamond. The code above finds this cyan diamond successfully.

[figure: the four-panel sketch described above]

So what's your opinion on this? Do you think SDDiP is a good way to go, or could the more advanced SLDP method do better? Would you like to extend your " $\pi, \pi_0$ " cut method to the augmented Lagrangian cut (as in [1] or [2])?

Thanks!

[1] Ahmed, S., Cabral, F.G., Freitas Paulo da Costa, B.: Stochastic Lipschitz dynamic programming. Math. Program. 191, 755–793 (2022).
[2] Zhang, S., Sun, X.A.: Stochastic dual dynamic programming for multistage stochastic mixed-integer nonlinear optimization. Math. Program. 196, 935–985 (2022).

rchen234 commented 6 months ago

Hi @WalterMadelim, thanks for sharing! Honestly, I am not sure I have the authority to judge between SDDiP and SLDP, as I haven't actually worked on multistage stochastic integer programming yet. But your results do look promising! I am absolutely delighted to see research inspired by our work. Please keep me posted :)