Thanks for sharing the details of your code. I've amended the title to reflect that you're seeing an incorrect infeasibility status in your extension of this code, rather than in the core library.
I've reproduced the key function of interest below. (The first two are obviously correct since we're dealing with reals). I've also removed the original comments and added comments of my own inline.
function p_relu(x::T, l_n::Real, u_n::Real, x_s::T, l_s::Real, u_s::Real)::JuMP.AffExpr where {T<:JuMPLinearType}
    if u_s < l_s
        Memento.warn(
            MIPVerify.LOGGER,
            "Inconsistent upper and lower bounds: u_s-l_s = $(u_s - l_s) is negative. Attempting to use interval arithmetic bounds instead ...",
        )
        u_s = upper_bound(x_s)
        l_s = lower_bound(x_s)
    end
    if u_s < l_s
        error(
            MIPVerify.LOGGER,
            "Inconsistent upper and lower bounds even after using only interval arithmetic: u_s-l_s = $(u_s - l_s) is negative",
        )
    elseif u_s <= 0
        # rectified value is always 0 as the indicator variable is 0
        return zero(T)
    elseif l_s >= 0
        # rectified value is always x as the indicator variable is 1
        return x
    else
        model = owner_model(x)
        x_rect = @variable(model)
        a = @variable(model, binary = true)
        @constraint(model, x_s <= u_s * a)
        # you appear to be missing a ')' here. perhaps a copy-and-paste error?
        # otherwise I wouldn't expect your original code to compile...
        @constraint(model, x_s >= l_s * (1-a)
        # as far as I can tell, you're getting a (MI)QP here, rather than the MILP
        # I am not certain but it is possible that I've made an assumption somewhere
        # that you're working with LPs rather than QPs.
        # If you'd like to keep it as an MILP, you'd have to use the trick here: https://docs.mosek.com/modeling-cookbook/mio.html#indicator-constraints
        # there might also be better approaches that don't use `a` directly, e.g.
        # http://www.inf.u-szeged.hu/~boglarka/AdvOpt/Modeling_tricks_2.pdf
        # I haven't given this careful thought so both approaches might be equivalent
        @constraint(model, x_rect == x * a)
        # Manually set the bounds for x_rect so they can be used by downstream operations.
        if l_n > 0
            set_lower_bound(x_rect, 0)
        else
            set_lower_bound(x_rect, l_n)
        end
        if u_n > 0
            set_upper_bound(x_rect, u_n)
        else
            set_upper_bound(x_rect, 0)
        end
        return x_rect
    end
end
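To illustrate the kind of reformulation the comment above is pointing at, here is one way the bilinear constraint x_rect == x * a could be replaced by linear constraints, using the standard big-M linearization of the product of a binary variable and a bounded continuous variable. This is only a sketch of the general trick, assuming l_n <= x <= u_n holds; I haven't tried it inside your code:

@constraint(model, x_rect <= x - l_n * (1 - a))
@constraint(model, x_rect >= x - u_n * (1 - a))
@constraint(model, x_rect <= u_n * a)
@constraint(model, x_rect >= l_n * a)

When a = 1 the first two constraints collapse to x_rect == x, and when a = 0 the last two force x_rect == 0 while the first two remain slack.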
I expect that the sigmoid vs. relu piece of things is a red herring, but would be curious if you find something definitive.
Let me know what you find, and thanks for the report!
Thank you very much for your fast response. Indeed, there was a copy-paste typo. I have further reformulated the quadratic constraint by removing the constraint:
@constraint(model, x_rect == x * a)
and adding a variable $d$ such that $x = x_{\text{rect}} - d$, together with the following linear constraints:
d = @variable(model)
@constraint(model, x_rect - d == x)
@constraint(model, x_rect >= a * l_n)
@constraint(model, x_rect <= a * u_n)
@constraint(model, d >= (1-a) * l_n)
@constraint(model, d <= (1-a) * u_n)
However, this did not lead to a different result.
Your formulation might have an error. Substituting a = 0:
x_rect >= 0
x_rect <= 0
x_rect == x + d >= x + l_n
x_rect == x + d <= x + u_n
Since l_n and u_n can both be positive, this would be an infeasible set of constraints.
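If it helps, here is a small standalone script that reproduces the infeasibility of that branch with concrete positive bounds. It assumes the HiGHS.jl solver is installed; any MILP solver would do:

using JuMP, HiGHS

model = Model(HiGHS.Optimizer)
l_n, u_n = 1.0, 2.0                  # both bounds positive, as in the argument above
@variable(model, l_n <= x <= u_n)
@variable(model, x_rect)
@variable(model, d)
@variable(model, a, Bin)
@constraint(model, x_rect - d == x)  # coupling constraint
@constraint(model, x_rect >= a * l_n)
@constraint(model, x_rect <= a * u_n)
@constraint(model, d >= (1 - a) * l_n)
@constraint(model, d <= (1 - a) * u_n)
@constraint(model, a == 0)           # force the problematic branch
optimize!(model)
println(termination_status(model))   # expect INFEASIBLE

With a = 0, the bound constraints force x_rect = 0 and d >= l_n = 1, so x = x_rect - d <= -1, contradicting x >= 1.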
Thank you again for your answer. You are right; I have changed this to:
M = max(abs(l_n),abs(u_n))
d = @variable(model)
@constraint(model, x_rect - d == x)
@constraint(model, x_rect >= -a * M)
@constraint(model, x_rect <= a * M)
@constraint(model, d >= -(1-a) * M)
@constraint(model, d <= (1-a) * M)
I have also noticed that, while computing the bounds for the maximum operation, the problem is also infeasible and the following warning is thrown:
Unexpected solve status INFEASIBLE; using interval_arithmetic to obtain bound
I then tried $M = 100000000$, and the bounds were calculated without the warning being thrown, yet the problem was still infeasible.
I'm not quite sure I understand your approach, but here's a set of parameters that works. It might be helpful to discuss your formulation with a colleague who has more experience with integer programming (I'm a novice) so that the resulting formulation is sharper.
x_rect = x + d
When a = 0, we have
l_n <= x <= u_n
l_n+d <= x_rect <= u_n+d
For x_rect = 0 to remain feasible, we therefore need
l_n+d <= 0 --> d <= -l_n --> d <= (1-a)*(-l_n)
u_n+d >= 0 --> d >= -u_n --> d >= (1-a)*(-u_n)
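Putting your coupling constraint together with these parameters, the full linear set would look something like the following. This is just a sketch, assuming l_n <= x <= u_n and that you want x_rect == x when a = 1 and x_rect == 0 when a = 0; I haven't tested it end-to-end:

d = @variable(model)
@constraint(model, x_rect - d == x)
@constraint(model, x_rect >= a * l_n)      # with a = 0, these two force x_rect = 0
@constraint(model, x_rect <= a * u_n)
@constraint(model, d <= (1 - a) * (-l_n))  # with a = 1, these two force d = 0, i.e. x_rect = x
@constraint(model, d >= (1 - a) * (-u_n))

When a = 0 the d bounds leave enough slack (d can take any value in [-u_n, -l_n]) for every x in [l_n, u_n], which is where the earlier formulation ran into trouble.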
Hello Everyone,
I am trying to use the MIPVerify library for neural network verification against adversarial attacks. My goal is to replace the ReLU function by a trainable mask that depends on the input image. For layer $i$, the activation of this layer is therefore calculated not as $\max(0, w_i^\top \cdot x_i)$ but as $\mathbb{I}_{(w_m^\top \cdot x_0) > 0} \cdot (w_i^\top \cdot x_i)$, where $w_m$ is the weight of an independent network $m$ that is trained to predict the behavior of the ReLU function at layer $i$, and $x_0$ is the original input.
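To make the intended semantics concrete, here is a toy Julia illustration for dense layers (not MIPVerify code; W_i, x_i, W_m, and x_0 are placeholder names matching the formula above):

# Toy illustration of the masked activation described above.
# W_i: weights of layer i, x_i: input to layer i,
# W_m: weights of the independent mask network m, x_0: the original network input.
masked_activation(W_i, x_i, W_m, x_0) = (W_m * x_0 .> 0) .* (W_i * x_i)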
I followed the same approach as the code implementing the ReLU function, changing only the base cases and the constraints. Furthermore, I used two different methods to train the masking network, one using ReLU activation and one with Sigmoid:
However, the models trained using ReLU activation are always INFEASIBLE when I try to find adversarial examples using the $L_\infty$ norm, even with large $\varepsilon$ values such as $0.5$ or even $0.9$. The models trained with sigmoid activation do not exhibit the same behavior, and using PGD I am able to find adversarial examples. I would like to know whether my formulation of the constraints is correct with respect to my goal, and what could be the reason for this behavior.