sisl / NeuralVerification.jl

Methods to soundly verify deep neural networks
MIT License
223 stars 50 forks source link

Inconsistent Results between Reluplex and NSVerify #107

Closed castrong closed 4 years ago

castrong commented 4 years ago

I'm seeing an inconsistent results between Reluplex and NSVerify, both complete solvers, on the same problem. I believe that NSVerify is correct since the counter-example it returns back is a valid counterexample (shown in the code below). Code is below, with the .nnet file given at the bottom. To replicate this issue copy the network into a file and set the filename in the first line to point to that file.

network = NeuralVerification.read_nnet("./test_rand/networks/rand_[1,3,1]_2.nnet")
input_set = NeuralVerification.Hyperrectangle([0.3791619134008495], [0.10069447579678581])
output_set = NeuralVerification.PolytopeComplement(NeuralVerification.HPolytope([NeuralVerification.HalfSpace([1.0], 0.535032787547191), NeuralVerification.HalfSpace([-1.0], 0.8711752040844563)]))
problem = NeuralVerification.Problem(network, input_set, output_set)

# Solve on Reluplex and NSVerify
println(NeuralVerification.solve(NeuralVerification.Reluplex(), problem))
println(NeuralVerification.solve(NeuralVerification.NSVerify(), problem))

# Test the counterexample
println("Counter example is outside set: ", NeuralVerification.compute_output(network, [0.2784674376040637]) ∈ output_set.P) # It's in 

CounterExampleResult(:holds, Float64[]) CounterExampleResult(:violated, [0.2784674376040637]) Counter example is outside set: true

.nnet file pasted in:

//// Neural Network File Format by Kyle Julian, Stanford 2016 2, 1, 1, 3, 1, 3, 1, This line extraneous -6.55e4, 6.55e4, 0.0,0.0, 1.0,1.0, -0.2092992410225305, 0.1527785697079942, 0.6952328858495411, -0.12896309655879667, 0.13005202010414774, -0.34826001136290063, 0.004074797378236905, 0.8952987002088535, -0.07894098008654638, -0.8416186860567856,

AvrahamRaviv commented 4 years ago

Hi I'd tackled the same problem. Since NSVerify returns a counterexample, it should be a bug in Reluplex's implementation. BTW I've cases that Reluplex returns a counterexample but NSVerify returns hold. If you find an answer for this Inconsistent - let me know, please.

castrong commented 4 years ago

Will do - do you still have access to those examples where Reluplex returns a counterexample but NSVerify returns hold? If so would you be able to share such an example?

Thanks!

AvrahamRaviv commented 4 years ago

Sorry, my bad, all of my Inconsistent examples are while NSVerify returns a counterexample and Reluplex returns hold.

AvrahamRaviv commented 4 years ago

Does anyone have any idea? Thanks

tomerarnon commented 4 years ago

@AvrahamRaviv could you check if your problem persists on master? I believe there may have a been a bug fix in Reluplex that hasn't made its way into a tagged release yet.

@castrong you may or may not have tested this on master already? As the one who patched that original bug, have you seen anything suspicious that could lead to this?

AvrahamRaviv commented 4 years ago

Hi @tomerarnon thanks for your answer. Due to 98#, I couldn't rebase to master. I'll try to rebase asap and let you know.

castrong commented 4 years ago

I've just replicated on master (vs. the branch where I had originally found it) and am looking into it more deeply now. I'll let you know if I find something suspicious.

AvrahamRaviv commented 4 years ago

@AvrahamRaviv could you check if your problem persists on master? I believe there may have a been a bug fix in Reluplex that hasn't made its way into a tagged release yet.

@castrong you may or may not have tested this on master already? As the one who patched that original bug, have you seen anything suspicious that could lead to this?

@tomerarnon I've rebased to master and still got inconsistent results. image

castrong commented 4 years ago

@AvrahamRaviv can you try replacing the encode function in Reluplex with the following? This works on my end (the bounds are currently incorrectly being applied to the pre-activation instead of the post-activation values).

function encode(solver::Reluplex, model::Model,  problem::Problem)
    layers = problem.network.layers
    ẑ = init_neurons(model, layers) # before activation
    z = init_neurons(model, layers) # after activation

    # Each layer has an input set constraint associated with it based on the bounds.
    # Additionally, consective variables zᵢ, ẑᵢ₊₁ are related by a constraint given
    # by the affine map encoded in the layer Lᵢ.
    # Finally, the before-activation-variables and after-activation-variables are
    # related by the activation function. Since the input layer has no activation,
    # its variables are related implicitly by identity.
    activation_constraint!(model, ẑ[1], z[1], Id())
    bounds = get_bounds(problem)
    for (i, L) in enumerate(layers)
        @constraint(model, affine_map(L, z[i]) .== ẑ[i+1])
        add_set_constraint!(model, bounds[i], z[i])
        activation_constraint!(model, ẑ[i+1], z[i+1], L.activation)
    end
    # Add the bounds on your output layer
    add_set_constraint!(model, last(bounds), last(z))
    # Add the complementary set defind as part of the problem
    add_complementary_set_constraint!(model, problem.output, last(z))
    feasibility_problem!(model)
    return ẑ, z
end
AvrahamRaviv commented 4 years ago

@castrong Well, I have 9 nets. Before your change: NSVerify - 7 counterexamples and 2 holds, Reluplex - 4 counterexamples and 5 holds. Now, after your change, Reluplex returns a counterexample for all 9 nets, then I've 2 cases that Reluplex return violated and NSVerify return hold. Still inconsistency, but from the other side. Avraham

AvrahamRaviv commented 4 years ago

Hi. One can find my "problematic" nets here. The code is the same to ImageClassification.ipynb As I said, after @castrong patch, Reluplex return a counterexample for both bets and NSVerify return holds.

tomerarnon commented 4 years ago

Thanks @AvrahamRaviv, we'll look into it with those networks!

castrong commented 4 years ago

I replicated locally and then tried with a larger big-M value for NSVerify and that seemed to fix it. @AvrahamRaviv can you see if that fixes it for you? I tried NSVerify(m=10000) instead of NSVerify(m=100). The MIP encoding used in NSVerify will become incorrect if the m value isn't larger than any possible activation in the network.

I've added an issue #118 to discuss whether we should automatically set the big M value in some other way or if we should issue a warning if we notice that the chosen big-M value is too small for the given problem.

AvrahamRaviv commented 4 years ago

@castrong your comment actually fixed my problem! Just let you know that I not used m=100 but set the default m (m=1000) and got holds. With 10000 NSVerify return a counterexample, as you said. Thanks!

castrong commented 4 years ago

Okay great to hear!