rcpsl / PeregriNN

Feed-forward NN verification framework
MIT License

Running targeted robustness #1

Closed · wu-haoze closed this issue 2 years ago

wu-haoze commented 3 years ago

Hi, I'm writing to ask how hard it would be to use the tool for a targeted attack. Concretely, instead of checking whether the correct label is always the maximum, I hope to check whether an adversarial label can be the maximum. Could you provide some pointers regarding which part of the code I should modify? Thanks a lot!

Andrew

haithamkhedr commented 3 years ago

Hi @anwu1219,

The tool actually checks robustness by trying to find targeted attacks using all the incorrect labels. As shown below, you can find a targeted attack by setting out_idx to the adversarial target label and getting rid of the for loop.

https://github.com/rcpsl/PeregriNN/blob/7974eaa8e0cc6dc7e98287c7c9d47d7cd79b0dce/peregriNN.py#L115-L122

I'd be happy to help if you have other questions, so please let me know if you do.
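In other words, the change amounts to replacing the loop over incorrect labels with a single fixed target. A minimal sketch (the loop structure is assumed from the linked lines, and adv_target is a hypothetical variable holding your target label, not the repo's exact code):

# Sketch of the suggested edit:
#
#   before:  for out_idx in other_outputs:   # try all incorrect labels
#                ...                         # encode and solve for out_idx
#
#   after:   out_idx = adv_target            # single fixed target
#            ...                             # same encoding, run once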

Regards, Haitham

wu-haoze commented 3 years ago

Hi @haithamkhedr, thanks a lot for the prompt response! If I read this code correctly, it checks whether out_idx can be greater than the original label, but does not check whether out_idx is the max:

https://github.com/rcpsl/PeregriNN/blob/7974eaa8e0cc6dc7e98287c7c9d47d7cd79b0dce/peregriNN.py#L52-L57

It also seems these methods are specific to un-targeted attacks:

https://github.com/rcpsl/PeregriNN/blob/7974eaa8e0cc6dc7e98287c7c9d47d7cd79b0dce/peregriNN.py#L28-L38

It'd be tremendously helpful if you could point out which methods require modification!

Best regards,
Andrew

haithamkhedr commented 3 years ago

You can change these lines

https://github.com/rcpsl/PeregriNN/blob/7974eaa8e0cc6dc7e98287c7c9d47d7cd79b0dce/peregriNN.py#L53-L57

to

for other_out_idx in [i for i in range(network.output_size) if i != out_idx]:
    A = np.zeros(network.output_size)
    A[out_idx] = 1         # out_idx is the label of the adversarial target
    A[other_out_idx] = -1  # other_out_idx ranges over every other output
    b = [eps]
    solver.add_linear_constraints([A], solver.out_vars_names, b, GRB.GREATER_EQUAL)
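Taken together, these constraints encode y[out_idx] - y[other_out_idx] >= eps for every other output, which forces the adversarial target to be the maximum output (assuming eps serves as a small positive slack at this point in the code).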

Also, you can change the code in check_property and check_prop_samples to check that your adversarial target is the max, instead of just checking that any label is greater than the true label. Something like the following:

def check_property(network, x, target):
    global adv_target  # assuming that your adversarial target is a global variable
    u = network.evaluate(x)
    if np.argmax(u) == adv_target:
        # print("Potential CE succeeded")
        return True
    return False
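The analogous change in check_prop_samples could look like the sketch below; it assumes network.evaluate accepts a batch of samples and returns one output row per sample, which may not match the actual signature.

def check_prop_samples(network, samples, target):
    global adv_target  # same assumption as above
    outs = network.evaluate(samples)  # assumed: batched evaluation, shape (n_samples, n_outputs)
    # The batch of candidate counter-examples succeeds if any sample
    # classifies as the adversarial target.
    return np.any(np.argmax(outs, axis=1) == adv_target)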

Please let me know if this is helpful, or reach out with any other questions.

Haitham

wu-haoze commented 3 years ago

Hi, thanks a lot for the help! I was able to run some experiments following your pointers. The results are largely consistent with some of the other solvers, but I did spot an inconsistency. Here is my implementation: https://github.com/anwu1219/PeregriNN/tree/tar-attack

You could run the instance using the following script: https://github.com/anwu1219/PeregriNN/blob/tar-attack/run_mnist_test.sh

I modified the input arguments to peregriNN.py so that it takes the index of the test image, the perturbation radius, and the target label as input. Additionally, I added a counter-example in https://github.com/anwu1219/PeregriNN/tree/tar-attack/test

peregriNN.py first checks the sanity of the counter-example: https://github.com/anwu1219/PeregriNN/blob/f823fd2901255e4e887874ba85eeb4e393e50196/peregriNN.py#L99-L113
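(A sanity check of that form might look like the following sketch; the function name and the use of network.evaluate with an L-infinity perturbation ball are assumptions, not the exact code at the linked lines.)

import numpy as np

def sanity_check_counterexample(network, x0, x_adv, eps, adv_target):
    # The counter-example must stay inside the perturbation ball
    # (assumed: L-infinity norm of radius eps).
    if np.max(np.abs(x_adv - x0)) > eps:
        return False
    # The adversarial target must be the network's top prediction.
    return np.argmax(network.evaluate(x_adv)) == adv_target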

After the sanity check, peregriNN.py goes on to solve the problem. It seems that the solver does not find any counter-example and prints out "unsat": https://github.com/anwu1219/PeregriNN/blob/f823fd2901255e4e887874ba85eeb4e393e50196/peregriNN.py#L138-L144

I think I followed the pointers and have tried to account for numerical errors. Maybe I'm missing something else?

haithamkhedr commented 3 years ago

I don't think you're missing anything. Can you retry with the latest solver.py on the master branch?