Closed wu-haoze closed 2 years ago
Hi @anwu1219,
The tool actually checks robustness by trying to find a targeted attack for each of the incorrect labels. As shown below, you can find a targeted attack by setting out_idx to the adversarial target label and removing the for loop.
https://github.com/rcpsl/PeregriNN/blob/7974eaa8e0cc6dc7e98287c7c9d47d7cd79b0dce/peregriNN.py#L115-L122
I'd be happy to help if you have other questions, so please let me know.
Regards, Haitham
Hi @haithamkhedr, thanks a lot for the prompt response! If I read this code correctly, it checks whether the output at out_idx can be greater than the output of the original label, but it does not check whether out_idx is the max.
https://github.com/rcpsl/PeregriNN/blob/7974eaa8e0cc6dc7e98287c7c9d47d7cd79b0dce/peregriNN.py#L52-L57
It seems these methods are also specific to untargeted attacks?
https://github.com/rcpsl/PeregriNN/blob/7974eaa8e0cc6dc7e98287c7c9d47d7cd79b0dce/peregriNN.py#L28-L38
It'd be tremendously helpful if you could point out which methods require modification!
Best regards, Andrew
You can change these lines
to
for other_out_idx in [i for i in range(network.output_size) if i != out_idx]:
    A = np.zeros(network.output_size)
    A[out_idx] = 1  # out_idx is the label of the adversarial target
    A[other_out_idx] = -1  # other_out_idx is every other output
    b = [eps]
    solver.add_linear_constraints([A], solver.out_vars_names, b, GRB.GREATER_EQUAL)
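To make the meaning of these constraints concrete, here is a minimal NumPy-only sketch, independent of PeregriNN and Gurobi. The helper names `targeted_constraint_rows` and `target_wins` are hypothetical and are not part of the tool. Each row encodes `y[target] - y[other] >= eps`, so the target is the maximum exactly when every row is satisfied.

```python
import numpy as np


def targeted_constraint_rows(output_size, target_idx):
    """Build one row per non-target output: +1 at the target, -1 at the other output."""
    rows = []
    for other_idx in range(output_size):
        if other_idx == target_idx:
            continue
        row = np.zeros(output_size)
        row[target_idx] = 1.0   # adversarial target label
        row[other_idx] = -1.0   # one competing output
        rows.append(row)
    return np.array(rows)


def target_wins(outputs, target_idx, eps):
    """Return True if the target output beats every other output by at least eps."""
    A = targeted_constraint_rows(len(outputs), target_idx)
    y = np.asarray(outputs, dtype=float)
    return bool(np.all(A @ y >= eps))
```

For example, `target_wins([0.1, 0.2, 0.9, 0.3], target_idx=2, eps=0.5)` holds because output 2 exceeds every other output by at least 0.5.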
Also, you can change the code in check_property
and check_prop_samples
to check that your adversarial target is the max, instead of just checking that any label is greater than the true target. Something like the following:
def check_property(network, x, target):
    global adv_target  # Assuming that your adversarial target is a global variable
    u = network.evaluate(x)
    if np.argmax(u) == adv_target:
        # print("Potential CE succeeded")
        return True
    return False
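A self-contained variant of the same check can be tried outside the tool. This is only a sketch: `ToyNetwork` is a hypothetical stand-in with an `evaluate` method, not PeregriNN's network class, and the adversarial target is passed as an argument here instead of a global.

```python
import numpy as np


class ToyNetwork:
    """Hypothetical linear stand-in for a network exposing evaluate(x)."""

    def __init__(self, weights, bias):
        self.weights = np.asarray(weights, dtype=float)
        self.bias = np.asarray(bias, dtype=float)

    def evaluate(self, x):
        return self.weights @ np.asarray(x, dtype=float) + self.bias


def check_targeted_property(network, x, adv_target):
    """Return True if adv_target is the argmax of the network output at x."""
    u = network.evaluate(x)
    return int(np.argmax(u)) == adv_target


# Outputs for x = [0.2, 0.7] are [0.2, 0.7, -0.9], so label 1 is the max.
toy = ToyNetwork(weights=[[1, 0], [0, 1], [-1, -1]], bias=[0, 0, 0])
hit = check_targeted_property(toy, [0.2, 0.7], adv_target=1)
miss = check_targeted_property(toy, [0.2, 0.7], adv_target=0)
```

Passing the target explicitly avoids relying on module-level state, which can make the check easier to reuse in check_prop_samples.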
Please let me know if this is helpful, or reach out with any other questions.
Haitham
Hi, thanks a lot for the help! I was able to run some experiments thanks to your pointers. The results are largely consistent with some of the other solvers, but I did spot some inconsistency. Here is my implementation. https://github.com/anwu1219/PeregriNN/tree/tar-attack
You could run the instance using the following script: https://github.com/anwu1219/PeregriNN/blob/tar-attack/run_mnist_test.sh
I modified the input arguments to peregriNN.py so that it takes the index of the test image, the perturbation radius, and the target label as input. Additionally, I added a counter-example in https://github.com/anwu1219/PeregriNN/tree/tar-attack/test
peregriNN.py first checks the sanity of the counter-example: https://github.com/anwu1219/PeregriNN/blob/f823fd2901255e4e887874ba85eeb4e393e50196/peregriNN.py#L99-L113
It then goes on to solve the problem. It seems that the solver did not find any counter-examples and prints out "unsat": https://github.com/anwu1219/PeregriNN/blob/f823fd2901255e4e887874ba85eeb4e393e50196/peregriNN.py#L138-L144
I think I followed the pointers and have tried to account for numerical errors. Maybe I'm missing something else?
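A counter-example sanity check with a numerical tolerance could look roughly like the sketch below. It is not the code in the linked branch: the function name, the `tol` default, and the assumption that the perturbation radius is measured in the L-infinity norm are all illustrative.

```python
import numpy as np


def is_valid_counterexample(x_orig, x_adv, radius, outputs, target, tol=1e-6):
    """Check that x_adv stays within the radius of x_orig (L-infinity norm, assumed)
    and that the target output is the maximum, both up to the tolerance tol."""
    x_orig = np.asarray(x_orig, dtype=float)
    x_adv = np.asarray(x_adv, dtype=float)
    outputs = np.asarray(outputs, dtype=float)

    # Every input coordinate must stay within the perturbation radius.
    in_ball = bool(np.all(np.abs(x_adv - x_orig) <= radius + tol))

    # The target output must be at least as large as every other output.
    others = np.delete(outputs, target)
    target_is_max = bool(np.all(outputs[target] >= others - tol))

    return in_ball and target_is_max
```

Here `outputs` is expected to be the network output evaluated at `x_adv`. The tolerance keeps a counter-example that sits exactly on the boundary of the perturbation region from being rejected because of floating-point rounding.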
I don't think you're missing anything. Can you retry with the latest solver.py
on the master branch?
Hi, I'm writing to ask how hard it would be to use the tool for targeted attacks. Concretely, instead of checking whether the correct label is always the maximum, I hope to check whether an adversarial label can be the maximum. Could you provide some pointers regarding which parts of the code I should modify? Thanks a lot!
Andrew