kabkabm / defensegan

Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models (published in ICLR2018)
Apache License 2.0

Whitebox attack not working? #1

Closed by kwonchungli 6 years ago

kwonchungli commented 6 years ago

Hi, I was running your code (so neat and nice), but the FGSM attack does not seem to work in the whitebox setting.

I ran adv_x separately to see what it looks like, and it looked very clean, as if unperturbed. I also checked the gradient of model.get_preds(images_pl) with respect to images_pl, and it was all zeros.

Am I doing something wrong?

po0ya commented 6 years ago

Can you say more about the parameters you ran the code with?

kwonchungli commented 6 years ago

I followed your instructions exactly. The only parameter I added was --num_test 50, to see results quickly.

po0ya commented 6 years ago

Which part of the instructions? Whitebox attack with no defense, or with Defense-GAN? If --defense_type defense_gan is set, the gradients will be zero because of vanishing gradients: there are many sigmoids in the update iterations. Try running it without any defense and see whether it still gives zeros.
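To make the vanishing-gradient point concrete, here is a minimal NumPy sketch. It is a toy, not the Defense-GAN reconstruction code: the function `chained_sigmoid_grad` and the chosen depths are made up for illustration. It only shows how the chain-rule product of many sigmoid derivatives, each at most 0.25, shrinks until it rounds to zero in float32.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def chained_sigmoid_grad(x, depth):
    """Derivative of sigmoid applied `depth` times to x (hypothetical toy helper)."""
    grad = 1.0
    value = x
    for _ in range(depth):
        value = sigmoid(value)
        # sigmoid'(z) = s(z) * (1 - s(z)), which never exceeds 0.25
        grad *= value * (1.0 - value)
    return grad


if __name__ == "__main__":
    for depth in (1, 5, 20, 200):
        g = chained_sigmoid_grad(0.5, depth)
        # Show the float64 value and what survives a cast to float32
        print(depth, g, np.float32(g))
```

With a gradient that has rounded to zero, the sign of the gradient is zero as well, so an FGSM step would leave the input unchanged. That would be consistent with adv_x looking clean.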

kwonchungli commented 6 years ago

I tried the whitebox attack with defense_gan, since I wanted to see both the adversarial images and their reconstructions. I guess that once a reconstruction layer is added to the model, the gradient vanishes. But the attack has to work for me to test the Defense-GAN method. Is that right?

po0ya commented 6 years ago

Oh I see. Yes, the attack won't work because the gradient has to pass through the reconstruction step too. So, probably before this line https://github.com/kabkabm/defensegan/blob/ed9eb6ec2f0e7fec58663fde8ef4f204a260e910/whitebox.py#L185 you need to do something like:

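# Build the attack on the classifier itself (assumes `model` has no reconstruction layer).
attack_obj = FastGradientMethod(model, sess=sess)
# Generate the adversarial examples symbolically from the input placeholder.
adv_x_tr = attack_obj.generate(images_pl, **attack_params)
# Only then pass the adversarial images through the Defense-GAN reconstruction.
reconstruction = gan.reconstruct(adv_x_tr)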

and then see the reconstruction.
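The same order of operations (attack the undefended classifier first, reconstruct afterwards) can be sketched end to end in plain NumPy. Everything here is an assumption made for illustration: the logistic-regression classifier, the weights, the two "clean" prototypes, and the nearest-prototype `reconstruct`, which is only a toy stand-in for gan.reconstruct and not the actual Defense-GAN projection.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def predict(x, w, b):
    """Predicted class (0 or 1) of a toy logistic-regression classifier."""
    return int(x @ w + b > 0)


def fgsm(x, y, w, b, eps):
    """One FGSM step; the cross-entropy gradient w.r.t. x is (p - y) * w."""
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)


def reconstruct(x, prototypes):
    """Toy stand-in for gan.reconstruct: snap x to the nearest prototype (L2)."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    return prototypes[np.argmin(dists)]


# Hypothetical classifier and "clean data" prototypes, chosen for the demo
w = np.array([1.0, -1.0])
b = -0.3
prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])

x = np.array([0.7, 0.3])  # clean input with true label 1
y = 1.0

x_adv = fgsm(x, y, w, b, eps=0.1)       # attack computed without any defense
x_rec = reconstruct(x_adv, prototypes)  # reconstruction applied afterwards

if __name__ == "__main__":
    print("clean:", predict(x, w, b))
    print("adversarial:", predict(x_adv, w, b))
    print("reconstructed:", predict(x_rec, w, b))
```

Because the gradient is taken through the classifier only, the perturbation is non-zero, and the adversarial and reconstructed inputs can be compared side by side. The numbers are picked so that the toy reconstruction undoes the attack; that says nothing about how Defense-GAN itself performs.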

po0ya commented 6 years ago

Please open a new issue if there are any follow-ups.