vtddggg / CAA

The implementation of our paper: Composite Adversarial Attacks (AAAI2021)
https://arxiv.org/abs/2012.05434
Apache License 2.0
30 stars 3 forks source link

Questions about complexity counting #1

Closed persistz closed 3 years ago

persistz commented 3 years ago

In the paper, authors mentioned the complexity of CAA is 800, when using [('MT-LinfAttack', =8/255, t=50), ('MT-LinfAttack', =8/255, t=25), ( 'CWLinfAttack', =8/255, t=125)].

I think the calculation formula is: 50 9 + 25 9 + 125 = 800

But in the code, when the attack method idx is not zero, you use:

ori_adv_images, _ = apply_attacker(test_images, attack_name, test_labels, model, attack_eps, None, int(attack_steps), args.max_epsilon, _type=args.norm, gpu_idx=0, target=target_label)
adv_adv_images, p = apply_attacker(subpolicy_out_dict[idx-1], attack_name, test_labels, model, attack_eps, previous_p, int(attack_steps), args.max_epsilon, _type=args.norm, gpu_idx=0, target=target_label)

which means that in subsequent attacks, the same kind of attack is executed twice, so why the correct complexity is not: 50 9 + 25 9 2 + 125 2 = 1150?

vtddggg commented 3 years ago

Yes, you are right. The complexity of CAA is 800 calculated by: 50 9 + 25 9 + 125. This complexity is the sum of backward times when attack the target model. In our implementation, we do attack twice for getting the both policy output and sub-policy output. However, this process can be combined to a single run: Suppose that we have A and B attacker, we can feed x to A and (A(x), x) to B. And the complexity is A+B. Same for when we have A,B,C, we can feed x to A and (A(x), x) to B and (B(A(x)), B(x), x) to C, and the complexity is A+B+C.

You can modify the code to implement the way of attack mentioned above. It will lead more memory consumption. And the sum of backward times is still: 50 9 + 25 9 + 125 = 800.

persistz commented 3 years ago

Thank you for your reply. Do you mean complexity is the sum of backward times, not the total number of images for gradient calculation? For example, if the user feeds (A(x), x) to B, this operation actually performs 2 image gradient calculations, but under the definition of complexity, it is regarded as one time.

persistz commented 3 years ago

If the answer to the previous comment is yes, then the evaluation of other methods will seem a little strange.

Using `100-step PGD with 10 restarts' as an example, the starting point of each restart in PGD paper is the original image. Therefore, the user can perform these 10 restart operations in parallel and pass in (x1, x2,..., x10) to the attack method, so that the complexity is only 100. The comparison complexity of PGD is not the optimal one, it seems a bit unfair.

vtddggg commented 3 years ago

If the answer to the previous comment is yes, then the evaluation of other methods will seem a little strange.

Using `100-step PGD with 10 restarts' as an example, the starting point of each restart in PGD paper is the original image. Therefore, the user can perform these 10 restart operations in parallel and pass in (x1, x2,..., x10) to the attack method, so that the complexity is only 100. The comparison complexity of PGD is not the optimal one, it seems a bit unfair.

Yes, you are right. Considering `100-step PGD with 10 restarts' as complexity of 100 is fair. So we think we should modify it.

persistz commented 3 years ago

Got it. However, CAA's performance advantage over other SOTA methods is still obvious. Closed.