ryderling / DEEPSEC

DEEPSEC: A Uniform Platform for Security Analysis of Deep Learning Model

Detection defenses set per-attack thresholds #8

Closed: carlini closed this issue 5 years ago

carlini commented 5 years ago

In Table VI the paper analyzes three different defense techniques. In this table, the paper reports the true positive rate and false positive rate of the defenses against various attacks. In doing so, the paper varies the detection threshold to make comparisons fair, saying “we try our best to adjust the FPR values of all detection methods to the same level via fine-tuning the parameters.”

However, the paper varies the defense settings on a per-attack basis. This is not a valid thing to do.

When performing a security analysis between an attacker and a defender, it is always important to recognize that one of the players goes first and commits to an approach, and the second player then tries to defeat it. In the setting of adversarial example defenses, it is the defender who commits first and the attacker who then tries to find an instance that evades the defense.

As such, it is meaningless to allow the defender to alter the detection hyperparameters depending on which attack will be encountered. If the defender knew which attack was going to be presented, they could do much better than just selecting a different hyperparameter setting for the detection threshold.

Worse yet, because the exact threshold used varies, the false positive rates actually presented in the table range from 1.5% to 9.0%. Comparing the true positive rates of two defenses when the corresponding false positive rates vary by a factor of six is meaningless, and computing the mean TPR across a range of attacks when the FPR varies by a factor of six yields a completely uninterpretable value.

It would be both simpler and more accurate to use a validation set to choose the FPR once for all attacks, and then report the TPR on each attack using the same threshold.
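A minimal sketch of this procedure, assuming a generic score-based detector (the names `detector_score`, `validation_naturals`, and `per_attack_adv_scores` below are hypothetical placeholders, not part of DEEPSEC): choose the threshold once from natural validation scores at the target FPR, then reuse that same threshold when reporting the TPR for every attack.

```python
import numpy as np

def pick_threshold_for_fpr(natural_scores, target_fpr):
    # An input is flagged as adversarial when its detection score exceeds the
    # threshold, so the (1 - target_fpr) quantile of the natural validation
    # scores gives approximately the desired false positive rate.
    return np.quantile(natural_scores, 1.0 - target_fpr)

def true_positive_rate(adv_scores, threshold):
    # Fraction of adversarial examples flagged using the fixed threshold.
    return float(np.mean(adv_scores > threshold))

# Hypothetical usage: detector_score(...) and the per-attack score arrays stand
# in for whatever detector (MagNet, FS, ...) and attacks are being evaluated.
# threshold = pick_threshold_for_fpr(detector_score(validation_naturals), 0.04)
# for attack_name, scores in per_attack_adv_scores.items():
#     print(attack_name, true_positive_rate(scores, threshold))
```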

ryderling commented 5 years ago

To clarify first: we did not vary the defense settings on a per-attack basis. For each of the three detection-only defenses, we used the same settings for all attacks.

To be specific, for MagNet and FS we tuned the hyper-parameters of both defenses so that their FPR was 4% on validation sets (legitimate samples only) of MNIST and CIFAR-10.

The false positive rates presented in the table vary between 1.5% and 9.0% because, when running the evaluation experiments, we randomly selected the same number of natural examples from the testing set as adversarial examples to build a balanced mixed set for each attack. The fluctuations in the reported FPRs come from differences between the natural examples in the validation and testing sets, the varying number of natural examples used for evaluation from one attack to another, and the randomness of the sampling. These fluctuations in FPR do not affect the TPR results, because TPR measures the detector's performance only on adversarial examples while FPR measures it only on natural examples. Despite the fluctuations in the reported FPRs, we used the same hyper-parameter setting for the evaluation of every attack.
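To illustrate why the reported FPR can wander around the validation target while the TPR is unaffected, here is a small, purely synthetic sketch (the score distributions, threshold value, and attack labels are made up for illustration and are not taken from DEEPSEC): the threshold stays fixed, but the FPR measured on each per-attack random natural sample fluctuates, whereas the TPR depends only on the adversarial examples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical detection scores: natural examples score low, adversarial high,
# and the threshold was fixed beforehand to give roughly the target FPR.
threshold = 1.65  # illustrative value only
natural_test_scores = rng.normal(0.0, 1.0, size=10000)

for attack in ["FGSM", "PGD", "CW2"]:
    n_adv = rng.integers(500, 2000)          # attacks yield different numbers of examples
    adv_scores = rng.normal(3.0, 1.0, size=n_adv)
    # Balanced mixed set: sample as many natural examples as adversarial ones.
    nat_sample = rng.choice(natural_test_scores, size=n_adv, replace=False)
    fpr = np.mean(nat_sample > threshold)    # fluctuates with the random natural sample
    tpr = np.mean(adv_scores > threshold)    # depends only on the adversarial examples
    print(f"{attack}: FPR={fpr:.3f}, TPR={tpr:.3f}")
```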

carlini commented 5 years ago

Okay, thank you for clearing this up. It may be worth explaining this in the text so no one else gets confused, but I'm closing this issue.