softsys4ai / athena

Athena: A Framework for Defending Machine Learning Systems Against Adversarial Attacks
https://softsys4ai.github.io/athena/
MIT License
42 stars 9 forks source link
adversarial-machine-learning artificial-intelligence deep-neural-networks defense-methods machine-learning security trusted-ai

Adversarial Defense as a Framework

Machine learning systems have achieved impressive success in a wide range of domains like computer vision and natural langurage processing. However, their vulnerability to adversarial examples can lead to a series of consequences, especially in security-critical tasks. For example, an object detector on a self-driving vehicle may incorrectly recognize an stop sign as a speed limit.

The threat of the adversarial examples has inspired a sizable body of research on various defense techniques. With the assumption on the specific known attack(s), most of the existing defenses, although effective against particular attacks, can be circumvented under slightly different conditions, either a stronger adaptive adversary or in some cases even weak (but different) adversaries. In order to stop the arms race between the attacks and defenses, we wonder

How can we, instead, design a defense, not as a technique, but as a framework that one can construct a specific defense considering the niche tradeoff space of robustness one may want to achieve as well as the cost one is willing to pay to achieve that level of robustness?

ATHENA: A Framework based on Diverse Weak Defenses for Building Adversarial Defense \ Ying Meng, Jianhai Su, Jason M O'Kane, Pooyan Jamshidi


arXiv Preprint

Website

CSCE 585
Project

Hello World
Tutorial

ATHENA: a Framework for Building Adversarial Defense

ATHENA (Goddess of defense in Greek mythology) is an extensible framework for building generic (and thus, broadly applicable) yet effective defense against adversarial attacks.

The design philosophy behind ATHENA is based on ensemble of many diverse weak defenses (WDs), where each WD, the building blocks of the framework, is a machine learning classifier (e.g., DNN, SVM) that first applies a transformation on the original input and then produces an output for the transformed input. Given an input, an ensemble first collects predicted outputs from all of the WDs and then determines the final output, using some ensemble strategy such as majority voting or averaging the predicted outputs from the WDs.


Insights: Weak Defenses Complements Each Other!

In computer vision, a transformation is an image processing function. By distorbing its input, a transformation changes the adversarial optimized perturbations and thus making the perturbations less effective. However, the effectiveness of a single type of transformation varies on attacks and datasets. By mitigating the perturbations in different ways such as adjusting angles or position of the input, adding or removing noises, a collection of diverse transformations provides robustness against various attacks. Thus, the Diverse ensemble achieves the lowest error rate in most cases, especially for tasks on CIFAR-100.

Ensembling diverse transformations can result in a robust defense against a variety of attacks and provide a tradeoff space, where one can build a more robust ensemble by adding more transformations or building an ensemble with lower overhead and cost by utilizing fewer transformations.


Zero Knowledge Threat Model

Adversary knows everything about the model, but it does not know there is a defense in place!

The effectiveness of individual WDs (each associated to a transformation) varies across attack methods and magnitudes of an attack. While a large population of transformations from a variety of categories successfully disentangle adversarial perturbations generated by various attacks. The variation of individual WDs' error rates spans wider as the perturbation magnitude become stronger for a selected attack. By utilizing many diverse transformations, with ATHENA, we build effective ensembles that outperform the two state-of-the-art defenses --- PGD adversarial training (PGD-ADT) and randomly smoothing (RS), in all cases.

Black-box Threat Model

Adversary does not have access to the model but it can query the model.

Transfer-based approach

Although the transferability rate increases as the budget increases, the drop in the transferability rate from the undefended model (UM) to ATHENA indicates that ATHENA is less sensitive to the perturbation. Ensembling from many diverse transformations provides tangible benefits in blocking the adversarial transferability between weak defenses, and thus enhances model's robustness against the transfer-based black-box attack.

Gradient-direction-estimation-based approach

Hop-Skip-Jump attack (HSJA) generates adversarial examples by querying the output labels from the target model for the perturbed images. Compared to that generated based on the UM, the adversarial examples generated based on ATHENA are much further away from the corresponding benign samples. As the query budget increases, the distances of the UM-targeted AEs drop much more significantly than that of the ATHENA-targeted AEs. Therefore, ATHENA increases the chance of such AEs being detected by even a simple detection mechanism.

White-box Threat Model

Adversary knows everything about the model and defense in place!

Greedy approach

As expected, stronger AEs are generated by the greedy white-box attack with a looser constraint on the dissimilarity threshold. However, such success comes at a price: with the largest threshold, the greedy attack has to spend 310X more time to generate adversarial example for a single input. This provides a tradeoff space, where realizations of ATHENA that employ larger ensembles incur more cost to the adversaries and they will eventually give up! Moreover, the generated AEs are heavily distored and very likely to be detected either by a human or an automated detection mechanism.

Optimization-based approach

As the adversary have access to more WDs, it can launch more successful attacks without even increasing the perturbations. However, the computational cost of AE generation increases as well. The attacker has the choice to sample more random transformations and a choice to a distribution of a large population and diverse transformations in order to generate stronger AEs. However, this will incur a larger computational cost as well.


Acknowledgement


How to Cite

Citation

Ying Meng, Jianhai Su, Jason M O'Kane, and Pooyan Jamshidi. ATHENA: A Framework based on Diverse Weak Defenses for Building Adversarial Defense. arXiv preprint arXiv: 2001.00308, 2020.

Bibtex

@article{meng2020athena,
      title={ATHENA: A Framework based on Diverse Weak Defenses for Building Adversarial Defense},
      author={Ying Meng and Jianhai Su and Jason M O'Kane and Pooyan Jamshidi},
      journal={arXiv preprint arXiv:2001.00308},
      year={2020}
}