reds-lab / Narcissus

The official implementation of the CCS'23 paper, Narcissus clean-label backdoor attack -- only takes THREE images to poison a face recognition dataset in a clean-label way and achieves a 99.89% attack success rate.
https://arxiv.org/pdf/2204.05255.pdf
MIT License
104 stars 12 forks source link
adversarial-attacks adversarial-machine-learning ai-security backdoor-attacks deep- poisoning-attacks

Narcissus-Caravaggio

Narcissus Clean-label Backdoor Attack

PWC PWC PWC Python 3.6 Pytorch 1.10.1 CUDA 11.0

This is the official implementation of the ACM CCS'23 paper: `Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information.'

Narcissus clean-label backdoor attack provides an affirmative answer to whether backdoor attacks can present real threats: as they normally require label manipulations or strong accessibility to non-target class samples. This work demonstrates a simple yet powerful attack with access to only the target class with minimum assumptions on the attacker's knowledge and capability.

In our ACM CCS'23 paper, we show inserting maliciously-crafted Narcissus poisoned examples totaling less than 0.5% of the target-class data size or 0.05% of the training set size, we can manipulate a model trained on the poisoned dataset to classify test examples from arbitrary classes into the target class when the examples are patched with a backdoor trigger; at the same time, the trained model still maintains good accuracy on typical test examples without the trigger as if it were trained on a clean dataset.

Narcissus backdoor attack is highly effective across datasets and models, even when the trigger is injected into the physical world (see the gif demo or the full video demon). Most surprisingly, our attack can evade the latest state-of-the-art defenses in their vanilla form, or after a simple twist, we can adapt to the downstream defenses. We study the cause of the intriguing effectiveness and find that because the trigger synthesized by our attack contains features as persistent as the original semantic features of the target class, any attempt to remove such triggers would inevitably hurt the model accuracy first.

Features

Requirements

Usage & HOW-TO

Use the Narcissus.ipynb notebook for a quick start of our NARCISSUS backdoor attack. The default attack and defense state both use Resnet-18 as the model, CIFAR-10 as the target dataset, and the default attack poisoning rate is 0.5% In-class/0.05% overall.

There are a several of optional arguments in the Narcissus.ipynb:

Overall Workflow:

Narcissus

The workflow of the Narcissus attack consists of four functional parts (PubFig as an example):

Can you make it easier?

By importing the narcissus_func.py file, users can quickly deploy the Narcissus backdoor attack into their own attack environment with narcissus_gen() fucntion. There are 2 parameters in this function:

#How to launch the attack with the Push of ONE Button?
narcissus_trigger = narcissus_gen(dataset_path = './dataset', lab = 2)

This function will return a [1,3,32,32] NumPy array, which contains the Narcissus backdoor trigger generated based on only the target class (e.g., '2'). DO NOT forget to use the trigger to poison some target class samples and launch the attack;)

Special thanks to...

Stargazers repo roster for @ruoxi-jia-group/Narcissus

Forkers repo roster for @ruoxi-jia-group/Narcissus