huanranchen / DiffusionClassifier

Official code implementation of Robust Classification via a Single Diffusion Model

Vanilla Diffusion Classifier is not robust to linf adversarial examples as reported in the paper #3

Closed Crazygay12138 closed 1 month ago

Crazygay12138 commented 5 months ago

Hi! I tested the Vanilla Diffusion Classifier (no likelihood maximization) with APGD-DLR as implemented in AutoAttack. The clean accuracy is 94.53%, but the Linf robust accuracy drops to around 5% under $\epsilon = 8/255$, which is inconsistent with the result reported in the paper. I set $T = 1000$ and share_noise = False.
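For reference, this is roughly how such an evaluation can be run with the AutoAttack package. It is only a minimal sketch, not the repo's tester.test_apgd_dlr_acc helper; classifier is assumed to be any nn.Module that maps images in [0, 1] to per-class logits.

import torch
from autoattack import AutoAttack

def eval_apgd_dlr_linf(classifier, images, labels, eps=8 / 255, batch_size=1):
    # Run only the APGD-DLR component of AutoAttack under the Linf threat model.
    adversary = AutoAttack(classifier, norm="Linf", eps=eps,
                           version="custom", attacks_to_run=["apgd-dlr"])
    x_adv = adversary.run_standard_evaluation(images, labels, bs=batch_size)
    with torch.no_grad():
        robust_acc = (classifier(x_adv).argmax(dim=1) == labels).float().mean().item()
    return x_adv, robust_acc

AutoAttack also offers version="rand", which adds EOT and is intended for randomized defenses.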

huanranchen commented 5 months ago

Hi! Are you running this code? https://github.com/huanranchen/DiffusionClassifier/blob/master/DCTK.py

The following are the logs from attacking diffusion classifiers on the first 100 examples of the CIFAR-10 test set:

initial accuracy: 96.00%
apgd-dlr - 1/48 - 0 out of 1 successfully perturbed apgd-dlr - 2/48 - 1 out of 1 successfully perturbed apgd-dlr - 3/48 - 1 out of 1 successfully perturbed apgd-dlr - 4/48 - 1 out of 1 successfully perturbed apgd-dlr - 5/48 - 1 out of 1 successfully perturbed apgd-dlr - 6/48 - 1 out of 1 successfully perturbed apgd-dlr - 7/48 - 1 out of 1 successfully perturbed apgd-dlr - 8/48 - 1 out of 1 successfully perturbed apgd-dlr - 9/48 - 0 out of 1 successfully perturbed apgd-dlr - 10/48 - 1 out of 1 successfully perturbed apgd-dlr - 11/48 - 1 out of 1 successfully perturbed apgd-dlr - 12/48 - 0 out of 1 successfully perturbed apgd-dlr - 13/48 - 0 out of 1 successfully perturbed apgd-dlr - 14/48 - 0 out of 1 successfully perturbed apgd-dlr - 15/48 - 0 out of 1 successfully perturbed apgd-dlr - 16/48 - 1 out of 1 successfully perturbed apgd-dlr - 17/48 - 0 out of 1 successfully perturbed apgd-dlr - 18/48 - 1 out of 1 successfully perturbed apgd-dlr - 19/48 - 0 out of 1 successfully perturbed apgd-dlr - 20/48 - 1 out of 1 successfully perturbed apgd-dlr - 21/48 - 1 out of 1 successfully perturbed apgd-dlr - 22/48 - 1 out of 1 successfully perturbed apgd-dlr - 23/48 - 1 out of 1 successfully perturbed apgd-dlr - 24/48 - 0 out of 1 successfully perturbed apgd-dlr - 25/48 - 1 out of 1 successfully perturbed apgd-dlr - 26/48 - 1 out of 1 successfully perturbed apgd-dlr - 27/48 - 1 out of 1 successfully perturbed apgd-dlr - 28/48 - 1 out of 1 successfully perturbed apgd-dlr - 29/48 - 0 out of 1 successfully perturbed apgd-dlr - 30/48 - 0 out of 1 successfully perturbed apgd-dlr - 31/48 - 0 out of 1 successfully perturbed apgd-dlr - 32/48 - 1 out of 1 successfully perturbed apgd-dlr - 33/48 - 1 out of 1 successfully perturbed apgd-dlr - 34/48 - 1 out of 1 successfully perturbed apgd-dlr - 35/48 - 1 out of 1 successfully perturbed apgd-dlr - 36/48 - 1 out of 1 successfully perturbed apgd-dlr - 37/48 - 0 out of 1 successfully perturbed apgd-dlr - 38/48 - 0 out of 1 successfully perturbed apgd-dlr - 39/48 - 0 out of 1 successfully perturbed apgd-dlr - 40/48 - 0 out of 1 successfully perturbed apgd-dlr - 41/48 - 0 out of 1 successfully perturbed apgd-dlr - 42/48 - 1 out of 1 successfully perturbed apgd-dlr - 43/48 - 1 out of 1 successfully perturbed apgd-dlr - 44/48 - 0 out of 1 successfully perturbed apgd-dlr - 45/48 - 1 out of 1 successfully perturbed apgd-dlr - 46/48 - 1 out of 1 successfully perturbed apgd-dlr - 47/48 - 1 out of 1 successfully perturbed apgd-dlr - 48/48 - 1 out of 1 successfully perturbed
robust accuracy after APGD-DLR: 36.00% (total time 337293.3 s)
max Linf perturbation: 0.03137, nan in tensor: 0, max: 1.00000, min: 0.00000
robust accuracy: 36.00%

By the way, diffusion classifiers possess certified robustness, i.e., they are guaranteed to achieve at least a certain level of robustness against any attack. Feel free to read my other paper, "Your Diffusion Model is Secretly a Certifiably Robust Classifier"!

Crazygay12138 commented 5 months ago

I ran https://github.com/huanranchen/DiffusionClassifier/blob/master/experiments/DiffusionAsClassifierTK.py. I just changed test_apgd_dlr_acc(dc, loader=test_loader, norm="L2", eps=0.5) to test_apgd_dlr_acc(dc, loader=test_loader, norm="Linf", eps=8/255), and I tested 512 samples.

huanranchen commented 5 months ago

I just changed test_apgd_dlr_acc(dc, loader=test_loader, norm="L2", eps=0.5) to test_apgd_dlr_acc(dc, loader=test_loader, norm="Linf", eps=8/255), and I tested 512 samples.

@Crazygay12138

I sincerely apologize for the confusion caused by the many revisions this paper has undergone. The inconsistency you've noted mainly stems from the following:

In May last year, we tested the robustness of the Diffusion Classifier using NCSNpp diffusion checkpoints. With those checkpoints, the classifier exhibited 35.94% robustness on 512 CIFAR-10 samples against the Linf 8/255 threat model. You can verify this using the Robust Diffusion Classifier available here: DiffusionClassifier.py. The logs from this experiment are attached at the end of this reply.

After completing and preprinting the paper, we switched from the NCSNpp diffusion checkpoints to the simpler EDM model. We revised the diffusion classifier according to EDM's parameterization and updated the experiments for the RDC part only. Although I did not redo the experiments for the Diffusion Classifier, I irresponsibly changed the description of the diffusion checkpoints in the paper directly from "NCSNpp" to "EDM", without clarifying that the experiments in the Diffusion Classifier section were still conducted with NCSNpp. This change likely accounts for the discrepancy between your experimental results and those reported in our paper.

I deeply regret any confusion or inconvenience my oversight has caused and offer my sincerest apologies.

diffusionasclassifier50-100.txt diffusionclassifier-100-256.txt diffusionclassifier-256-356.txt diffusionclassifier-356-512.txt

diffusionasclassifier0-50.txt

huanranchen commented 5 months ago

(Let me shamelessly add one more clarification to make up for my previous mistakes.) All the experiments for RDC and Likelihood Maximization (N) were conducted using the EDM checkpoint, so this part is not an issue.

All the experiments in the paper "Your Diffusion Model is Secretly a Certifiably Robust Classifier" were also performed using the EDM checkpoint. That paper achieves state-of-the-art certified robustness, demonstrating that the robustness of the diffusion classifier is genuine and not overestimated due to insufficiently strong attacks.

Crazygay12138 commented 5 months ago

Thanks for your detailed reply. However, it is strange that different DMs make such a notable difference, which makes me unsure whether the DC itself (without LM and randomized smoothing) is truly robust. I also suspect that it is LM that brings the robustness, since LM + WideResNet70 also shows SOTA robustness. As for certified robustness (I'm quite new to this field), given the existence of randomized smoothing, I am also confused about how important the DC is. Can we say that the DC has better certified robustness because it can be transformed into a powerful noise-sample classifier?
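For reference, the "randomized smoothing" mentioned above usually refers to the certificate of Cohen et al. (2019): if the base classifier predicts the top class with probability at least p_A under Gaussian input noise N(0, sigma^2 I), the smoothed classifier is provably robust within an L2 radius of sigma * Phi^{-1}(p_A). Below is a minimal sketch of that generic bound only; it is not the bound derived in "Your Diffusion Model is Secretly a Certifiably Robust Classifier".

from statistics import NormalDist

def certified_l2_radius(p_a_lower: float, sigma: float) -> float:
    # p_a_lower: (high-confidence lower bound on the) probability that the base
    # classifier returns the top class under N(0, sigma^2 I) input noise.
    if p_a_lower <= 0.5:
        return 0.0  # abstain: cannot certify without a clear majority class
    return sigma * NormalDist().inv_cdf(p_a_lower)

# Example: p_A >= 0.99 with sigma = 0.5 certifies an L2 radius of about 1.16.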

huanranchen commented 5 months ago

Thanks for your detailed reply. However, it is strange that different DMs make such a notable difference, which makes me unsure whether the DC itself (without LM and randomized smoothing) is truly robust. I also suspect that it is LM that brings the robustness, since LM + WideResNet70 also shows SOTA robustness. As for certified robustness (I'm quite new to this field), given the existence of randomized smoothing, I am also confused about how important the DC is. Can we say that the DC has better certified robustness because it can be transformed into a powerful noise-sample classifier?

I'm also quite curious about whether different diffusion models significantly impact the performance of a diffusion classifier. I recently measured the robustness of EDM-DC, which had not been evaluated before. The result is 38.28% robustness on the first 128 samples of CIFAR-10, outperforming the NCSNpp model. It appears there is still some misalignment between our experimental results.

The code used for the measurement is as follows:

import torch
from models.unets import get_edm_cifar_cond
from data import get_CIFAR10_test
from tester import test_acc, test_apgd_dlr_acc
import argparse
from defenses.PurificationDefenses.DiffPure import EDMEulerIntegralDC, EDMEulerIntegralWraped

# Select which CIFAR-10 test samples to evaluate (useful for splitting the run across machines).
parser = argparse.ArgumentParser()
parser.add_argument("--begin", type=int, default=0)
parser.add_argument("--end", type=int, default=128)
args = parser.parse_args()
begin, end = args.begin, args.end

# Class-conditional EDM UNet for CIFAR-10, evaluated one image at a time.
model = get_edm_cifar_cond(use_fp16=True).cuda()
test_loader = get_CIFAR10_test(batch_size=1)
test_loader = [item for i, item in enumerate(test_loader) if begin <= i < end]

# Wrap the UNet as a diffusion classifier, integrating over 1001 timesteps in [1e-4, 3].
dc = EDMEulerIntegralWraped(unet=model, timesteps=torch.linspace(1e-4, 3, 1001))

# APGD-DLR attack under the Linf threat model with eps = 8/255.
test_apgd_dlr_acc(dc, loader=test_loader, norm="Linf", eps=8/255)

Additionally, here is the log of this experiment:

edm-dc.txt
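For readers new to this defense, here is a conceptual sketch of what a diffusion classifier computes: each class is scored by how well the class-conditional model denoises the input, i.e., the per-class diffusion loss serves as a proxy for -log p(x | y). This sketch assumes a unet(x_noisy, sigma, y) denoiser signature and omits the proper loss weighting; it is an illustration only, not the repo's EDMEulerIntegralDC / EDMEulerIntegralWraped implementation.

import torch

@torch.no_grad()
def diffusion_classifier_logits(unet, x, num_classes=10, sigmas=None):
    # x: a single image tensor of shape (1, 3, 32, 32) with values in [0, 1].
    if sigmas is None:
        sigmas = torch.linspace(1e-4, 3, 64, device=x.device)
    losses = torch.zeros(num_classes, device=x.device)
    for y in range(num_classes):
        y_t = torch.full((x.shape[0],), y, device=x.device, dtype=torch.long)
        loss = 0.0
        for sigma in sigmas:
            x_noisy = x + sigma * torch.randn_like(x)                # corrupt at noise level sigma
            denoised = unet(x_noisy, sigma.expand(x.shape[0]), y_t)  # assumed signature
            loss = loss + torch.mean((denoised - x) ** 2)            # per-sigma weighting omitted
        losses[y] = loss / len(sigmas)
    return -losses  # lower class-conditional denoising loss -> higher logit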

LYMDLUT commented 4 months ago

Could you provide the L2 attack config and result log?

LYMDLUT commented 4 months ago

Have you tried attacks with EOT? Like test_apgd_dlr_acc(dc, loader=test_loader, norm="Linf", eps=8/255, eot_iter=20)?
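For context, eot_iter controls Expectation Over Transformation (EOT) in AutoAttack's APGD: the attack gradient is averaged over several independent stochastic forward passes of the defense. A conceptual sketch of that averaging (classifier stands in for any randomized model; this is not the repo's code):

import torch
import torch.nn.functional as F

def eot_gradient(classifier, x, y, eot_iter=20):
    # Average the input gradient of the loss over eot_iter stochastic forward passes.
    grad = torch.zeros_like(x)
    for _ in range(eot_iter):
        x_ = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(classifier(x_), y)  # each call re-samples the defense's randomness
        loss.backward()
        grad += x_.grad
    return grad / eot_iter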

LYMDLUT commented 4 months ago

Thanks a lot!

huanranchen commented 4 months ago

Could you provide the L2 attack config and result log?

Hi, for L2 attacks, the code is:

import torch
from models.unets import get_edm_cifar_cond
from data import get_CIFAR10_test
from tester import test_acc, test_apgd_dlr_acc
import argparse
from defenses.PurificationDefenses.DiffPure import EDMEulerIntegralDC, EDMEulerIntegralWraped

parser = argparse.ArgumentParser()
parser.add_argument("--begin", type=int, default=0)
parser.add_argument("--end", type=int, default=128)
args = parser.parse_args()
begin, end = args.begin, args.end

model = get_edm_cifar_cond(use_fp16=True).cuda()
test_loader = get_CIFAR10_test(batch_size=1)
test_loader = [item for i, item in enumerate(test_loader) if begin <= i < end]

dc = EDMEulerIntegralWraped(unet=model, timesteps=torch.linspace(1e-4, 3, 1001))

# Same setup as the Linf experiment above, but under the L2 threat model with eps = 0.5.
test_apgd_dlr_acc(dc, loader=test_loader, norm="L2", eps=0.5)

As I said before (https://github.com/huanranchen/DiffusionClassifier/issues/3#issuecomment-2078648869), I haven't actually run this code. I haven't performed L2 attacks on the EDM checkpoint; I only performed L2 attacks on the NCSNpp checkpoints. The logs are here:

L2-128-192.txt L2-192-256.txt L2-0-64.txt L2-64-128.txt

I'm quite confident that the provided code will achieve better robustness than my NCSNpp experiments from a year ago, since the EDM checkpoint is better than the NCSNpp checkpoint. Also, the certified robustness (see "Your Diffusion Model is Secretly a Certifiably Robust Classifier") was evaluated in L2 using the EDM checkpoint.

huanranchen commented 4 months ago

Have you tried attacks with EOT? Like test_apgd_dlr_acc(dc, loader=test_loader, norm="Linf", eps=8/255, eot_iter=20)?

Yeah, but EOT does not make any difference in robustness.

EOT aims to stabilize the gradient directions. However, the gradient of my diffusion classifier is already extremely stable (see the cosine similarity part in Fig. 2.3 of RDC); the cosine similarity is more than 0.9. In this case, applying EOT makes essentially no difference.

Theoretically, the diffusion classifier is Lipschitz and beta-smooth (see "Your Diffusion Model is Secretly a Certifiably Robust Classifier"), so its gradient also varies smoothly with the input.
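A rough way to check this gradient-stability claim on any randomized classifier dc is to compare the input gradients obtained from independent stochastic forward passes; cosine similarities near 1 mean EOT averaging would change very little. This is a generic sketch, not code from the repo:

import torch
import torch.nn.functional as F

def gradient_cosine_similarity(dc, x, y, n_pairs=10):
    # Average cosine similarity between gradients from pairs of independent forward passes.
    def one_grad():
        x_ = x.clone().detach().requires_grad_(True)
        F.cross_entropy(dc(x_), y).backward()
        return x_.grad.flatten()
    sims = [F.cosine_similarity(one_grad(), one_grad(), dim=0).item() for _ in range(n_pairs)]
    return sum(sims) / len(sims)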

huanranchen commented 1 month ago

This problem is solved.

By adjusting the diffusion timesteps (larger timesteps induce higher robustness), we can achieve 30%+ empirical robustness:

import torch
from models.unets import get_edm_cifar_cond
from data import get_CIFAR10_test
from tester import test_acc, test_apgd_dlr_acc
import argparse
from defenses.PurificationDefenses.DiffPure import EDMEulerIntegralDC, EDMEulerIntegralWraped
from data.utils import save_dataset

parser = argparse.ArgumentParser()
parser.add_argument("--begin", type=int, default=0)
parser.add_argument("--end", type=int, default=128)
args = parser.parse_args()
begin, end = args.begin, args.end

model = get_edm_cifar_cond(use_fp16=False).cuda()
test_loader = get_CIFAR10_test(batch_size=1)
test_loader = [item for i, item in enumerate(test_loader) if begin <= i < end]

# Key change: the smallest timestep is raised from 1e-4 to 0.5; as noted above,
# larger timesteps induce higher robustness.
dc = EDMEulerIntegralWraped(unet=model, timesteps=torch.linspace(0.5, 3, 1001))
test_acc(dc, test_loader, verbose=True)  # clean accuracy first
adv, y = test_apgd_dlr_acc(dc, loader=test_loader, norm="Linf", eps=8 / 255)  # then Linf APGD-DLR
torch.save(adv, "adv_dc_128.npy")
save_dataset(adv, y, "./dc_advs_0.5/", "gt.npy")  # save the adversarial examples and labels

The log is:

using custom version including apgd-dlr
Warning: it seems to be a randomized defense! Please use version="rand". See flags_doc.md for details.
initial accuracy: 92.97%
/home/chenhuanran/miniconda3/lib/python3.12/site-packages/torch/autograd/graph.py:744: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.) return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
apgd-dlr - 1/119 - 0 out of 1 successfully perturbed apgd-dlr - 2/119 - 1 out of 1 successfully perturbed apgd-dlr - 3/119 - 1 out of 1 successfully perturbed apgd-dlr - 4/119 - 0 out of 1 successfully perturbed apgd-dlr - 5/119 - 1 out of 1 successfully perturbed apgd-dlr - 6/119 - 1 out of 1 successfully perturbed apgd-dlr - 7/119 - 1 out of 1 successfully perturbed apgd-dlr - 8/119 - 1 out of 1 successfully perturbed apgd-dlr - 9/119 - 0 out of 1 successfully perturbed apgd-dlr - 10/119 - 1 out of 1 successfully perturbed apgd-dlr - 11/119 - 1 out of 1 successfully perturbed apgd-dlr - 12/119 - 0 out of 1 successfully perturbed apgd-dlr - 13/119 - 0 out of 1 successfully perturbed apgd-dlr - 14/119 - 1 out of 1 successfully perturbed apgd-dlr - 15/119 - 0 out of 1 successfully perturbed apgd-dlr - 16/119 - 1 out of 1 successfully perturbed apgd-dlr - 17/119 - 0 out of 1 successfully perturbed apgd-dlr - 18/119 - 0 out of 1 successfully perturbed apgd-dlr - 19/119 - 1 out of 1 successfully perturbed apgd-dlr - 20/119 - 1 out of 1 successfully perturbed apgd-dlr - 21/119 - 1 out of 1 successfully perturbed apgd-dlr - 22/119 - 0 out of 1 successfully perturbed apgd-dlr - 23/119 - 1 out of 1 successfully perturbed apgd-dlr - 24/119 - 0 out of 1 successfully perturbed apgd-dlr - 25/119 - 1 out of 1 successfully perturbed apgd-dlr - 26/119 - 1 out of 1 successfully perturbed apgd-dlr - 27/119 - 0 out of 1 successfully perturbed apgd-dlr - 28/119 - 1 out of 1 successfully perturbed apgd-dlr - 29/119 - 0 out of 1 successfully perturbed apgd-dlr - 30/119 - 0 out of 1 successfully perturbed apgd-dlr - 31/119 - 0 out of 1 successfully perturbed apgd-dlr - 32/119 - 1 out of 1 successfully perturbed apgd-dlr - 33/119 - 1 out of 1 successfully perturbed apgd-dlr - 34/119 - 0 out of 1 successfully perturbed apgd-dlr - 35/119 - 1 out of 1 successfully perturbed apgd-dlr - 36/119 - 1 out of 1 successfully perturbed apgd-dlr - 37/119 - 0 out of 1 successfully perturbed apgd-dlr - 38/119 - 1 out of 1 successfully perturbed apgd-dlr - 39/119 - 0 out of 1 successfully perturbed apgd-dlr - 40/119 - 1 out of 1 successfully perturbed apgd-dlr - 41/119 - 1 out of 1 successfully perturbed apgd-dlr - 42/119 - 1 out of 1 successfully perturbed apgd-dlr - 43/119 - 1 out of 1 successfully perturbed apgd-dlr - 44/119 - 0 out of 1 successfully perturbed apgd-dlr - 45/119 - 1 out of 1 successfully perturbed apgd-dlr - 46/119 - 1 out of 1 successfully perturbed apgd-dlr - 47/119 - 1 out of 1 successfully perturbed apgd-dlr - 48/119 - 1 out of 1 successfully perturbed apgd-dlr - 49/119 - 1 out of 1 successfully perturbed apgd-dlr - 50/119 - 0 out of 1 successfully perturbed apgd-dlr - 51/119 - 1 out of 1 successfully perturbed apgd-dlr - 52/119 - 1 out of 1 successfully perturbed apgd-dlr - 53/119 - 0 out of 1 successfully perturbed apgd-dlr - 54/119 - 1 out of 1 successfully perturbed apgd-dlr - 55/119 - 0 out of 1 successfully perturbed apgd-dlr - 56/119 - 1 out of 1 successfully perturbed apgd-dlr - 57/119 - 0 out of 1 successfully perturbed apgd-dlr - 58/119 - 0 out of 1 successfully perturbed apgd-dlr - 59/119 - 1 out of 1 successfully perturbed apgd-dlr - 60/119 - 1 out of 1 successfully perturbed apgd-dlr - 61/119 - 1 out of 1 successfully perturbed apgd-dlr - 62/119 - 1 out of 1 successfully perturbed apgd-dlr - 63/119 - 0 out of 1 successfully perturbed apgd-dlr - 64/119 - 0 out of 1 successfully perturbed apgd-dlr - 65/119 - 1 out of 1 successfully perturbed apgd-dlr - 66/119 - 0 out of 1 successfully perturbed apgd-dlr - 67/119 - 0 out of 1 successfully perturbed apgd-dlr - 68/119 - 1 out of 1 successfully perturbed apgd-dlr - 69/119 - 0 out of 1 successfully perturbed apgd-dlr - 70/119 - 1 out of 1 successfully perturbed apgd-dlr - 71/119 - 1 out of 1 successfully perturbed apgd-dlr - 72/119 - 1 out of 1 successfully perturbed apgd-dlr - 73/119 - 1 out of 1 successfully perturbed apgd-dlr - 74/119 - 1 out of 1 successfully perturbed apgd-dlr - 75/119 - 0 out of 1 successfully perturbed apgd-dlr - 76/119 - 0 out of 1 successfully perturbed apgd-dlr - 77/119 - 1 out of 1 successfully perturbed apgd-dlr - 78/119 - 1 out of 1 successfully perturbed apgd-dlr - 79/119 - 1 out of 1 successfully perturbed apgd-dlr - 80/119 - 1 out of 1 successfully perturbed apgd-dlr - 81/119 - 0 out of 1 successfully perturbed apgd-dlr - 82/119 - 0 out of 1 successfully perturbed apgd-dlr - 83/119 - 1 out of 1 successfully perturbed apgd-dlr - 84/119 - 1 out of 1 successfully perturbed apgd-dlr - 85/119 - 1 out of 1 successfully perturbed apgd-dlr - 86/119 - 0 out of 1 successfully perturbed apgd-dlr - 87/119 - 1 out of 1 successfully perturbed apgd-dlr - 88/119 - 1 out of 1 successfully perturbed apgd-dlr - 89/119 - 1 out of 1 successfully perturbed apgd-dlr - 90/119 - 1 out of 1 successfully perturbed apgd-dlr - 91/119 - 0 out of 1 successfully perturbed apgd-dlr - 92/119 - 1 out of 1 successfully perturbed apgd-dlr - 93/119 - 1 out of 1 successfully perturbed apgd-dlr - 94/119 - 1 out of 1 successfully perturbed apgd-dlr - 95/119 - 0 out of 1 successfully perturbed apgd-dlr - 96/119 - 1 out of 1 successfully perturbed apgd-dlr - 97/119 - 0 out of 1 successfully perturbed apgd-dlr - 98/119 - 0 out of 1 successfully perturbed apgd-dlr - 99/119 - 0 out of 1 successfully perturbed apgd-dlr - 100/119 - 0 out of 1 successfully perturbed apgd-dlr - 101/119 - 1 out of 1 successfully perturbed apgd-dlr - 102/119 - 1 out of 1 successfully perturbed apgd-dlr - 103/119 - 1 out of 1 successfully perturbed apgd-dlr - 104/119 - 0 out of 1 successfully perturbed apgd-dlr - 105/119 - 1 out of 1 successfully perturbed apgd-dlr - 106/119 - 1 out of 1 successfully perturbed apgd-dlr - 107/119 - 1 out of 1 successfully perturbed apgd-dlr - 108/119 - 1 out of 1 successfully perturbed apgd-dlr - 109/119 - 1 out of 1 successfully perturbed apgd-dlr - 110/119 - 0 out of 1 successfully perturbed apgd-dlr - 111/119 - 1 out of 1 successfully perturbed apgd-dlr - 112/119 - 1 out of 1 successfully perturbed apgd-dlr - 113/119 - 0 out of 1 successfully perturbed apgd-dlr - 114/119 - 0 out of 1 successfully perturbed apgd-dlr - 115/119 - 1 out of 1 successfully perturbed apgd-dlr - 116/119 - 1 out of 1 successfully perturbed apgd-dlr - 117/119 - 1 out of 1 successfully perturbed apgd-dlr - 118/119 - 1 out of 1 successfully perturbed apgd-dlr - 119/119 - 0 out of 1 successfully perturbed
robust accuracy after APGD-DLR: 34.38% (total time 817271.0 s)
max Linf perturbation: 0.03137, nan in tensor: 0, max: 1.00000, min: 0.00000
robust accuracy: 34.38%
Successfully create a new dataset with 128 images