guanhdrmq opened 1 day ago
Hi inikishev,
I am using SPSA with random noise (RDSA) to add a small perturbation and estimate the gradient of the loss, in order to perturb an image into an adversarial image against vision-language models. Here I am using the InstructBLIP model. However, it does not seem to converge. What should I do? Here is my code. I would really appreciate your help. Thank you so much.
def compute_spsa_gradient(self, img, batch_targets, batch_size):
    print("===Calculating SPSA gradient===")
    delta = normalize_delta(torch.randn_like(img) * 1e-4)
    print("delta==============", delta)
    adv_noise_plus = img + delta
    adv_noise_minus = img - delta
    adv_noise_plus = normalize(adv_noise_plus).repeat(batch_size, 1, 1, 1)
    adv_noise_minus = normalize(adv_noise_minus).repeat(batch_size, 1, 1, 1)
    with torch.no_grad():
        samples_plus = {
            'image': adv_noise_plus,
            'text_input': [''] * batch_size,
            'text_output': batch_targets
        }
        loss_plus = self.model(samples_plus)['loss']
        print("loss_plus", loss_plus)
        samples_minus = {
            'image': adv_noise_minus,
            'text_input': [''] * batch_size,
            'text_output': batch_targets
        }
        loss_minus = self.model(samples_minus)['loss']
        print("loss_minus", loss_minus)
    dloss = loss_plus - loss_minus
    print("dloss==============", dloss)
    max_diff = 1e-2
    if max_diff is not None and abs(dloss) > max_diff:
        dloss = max_diff * (1 if dloss > 0 else -1)
    print("use dloss============", dloss)
    grad_estimate = dloss / (2 * delta)
    # print("grad_estimate", grad_estimate)
    return grad_estimate
hi guanhdrmq,
The RDSA formula is a bit different from SPSA and should be grad_estimate = (dloss / (2 * 1e-4**2)) * perturbation
Here I divide by 1e-4 squared to account for the fact that in your code the perturbation is already multiplied by 1e-4.
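For instance, a minimal sketch of that RDSA-style estimator, written as a standalone function for clarity. The function name is hypothetical, the normalize helper and the model call signature are assumed to match your code above, and normalize_delta is skipped for simplicity:

import torch

def compute_rdsa_gradient(model, img, batch_targets, batch_size, c=1e-4):
    # Random direction u; the perturbation applied to the image is c * u
    direction = torch.randn_like(img)
    perturbation = c * direction
    # `normalize` is assumed to be the same preprocessing helper as in the code above
    adv_plus = normalize(img + perturbation).repeat(batch_size, 1, 1, 1)
    adv_minus = normalize(img - perturbation).repeat(batch_size, 1, 1, 1)
    with torch.no_grad():
        loss_plus = model({'image': adv_plus,
                           'text_input': [''] * batch_size,
                           'text_output': batch_targets})['loss']
        loss_minus = model({'image': adv_minus,
                            'text_input': [''] * batch_size,
                            'text_output': batch_targets})['loss']
    dloss = loss_plus - loss_minus
    # RDSA: multiply the scalar loss difference by the perturbation direction
    # instead of dividing elementwise by it; dividing by c**2 cancels the factor
    # of c already baked into `perturbation`, leaving (dloss / (2 * c)) * u.
    return (dloss / (2 * c ** 2)) * perturbation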
Thank you so much, here is my code using SPSA with a normal-distribution perturbation:

def compute_spsa_gradient(self, img, batch_targets, batch_size, num_samples=10, delta=1e-4):
    """
    Compute SPSA gradient using multiple samples for variance reduction.
    """
    print("=== Calculating SPSA Gradient ===")
    grad_estimate = torch.zeros_like(img)
    for _ in range(num_samples):
        perturbation = normalize_delta(torch.randn_like(img) * delta)  # Random direction
        adv_noise_plus = normalize(img + perturbation).repeat(batch_size, 1, 1, 1)
        adv_noise_minus = normalize(img - perturbation).repeat(batch_size, 1, 1, 1)
        # Compute loss for both perturbed directions
        with torch.no_grad():
            loss_plus = self.model({'image': adv_noise_plus,
                                    'text_input': [''] * batch_size,
                                    'text_output': batch_targets})['loss']
            loss_minus = self.model({'image': adv_noise_minus,
                                     'text_input': [''] * batch_size,
                                     'text_output': batch_targets})['loss']
        # Limit loss difference to reduce variance
        dloss = loss_plus - loss_minus
        max_diff = 1e-2
        # dloss = torch.clamp(dloss, -max_diff, max_diff)
        if max_diff is not None and abs(dloss) > max_diff:
            dloss = max_diff * (1 if dloss > 0 else -1)
        # Accumulate the gradient estimate
        grad_estimate += dloss / (2 * perturbation)
        # grad_estimate += (dloss / (2 * 1e-4 ** 2)) * perturbation
    # Average gradient to reduce noise
    grad_estimate /= num_samples
    return grad_estimate
And SPSA with a Rademacher perturbation:
def compute_spsa_gradient(self, img, batch_targets, batch_size):
    print("============Calculating SPSA gradient...")
    scaler = 0.01
    delta = (torch.bernoulli(torch.full(img.shape, 0.5, device=img.device)) * 2 - 1)
    # Create the positively and negatively perturbed images
    adv_noise_plus = img + scaler * delta
    adv_noise_minus = img - scaler * delta
    adv_noise_plus = normalize(adv_noise_plus).repeat(batch_size, 1, 1, 1)
    adv_noise_minus = normalize(adv_noise_minus).repeat(batch_size, 1, 1, 1)
    # Compute the loss for the positive and negative perturbations
    with torch.no_grad():
        samples_plus = {
            'image': adv_noise_plus,
            'text_input': [''] * batch_size,
            'text_output': batch_targets
        }
        loss_plus = self.model(samples_plus)['loss']
        samples_minus = {
            'image': adv_noise_minus,
            'text_input': [''] * batch_size,
            'text_output': batch_targets
        }
        loss_minus = self.model(samples_minus)['loss']
    dloss = loss_plus - loss_minus
    max_diff = 1e-2
    if abs(dloss) > max_diff:
        dloss = max_diff * (1 if dloss > 0 else -1)
    grad_estimate = dloss / (2 * scaler * delta)
    grad_estimate = grad_estimate.mean(dim=0, keepdim=True)
    return grad_estimate
But neither of them converges. Could you help me a little bit? I really appreciate it. Thank you so much.
Maybe it's because you are normalizing image + perturbation; you could try normalizing just the image and then adding the perturbation to it. It is also quite sensitive to the choice of epsilon, so if you haven't already, you can try increasing 1e-4. Also, max_diff is meant to prevent very large updates, but I chose it somewhat arbitrarily based on what worked better for neural network training. It is not a necessary part of the SPSA algorithm, and perhaps you don't need it at all for adversarial attacks.
It might even work well with 1e-1 instead of 1e-4, based on what I've seen people use.
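To make that concrete, here is a minimal sketch of the suggested reordering: normalize only the clean image, then perturb the normalized image directly, with the larger step size mentioned above. The function name is hypothetical, and normalize and the model call signature are assumed to match the code earlier in this thread:

import torch

def compute_spsa_gradient_on_normalized(model, img, batch_targets, batch_size, c=1e-1):
    # Normalize only the clean image, then perturb the normalized image
    img_norm = normalize(img)
    # Rademacher (+1/-1) perturbation direction
    delta = torch.bernoulli(torch.full_like(img_norm, 0.5)) * 2 - 1
    adv_plus = (img_norm + c * delta).repeat(batch_size, 1, 1, 1)
    adv_minus = (img_norm - c * delta).repeat(batch_size, 1, 1, 1)
    with torch.no_grad():
        loss_plus = model({'image': adv_plus,
                           'text_input': [''] * batch_size,
                           'text_output': batch_targets})['loss']
        loss_minus = model({'image': adv_minus,
                            'text_input': [''] * batch_size,
                            'text_output': batch_targets})['loss']
    dloss = loss_plus - loss_minus
    # Plain SPSA estimate, without the optional max_diff clipping
    return dloss / (2 * c * delta)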
Many thanks. I have tried 1e-2 down to 1e-7, but none of them converged. Are you interested in my project? I can share my code with you. I will also try 1e-1, but it may be too large a perturbation. I suspect the normalization as well and will try that now. Thank you very much.
please share the code, I will take a look
Hi inikishev, you may need an NVIDIA A100 80GB to run this code for the gradient-based attack. But for SPSA, it needs less GPU memory.
I have forked the project to my GitHub: https://github.com/guanhdrmq/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models/blob/main/README.md
You can find the spsa-suffixed files in the main directory and in minigpt_utils. Then you can use these commands for the random and Rademacher perturbations:
python minigpt_visual_attack_spsa_e.py --cfg_path eval_configs/minigpt4_eval.yaml --gpu_id 0 --n_iters 5000 --alpha 1 --save_dir visual_unconstrained_e4_xxx_5000
python minigpt_visual_attack_spsa_b.py --cfg_path eval_configs/minigpt4_eval.yaml --gpu_id 0 --n_iters 5000 --alpha 1 --save_dir visual_unconstrained_ber_xxx_5000
Thank you so much