inikishev / torchzero

0th order optimizers, gradient chaining, random gradient approximation
MIT License

SPSA convergence #3

Open guanhdrmq opened 1 day ago

guanhdrmq commented 1 day ago

Hi inikishev,

I use SPSA random noise (RDSA) to add a small perturbation to an image, turning it into an adversarial example against vision-language models. Here I use the InstructBLIP model. However, it does not converge. What could I do? Here is my code. I really appreciate your help. Thank you so much.

def compute_spsa_gradient(self, img, batch_targets, batch_size):
    print("===Calculating SPSA gradient===")

    delta = normalize_delta(torch.randn_like(img) * 1e-4)
    print("delta==============", delta)

    adv_noise_plus = img + delta
    adv_noise_minus = img - delta

    adv_noise_plus = normalize(adv_noise_plus).repeat(batch_size, 1, 1, 1)
    adv_noise_minus = normalize(adv_noise_minus).repeat(batch_size, 1, 1, 1)

    with torch.no_grad():
        samples_plus = {
            'image': adv_noise_plus,
            'text_input': [''] * batch_size,
            'text_output': batch_targets
        }
        loss_plus = self.model(samples_plus)['loss']
        print("loss_plus", loss_plus)

        samples_minus = {
            'image':  adv_noise_minus,
            'text_input': [''] * batch_size,
            'text_output': batch_targets
        }
        loss_minus = self.model(samples_minus)['loss']
        print("loss_minus", loss_minus)

        dloss = loss_plus - loss_minus
        print("dloss==============", dloss)
        max_diff = 1e-2
        if max_diff is not None and abs(dloss) > max_diff: dloss = max_diff * (1 if dloss > 0 else -1)
        print("use dloss============", dloss)

        grad_estimate = dloss / (2 * delta)
        # print("grad_estimate", grad_estimate)
    return grad_estimate

[image: loss_curve]

inikishev commented 1 day ago

hi guanhdrmq,

The RDSA formula is a bit different from SPSA and should be grad_estimate = (dloss / (2 * 1e-4**2)) * perturbation

Here I divided by 1e-4 squared to account for the fact that in your code the perturbation is already multiplied by 1e-4.
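
For reference, a rough sketch of the two estimators side by side (closure here is a stand-in for evaluating your model's loss at a given image; it is not from your code):

import torch

def rdsa_grad(closure, img, c=1e-4):
    # RDSA: Gaussian direction u; the finite-difference quotient is
    # multiplied by u, not divided by it
    u = torch.randn_like(img)
    dloss = closure(img + c * u) - closure(img - c * u)
    return (dloss / (2 * c)) * u

def spsa_grad(closure, img, c=1e-4):
    # SPSA: Rademacher (+/-1) direction; elementwise division by u is
    # safe because every entry is exactly +1 or -1
    u = torch.bernoulli(torch.full_like(img, 0.5)) * 2 - 1
    dloss = closure(img + c * u) - closure(img - c * u)
    return dloss / (2 * c * u)

In your code delta already carries the 1e-4 factor, which is why the denominator becomes 2 * 1e-4**2 once you multiply by delta instead of the unscaled direction.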

guanhdrmq commented 1 day ago

Thank you so much, here is my code using SPSA with a normal-distribution perturbation:

def compute_spsa_gradient(self, img, batch_targets, batch_size, num_samples=10, delta=1e-4):
    """Compute SPSA gradient using multiple samples for variance reduction."""

    print("=== Calculating SPSA Gradient ===")
    grad_estimate = torch.zeros_like(img)

    for _ in range(num_samples):
        perturbation = normalize_delta(torch.randn_like(img) * delta)  # Random direction
        adv_noise_plus = normalize(img + perturbation).repeat(batch_size, 1, 1, 1)
        adv_noise_minus = normalize(img - perturbation).repeat(batch_size, 1, 1, 1)

        # Compute loss for both perturbed directions
        with torch.no_grad():
            loss_plus = self.model({'image': adv_noise_plus,
                                    'text_input': [''] * batch_size,
                                    'text_output': batch_targets})['loss']

            loss_minus = self.model({'image': adv_noise_minus,
                                     'text_input': [''] * batch_size,
                                     'text_output': batch_targets})['loss']

        # Limit loss difference to reduce variance
        dloss = loss_plus - loss_minus
        max_diff = 1e-2
        # dloss = torch.clamp(dloss, -max_diff, max_diff)
        if max_diff is not None and abs(dloss) > max_diff: dloss = max_diff * (1 if dloss > 0 else -1)
        # Accumulate the gradient estimate
        grad_estimate += dloss / (2 * perturbation)
        # grad_estimate += (dloss / (2 * 1e-4 ** 2)) * perturbation

    # Average gradient to reduce noise
    grad_estimate /= num_samples
    return grad_estimate

And SPSA with a Rademacher perturbation:

def compute_spsa_gradient(self, img, batch_targets, batch_size):
    print("============Calculating SPSA gradient...")

    scaler = 0.01

    delta = (torch.bernoulli(torch.full(img.shape, 0.5, device=img.device)) * 2 - 1)
    # create the positively and negatively perturbed images
    adv_noise_plus = img + scaler * delta
    adv_noise_minus = img - scaler * delta

    adv_noise_plus = normalize(adv_noise_plus).repeat(batch_size, 1, 1, 1)
    adv_noise_minus = normalize(adv_noise_minus).repeat(batch_size, 1, 1, 1)

    # compute the loss for the positive and negative perturbations
    with torch.no_grad():
        samples_plus = {
            'image': adv_noise_plus,
            'text_input': [''] * batch_size,
            'text_output': batch_targets
        }
        loss_plus = self.model(samples_plus)['loss']

        samples_minus = {
            'image': adv_noise_minus,
            'text_input': [''] * batch_size,
            'text_output': batch_targets
        }
        loss_minus = self.model(samples_minus)['loss']

        dloss = loss_plus - loss_minus
        max_diff = 1e-2
        if abs(dloss) > max_diff:
            dloss = max_diff * (1 if dloss > 0 else -1)

        grad_estimate = dloss / (2 * scaler * delta)
        grad_estimate = grad_estimate.mean(dim=0, keepdim=True)

    return grad_estimate

But neither of them converges. Could you help me a little bit? I really appreciate it. Thank you so much.

inikishev commented 1 day ago

Maybe it's because you are normalizing image + perturbation; you can try normalizing just the image and then adding the perturbation to it. It is also quite sensitive to the choice of epsilon, so if you haven't tried yet, you can try increasing 1e-4. Also, max_diff is meant to prevent very large updates, but I chose it somewhat arbitrarily based on what worked better for neural network training. It is not a necessary part of the SPSA algorithm, and perhaps you don't need it at all for adversarial attacks.
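
For example, roughly like this (an untested sketch reusing your normalize helper):

# normalize the clean image once, then apply the +/- perturbation to the
# normalized image, so the two loss evaluations differ only by the perturbation
img_n = normalize(img)
adv_noise_plus = (img_n + perturbation).repeat(batch_size, 1, 1, 1)
adv_noise_minus = (img_n - perturbation).repeat(batch_size, 1, 1, 1)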

inikishev commented 1 day ago

It might even work well with 1e-1 instead of 1e-4, based on what I've seen people use.

guanhdrmq commented 22 hours ago

Many thanks. I have tried 1e-2 down to 1e-7, but none of them converged. Are you interested in my project? I can share my code with you. I will try 1e-1, but it may be too large a perturbation. I suspect the normalization as well. I will do it now. Thank you very much.

inikishev commented 19 hours ago

please share the code, I will take a look

guanhdrmq commented 12 hours ago

Hi inikishev, you may need an Nvidia A100 80G to run this code for the gradient-based attack. But for SPSA, it needs less GPU memory.

I have forked the project into my github: https://github.com/guanhdrmq/Visual-Adversarial-Examples-Jailbreak-Large-Language-Models/blob/main/README.md

You can find the spsa-suffixed files in the main directory and in minigpt_utils. Then you can use these commands for the random and Rademacher perturbations:

python minigpt_visual_attack_spsa_e.py --cfg_path eval_configs/minigpt4_eval.yaml --gpu_id 0 --n_iters 5000 --alpha 1 --save_dir visual_unconstrained_e4_xxx_5000
python minigpt_visual_attack_spsa_b.py --cfg_path eval_configs/minigpt4_eval.yaml --gpu_id 0 --n_iters 5000 --alpha 1 --save_dir visual_unconstrained_ber_xxx_5000

Thank you so much