sail-sg / Agent-Smith

[ICML 2024] Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
https://sail-sg.github.io/Agent-Smith/
MIT License

TypeError: empty() missing 1 required positional arguments: "size" #5

Open durenajafamjad opened 4 days ago

durenajafamjad commented 4 days ago

```
Distributed environment: FSDP  Backend: nccl
Num processes: 4  Process index: 3  Local process index: 3  Device: cuda:0
Mixed precision type: bf16
(identical startup output repeated for process indices 0, 1, and 2)

Loading checkpoint shards: 100%|██████████| 3/3 [00:00<00:00, 8.34it/s]
(interleaved checkpoint-loading progress bars from the four processes omitted)
Some kwargs in processor config are unused and will not have any effect: num_additional_image_tokens.
(warning repeated once per process)

rank1: Traceback (most recent call last):
rank1:   File "/gpfs/home3/scur2844/Agent-Smith/attack/optimize.py", line 574, in <module>
rank1:   File "/gpfs/home3/scur2844/Agent-Smith/attack/optimize.py", line 415, in main
rank1:     clip_model = CLIPModel.from_pretrained(args.rag, torch_dtype=dtype)
rank1:   File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4225, in from_pretrained
rank1:     ) = cls._load_pretrained_model(
rank1:   File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4725, in _load_pretrained_model
rank1:     model_to_load, key, "cpu", torch.empty(*param.size(), dtype=dtype)
rank1: TypeError: empty() missing 1 required positional arguments: "size"
(identical tracebacks from rank2 and rank3 omitted)

rank1:[W1120 23:01:41.506008594 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
(same warning repeated for rank2 and rank3)

W1120 23:01:42.439000 979934 /gpfs/home3/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py:897] Sending process 979970 closing signal SIGTERM
E1120 23:01:42.754000 979934 /gpfs/home3/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 1 (pid: 979971) of binary: /home/scur2844/.conda/envs/agentsmith/bin/python
Traceback (most recent call last):
  File "/home/scur2844/.conda/envs/agentsmith/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1155, in launch_command
    multi_gpu_launcher(args)
  File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/accelerate/commands/launch.py", line 793, in multi_gpu_launcher
    distrib_run.run(args)
  File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/run.py", line 910, in run
    elastic_launch(
  File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/scur2844/.conda/envs/agentsmith/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

attack/optimize.py FAILED

Failures:
[1]:
  time       : 2024-11-20_23:01:42
  host       : gcn15.local.snellius.surf.nl
  rank       : 2 (local_rank: 2)
  exitcode   : 1 (pid: 979972)
  error_file : <N/A>
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time       : 2024-11-20_23:01:42
  host       : gcn15.local.snellius.surf.nl
  rank       : 3 (local_rank: 3)
  exitcode   : 1 (pid: 979973)
  error_file : <N/A>
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
  time       : 2024-11-20_23:01:42
  host       : gcn15.local.snellius.surf.nl
  rank       : 1 (local_rank: 1)
  exitcode   : 1 (pid: 979971)
  error_file : <N/A>
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
```

Hi, thank you for writing this super interesting paper. While trying to reproduce the results, I keep encountering this error, specifically when `--num_processes=4`. With `--num_processes=1` it works, but I run out of memory quite early in training. Please let me know if there is anything I can do to fix it. Thank you for your time and cooperation!
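For reference, the failing call from the traceback can be reproduced in isolation. A minimal sketch of what I believe is happening (assuming a 0-dim parameter such as CLIP's `logit_scale` is the trigger; that specific name is my guess, not something the traceback confirms):

```python
import torch

# CLIP checkpoints contain 0-dim ("scalar") parameters; logit_scale is the
# usual example (assumed here -- the traceback does not name the parameter).
param = torch.nn.Parameter(torch.tensor(2.6592))
print(param.size())  # torch.Size([]) -- an empty shape

# The line in transformers' _load_pretrained_model unpacks the shape into
# positional args, so for a 0-dim param torch.empty(*param.size(), ...)
# becomes torch.empty(dtype=...), which lacks a size and raises TypeError.
try:
    torch.empty(*param.size(), dtype=torch.bfloat16)
except TypeError as e:
    print(f"TypeError: {e}")

# Passing the Size object without unpacking accepts any shape, including
# 0-dim, and returns a scalar tensor instead of crashing.
t = torch.empty(param.size(), dtype=torch.bfloat16)
print(t.dim())  # 0
```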

guxm2021 commented 9 hours ago

Sorry for my late response. Could you please check your environment, especially the versions of PyTorch, transformers, tokenizers, and accelerate? This error seems to come from a compatibility issue between loading the pre-trained CLIP model and FSDP in accelerate. Since it has been a while since this repo's initial release, we need some time to check version compatibility.
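A quick, generic way to print those versions (nothing here is specific to this repo):

```python
# Print installed versions of the packages most likely involved; packages
# that are missing are reported rather than raising an exception.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("torch", "transformers", "tokenizers", "accelerate"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```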

durenajafamjad commented 3 hours ago

Thank you for your reply. I was able to fix it by adjusting the package versions.