This bug has re-appeared in the latest ms-swift version. It was initially reported in this issue and was fixed promptly, but it has resurfaced with the latest release.
I am trying to DPO fine-tune GLM-4V-9B using the same command as in the original post here. Command:
The initial error is "AttributeError: 'list' object has no attribute 'squeeze'". The full traceback is:
Train: 0%| | 0/122 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/VDIL_COREML/m.banerjee/ms-swift/swift/cli/rlhf.py", line 5, in <module>
rlhf_main()
File "/VDIL_COREML/m.banerjee/ms-swift/swift/utils/run_utils.py", line 32, in x_main
result = llm_x(args, **kwargs)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/llm/rlhf.py", line 282, in llm_rlhf
trainer.train(training_args.resume_from_checkpoint)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 101, in train
res = super().train(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/mixin.py", line 426, in train
res = super().train(resume_from_checkpoint, *args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 1932, in train
return inner_training_loop(
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 2268, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 3307, in training_step
loss = self.compute_loss(model, inputs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/trl/trainer/dpo_trainer.py", line 1520, in compute_loss
loss, metrics = self.get_batch_loss_metrics(model, inputs, train_eval="train")
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 115, in get_batch_loss_metrics
forward_output = self.concatenated_forward(model, batch)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 189, in concatenated_forward
concatenated_batch = self.concatenated_inputs(
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 369, in concatenated_inputs
concatenated_batch['images'] = batch['vision_images'].squeeze(1).repeat(2, 1, 1, 1).to(device=device)
AttributeError: 'list' object has no attribute 'squeeze'
Train: 0%| | 0/122 [00:00<?, ?it/s]
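For reference, the failure at swift/trainers/dpo_trainer.py line 369 seems to occur because batch['vision_images'] arrives as a plain Python list rather than a tensor. Below is a minimal, self-contained sketch of what I believe the intended handling would look like (my assumption, not the actual ms-swift fix), stacking the list into a tensor before the squeeze/repeat call:

```python
import torch

def concatenate_vision_images(vision_images, device="cpu"):
    """Hypothetical helper: stack a list of per-sample image tensors so the
    original squeeze(1).repeat(2, 1, 1, 1) call works as before.
    Assumes each element has shape (1, C, H, W)."""
    if isinstance(vision_images, list):
        vision_images = torch.stack(vision_images, dim=0)  # -> (B, 1, C, H, W)
    # duplicate the images for the chosen/rejected halves of the DPO batch
    return vision_images.squeeze(1).repeat(2, 1, 1, 1).to(device=device)

# toy usage: a batch of 2 samples with 3x224x224 images each
batch = {"vision_images": [torch.randn(1, 3, 224, 224) for _ in range(2)]}
images = concatenate_vision_images(batch["vision_images"])
print(images.shape)  # torch.Size([4, 3, 224, 224])
```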
To bypass the above error, I tried using only the first sample in the list instead, which then produced "UnboundLocalError: local variable 'num_patches' referenced before assignment". The full traceback is as follows:
Train: 0%| | 0/122 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/VDIL_COREML/m.banerjee/ms-swift/swift/cli/rlhf.py", line 5, in <module>
rlhf_main()
File "/VDIL_COREML/m.banerjee/ms-swift/swift/utils/run_utils.py", line 32, in x_main
result = llm_x(args, **kwargs)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/llm/rlhf.py", line 282, in llm_rlhf
trainer.train(training_args.resume_from_checkpoint)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 101, in train
res = super().train(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/mixin.py", line 426, in train
res = super().train(resume_from_checkpoint, *args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 1932, in train
return inner_training_loop(
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 2268, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 3307, in training_step
loss = self.compute_loss(model, inputs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/trl/trainer/dpo_trainer.py", line 1520, in compute_loss
loss, metrics = self.get_batch_loss_metrics(model, inputs, train_eval="train")
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 115, in get_batch_loss_metrics
forward_output = self.concatenated_forward(model, batch)
File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 235, in concatenated_forward
outputs = model(
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
result = forward_call(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/accelerate/utils/operations.py", line 820, in forward
return model_forward(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/accelerate/utils/operations.py", line 808, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
return func(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/peft/peft_model.py", line 1577, in forward
return self.base_model(
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/peft/tuners/tuners_utils.py", line 188, in forward
return self.model.forward(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 1176, in forward
transformer_outputs = self.transformer(
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
File "/VDIL_COREML/m.banerjee/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 1024, in forward
(attention_mask[i, :boi_token_pos + 1], torch.ones(num_patches).to(attention_mask.device),
UnboundLocalError: local variable 'num_patches' referenced before assignment
Train: 0%| | 0/122 [00:00<?, ?it/s]
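For context on the second failure: in the vendored modeling_chatglm.py, num_patches appears to be assigned only inside the branch that actually receives image features, yet the BOI/EOI attention-mask padding references it unconditionally. A minimal, illustrative reproduction of that pattern (not the actual GLM-4V code) is:

```python
# Illustrative only: shows why 'num_patches' can be referenced before assignment
# when the image branch is skipped (e.g. the images end up empty or malformed).
def build_mask(seq_len, images=None):
    mask = [1] * seq_len
    if images is not None:
        num_patches = len(images)  # only assigned when images are present
    # the vision-padding path below runs regardless of whether images were given
    return mask[:1] + [1] * num_patches + mask[1:]  # UnboundLocalError if images is None

build_mask(4)  # raises: local variable 'num_patches' referenced before assignment
```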
Your hardware and system info
CUDA Version: 12.4
System: Ubuntu 22.04.3 LTS
GPU
torch==2.4.0
transformers==4.44.0
trl==0.10.1
peft==0.12.0