DPO training error `UnboundLocalError: local variable 'num_patches' referenced before assignment`

Describe the bug Getting the following error only by changing the model to glm4v-9b-chat from llava1_6-mistral-7b-instruct in the first DPO example here.

Command:

CUDA_VISIBLE_DEVICES=0 \
swift rlhf \
    --rlhf_type dpo \
    --model_type glm4v-9b-chat \
    --beta 0.1 \
    --sft_beta 0.1 \
    --sft_type lora \
    --dataset rlaif-v#1000 \
    --num_train_epochs 2 \
    --lora_target_modules DEFAULT \
    --gradient_checkpointing true \
    --batch_size 1 \
    --learning_rate 5e-5 \
    --gradient_accumulation_steps 16 \
    --warmup_ratio 0.03 \
    --save_total_limit 2

Error:

Parameter 'function'=<bound method DPOTrainer.tokenize_row of <swift.trainers.dpo_trainer.DPOTrainer object at 0x7f8b0d91ef10>> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
Map: 100%|████████████████████████████████████████████████████████████████| 990/990 [01:29<00:00, 11.09 examples/s]
Map: 100%|██████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 10.40 examples/s]
[INFO:swift] Dataset Token Length: 119.152525±56.107634, min=18.000000, max=304.000000, size=990
[INFO:swift] Dataset Token Length: 102.500000±36.414969, min=60.000000, max=162.000000, size=10
[INFO:swift] The RLHFArguments will be saved in: /VDIL_COREML/m.banerjee/ms-swift/output/glm4v-9b-chat/v1-20240816-101916/sft_args.json
[INFO:swift] The Seq2SeqTrainingArguments will be saved in: /VDIL_COREML/m.banerjee/ms-swift/output/glm4v-9b-chat/v1-20240816-101916/training_args.json
[INFO:swift] The logging file will be saved in: /VDIL_COREML/m.banerjee/ms-swift/output/glm4v-9b-chat/v1-20240816-101916/logging.jsonl
Train:   0%|                                                                               | 0/122 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/cli/rlhf.py", line 5, in <module>
    rlhf_main()
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/utils/run_utils.py", line 32, in x_main
    result = llm_x(args, **kwargs)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/llm/rlhf.py", line 231, in llm_rlhf
    trainer.train(training_args.resume_from_checkpoint)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 63, in train
    res = super().train(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/mixin.py", line 538, in train
    res = super().train(resume_from_checkpoint, *args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 1948, in train
    return inner_training_loop(
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 2289, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/transformers/trainer.py", line 3328, in training_step
    loss = self.compute_loss(model, inputs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/trl/trainer/dpo_trainer.py", line 1408, in compute_loss
    loss, metrics = self.get_batch_loss_metrics(model, inputs, train_eval="train")
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 204, in get_batch_loss_metrics
    forward_output = self.concatenated_forward(model, batch)
  File "/VDIL_COREML/m.banerjee/ms-swift/swift/trainers/dpo_trainer.py", line 314, in concatenated_forward
    outputs = model(
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1603, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/accelerate/utils/operations.py", line 819, in forward
    return model_forward(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/accelerate/utils/operations.py", line 807, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/amp/autocast_mode.py", line 43, in decorate_autocast
    return func(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/peft/peft_model.py", line 1577, in forward
    return self.base_model(
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/peft/tuners/tuners_utils.py", line 188, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/m.banerjee/.cache/huggingface/modules/transformers_modules/01328faefe122fe605c1c127b62e6031d3ffebf7/modeling_chatglm.py", line 1176, in forward
    transformer_outputs = self.transformer(
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/VDIL_COREML/m.banerjee/anaconda3/envs/swift/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/m.banerjee/.cache/huggingface/modules/transformers_modules/01328faefe122fe605c1c127b62e6031d3ffebf7/modeling_chatglm.py", line 1024, in forward
    (attention_mask[i, :boi_token_pos + 1], torch.ones(num_patches).to(attention_mask.device),
UnboundLocalError: local variable 'num_patches' referenced before assignment
Train:   0%|                                                                               | 0/122 [00:12<?, ?it/s]

Your hardware and system info CUDA Version: 12.4 System: Ubuntu 22.04.3 LTS GPU torch==2.4.0 transformers==4.44.0 trl==0.9.6 peft==0.12.0

modelscope / ms-swift

DPO training error `UnboundLocalError: local variable 'num_patches' referenced before assignment` #1734