modelscope / ms-swift

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

Error when fine-tuning InternVL2-2B with DPO #1979

Closed guihonghao closed 1 month ago

guihonghao commented 1 month ago
    loss = self.compute_loss(model, inputs)
  File "/home/tiger/.local/lib/python3.9/site-packages/trl/trainer/dpo_trainer.py", line 1408, in compute_loss
    loss, metrics = self.get_batch_loss_metrics(model, inputs, train_eval="train")
  File "/mnt/bn/arnold-ghh-test/mlx/users/guihonghao/playground/ghh_swift/swift/swift/trainers/dpo_trainer.py", line 115, in get_batch_loss_metrics
    forward_output = self.concatenated_forward(model, batch)
  File "/mnt/bn/arnold-ghh-test/mlx/users/guihonghao/playground/ghh_swift/swift/swift/trainers/dpo_trainer.py", line 235, in concatenated_forward
    outputs = model(
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/tiger/.local/lib/python3.9/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
    ret_val = func(*args, **kwargs)
  File "/home/tiger/.local/lib/python3.9/site-packages/deepspeed/runtime/engine.py", line 1846, in forward
    loss = self.module(*inputs, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 1547, in _call_impl
    args_kwargs_result = hook(self, args, kwargs)  # type: ignore[misc]
  File "/mnt/bn/arnold-ghh-test/mlx/users/guihonghao/playground/ghh_swift/swift/swift/llm/utils/template.py", line 323, in _pre_forward_hook
    res_extra.append(self._post_encode(d))
  File "/mnt/bn/arnold-ghh-test/mlx/users/guihonghao/playground/ghh_swift/swift/swift/llm/utils/template.py", line 1922, in _post_encode
    inputs_embeds[selected] = vit_embeds.reshape(-1, vit_embeds.shape[-1])
RuntimeError: shape mismatch: value tensor of shape [3840, 2048] cannot be broadcast to indexing result of shape [3628, 2048]

The error is raised at this line of code. From printing a few intermediate results, inputs_embeds should contain both text and image tokens, so the first dimension of vit_embeds.reshape(-1, vit_embeds.shape[-1]) should be smaller than the first dimension of inputs_embeds, e.g. 6 × 256 = 1536 < 2549. But in the example above, 3840 = 15 × 256 > 3628, i.e. the first dimension of vit_embeds is larger than that of inputs_embeds, which triggers the error. What could be causing this?

inputs_embeds.shape torch.Size([2549, 2048])
selected.shape torch.Size([2549])
vit_embeds.shape torch.Size([6, 256, 2048])
inputs_embeds.shape torch.Size([1057, 2048])
selected.shape torch.Size([1057])
vit_embeds.shape torch.Size([1, 256, 2048])
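For reference, the failing line is a boolean masked assignment: every True position in selected must receive exactly one row of the flattened vit_embeds, so it only succeeds when selected.sum() equals vit_embeds.shape[0] * vit_embeds.shape[1]. A minimal sketch (with a hypothetical sequence length, not the actual model code) that reproduces the same RuntimeError:

```python
import torch

hidden = 2048
seq_len = 4096        # hypothetical sequence length after truncation
n_selected = 3628     # image-placeholder positions left in the mask
inputs_embeds = torch.zeros(seq_len, hidden)
selected = torch.zeros(seq_len, dtype=torch.bool)
selected[:n_selected] = True

# 15 image tiles x 256 patch tokens = 3840 visual feature rows,
# 212 more than there are True positions in the mask.
vit_embeds = torch.zeros(15, 256, hidden)

# The masked assignment requires selected.sum() == 15 * 256; here 3628 != 3840, so it raises:
# RuntimeError: shape mismatch: value tensor of shape [3840, 2048]
#   cannot be broadcast to indexing result of shape [3628, 2048]
inputs_embeds[selected] = vit_embeds.reshape(-1, vit_embeds.shape[-1])
```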
hjh0119 commented 1 month ago

This is due to pixel values being padded when batch_size > 1.

Pull the latest code, this issue should have been fixed.
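My reading of this comment (illustrative only, not the ms-swift implementation): if pixel values are padded to the largest tile count in the batch, the vision tower also produces embeddings for the padded tiles, so the flattened vit_embeds ends up with more rows than there are placeholder positions in the text. A toy example of that mismatch:

```python
import torch

hidden = 2048
# Two samples in one batch: sample A has 6 image tiles, sample B has 1.
# Padding pixel_values to 6 tiles per sample makes the vision tower
# emit 2 * 6 * 256 = 3072 patch embeddings...
vit_embeds = torch.zeros(2 * 6, 256, hidden)

# ...but the text only contains 6*256 + 1*256 = 1792 placeholder positions.
n_placeholders = (6 + 1) * 256
print(vit_embeds.reshape(-1, hidden).shape[0], "vs", n_placeholders)  # 3072 vs 1792
```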

guihonghao commented 1 month ago

But my batch_size is set to 1.

hjh0119 commented 1 month ago

I guess the input_ids were truncated because the maximum length was too small; try increasing the --max_length parameter.
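One way to confirm this (a debugging sketch with a hypothetical helper, not part of ms-swift): count the image-placeholder positions that survive truncation and compare them against the number of visual feature rows; if truncation cut into the image token span, the two counts diverge and the masked assignment above must fail.

```python
import torch

def check_image_alignment(selected: torch.Tensor, vit_embeds: torch.Tensor) -> None:
    """selected: bool mask over the (possibly truncated) sequence;
    vit_embeds: [num_tiles, tokens_per_tile, hidden] visual features."""
    n_placeholders = int(selected.sum())
    n_vit_rows = vit_embeds.shape[0] * vit_embeds.shape[1]
    if n_placeholders != n_vit_rows:
        # e.g. 3628 placeholders vs 15 * 256 = 3840 visual rows means 212
        # image tokens were dropped, typically by max_length truncation.
        print(f"mismatch: {n_placeholders} placeholder tokens vs {n_vit_rows} visual rows")
```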

Jintao-Huang commented 1 month ago

https://github.com/modelscope/ms-swift/pull/1975

Jintao-Huang commented 1 month ago

fixed