cogvlm2 raises an error when history is added

Describe the bug: what the bug is and how to reproduce it, preferably with screenshots.

  File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2693, in sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B/modeling_cogvlm.py", line 677, in prepare_inputs_for_generation
    position_ids = build_position_ids(token_type_ids, attention_mask)
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B/modeling_cogvlm.py", line 320, in build_position_ids
    tmp[~(attention_mask.bool())] = -1
IndexError: The shape of the mask [1, 2362] at index 1 does not match the shape of the indexed tensor [1, 2363] at index 1

Your hardware and system info (CUDA version, OS, GPU model, torch version, etc.): swift 2.1.0.dev0

I am using the inference script from the cogvlm2 best practices. The only change is that history is passed in the second-turn query:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.cogvlm2_19b_chat
model_dir = '/weight/ZhipuAI/cogvlm2-llama3-chinese-chat-19B'
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16,
                                       model_id_or_path=model_dir,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
query = '距离各城市多远？'
response, history = inference(model, template, query, images=images)
print(f'query: {query}')
print(f'response: {response}')

# Streaming
query = '距离最远的城市是哪？'
images = images
gen = inference_stream(model, template, query, history, images=images)  # history is added here
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, _ in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()

"""
query: 距离各城市多远？
response: 距离马踏Mata有14km，距离阳江Yangjiang有62km，距离广州Guangzhou有293km。
history: [['距离各城市多远？', '距离马踏Mata有14km，距离阳江Yangjiang有62km，距离广州Guangzhou有293km。']]
query: 距离最远的城市是哪？
response: Exception in thread Thread-2 (generate):
Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/opt/conda/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1592, in generate
    return self.sample(
  File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2693, in sample
    model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B/modeling_cogvlm.py", line 677, in prepare_inputs_for_generation
    position_ids = build_position_ids(token_type_ids, attention_mask)
  File "/root/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B/modeling_cogvlm.py", line 320, in build_position_ids
    tmp[~(attention_mask.bool())] = -1
IndexError: The shape of the mask [1, 2362] at index 1 does not match the shape of the indexed tensor [1, 2363] at index 1
"""

Additional context: add any other context about the problem here.
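To make the last frame of the traceback easier to read, here is a dependency-free sketch of the failure mode: assigning through a boolean mask whose length is one short of the indexed data. The helper `masked_fill_2d` is hypothetical, a pure-Python stand-in that mimics the shape check behind `tmp[~(attention_mask.bool())] = -1`; it is not the code in `modeling_cogvlm.py`. Only the two lengths (2362 and 2363) come from the error message; the contents of `tmp` are placeholders, and the sketch does not show why the mask ends up one position short.

```python
def masked_fill_2d(values, mask, fill):
    """Return a copy of `values` with `fill` wherever `mask` is True.

    Hypothetical pure-Python stand-in for `tensor[bool_mask] = fill`.
    Like boolean-mask indexing in PyTorch, it rejects a mask whose
    shape differs from the shape of the indexed data.
    """
    values_shape = [len(values), len(values[0])]
    mask_shape = [len(mask), len(mask[0])]
    if values_shape != mask_shape:
        raise IndexError(
            f"The shape of the mask {mask_shape} does not match "
            f"the shape of the indexed tensor {values_shape}"
        )
    return [
        [fill if m else v for v, m in zip(row, mask_row)]
        for row, mask_row in zip(values, mask)
    ]


# Lengths taken from the traceback: the mask is one position short.
tmp = [[0] * 2363]                # placeholder contents; only the shape matters
attention_mask = [[True] * 2362]

# Equivalent of `~attention_mask.bool()`: True marks masked-out positions.
inverted_mask = [[not m for m in row] for row in attention_mask]

try:
    masked_fill_2d(tmp, inverted_mask, -1)
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)
```

When the two lengths agree, the same helper fills the masked positions without raising, which suggests the thing to check is how `attention_mask` and `token_type_ids` are extended at each generation step once history is present.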

modelscope / ms-swift

cogvlm2 raises an error when history is added #993