Use PEFT or full-parameter training to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
Testing MiniCPM-V-2.6 with swift: the output only depends on the first frame of the video #1703
Closed
Wuyingwen closed 2 months ago
Hello, we tested the command you provided, CUDA_VISIBLE_DEVICES=0 swift infer --model_type minicpm-v-v2_6-chat --model_id_or_path openbmb/MiniCPM-V-2_6, as well as the video test code (below). We found that the results on video seem to depend only on the first frame. We ran video OCR extraction several times, and each run returned only the OCR result of the first frame. Could you point us to the concrete test code (.py file)? We would like to check whether the data-processing part reads only the first frame of the video. We would greatly appreciate your reply.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16,
                                        model_id_or_path=model_id_or_path,
                                        model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

query = '
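The quoted video snippet is cut off at this point. For reference, below is a minimal sketch of how the video call is typically completed with swift's Python API; the prompt text, the placeholder video path, and the videos= keyword argument are assumptions based on the image example further down, not the original test code.

# Hypothetical continuation of the truncated snippet (not the original code).
query = '这个视频里有哪些文字？'  # hypothetical video OCR prompt: "What text appears in this video?"
videos = ['<path-or-url-to-test-video>.mp4']  # placeholder; substitute the actual test video
response, history = inference(model, template, query, videos=videos)
print(f'query: {query}\nresponse: {response}')

If the template's video preprocessing sampled only a single frame, the response here would describe just the first frame, which is the behavior reported above.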
Streaming
# Streaming inference on a single image; the query means "Describe this image".
query = '描述这张图片'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    # Print only the newly generated part of the response on each iteration.
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
"""
query: