Use PEFT or full-parameter training to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
Testing MiniCPM-V-2.6 with swift: the output only depends on the first frame of the video #1703
Closed
Wuyingwen closed 2 months ago
Hello, we tested the command you provided, CUDA_VISIBLE_DEVICES=0 swift infer --model_type minicpm-v-v2_6-chat --model_id_or_path openbmb/MiniCPM-V-2_6, as well as the video test code (below). We found that the results on video seem to depend only on the first frame. We ran video OCR extraction several times, and each run returned only the OCR result of the first frame. Could you point us to the concrete test code (.py file)? We would like to check whether the data-processing part reads only the first frame of the video. We would greatly appreciate your reply.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16,
                                        model_id_or_path=model_id_or_path,
                                        model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

query = '
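The quoted video snippet is cut off at this point. For reference, below is a minimal sketch of how the video call is typically completed with swift's Python API; the prompt text, the placeholder video path, and the videos= keyword argument are assumptions based on the image example further down, not the original test code.

# Hypothetical continuation of the truncated snippet (not the original code).
query = '这个视频里有哪些文字？'  # hypothetical video OCR prompt: "What text appears in this video?"
videos = ['<path-or-url-to-test-video>.mp4']  # placeholder; substitute the actual test video
response, history = inference(model, template, query, videos=videos)
print(f'query: {query}\nresponse: {response}')

If the template's video preprocessing sampled only a single frame, the response here would describe just the first frame, which is the behavior reported above.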
Streaming
# Streaming inference on a single image; the query means "Describe this image".
query = '描述这张图片'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    # Print only the newly generated part of the response on each iteration.
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
"""
query: