xorbitsai / inference

Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with any open-source language models, speech recognition models, and multimodal models, whether in the cloud, on-premises, or even on your laptop.
https://inference.readthedocs.io
Apache License 2.0

How can I use MiniCPM-V-2.6 to compare the differences between two images? #2212

Closed zhangyuanwang777 closed 2 months ago

zhangyuanwang777 commented 2 months ago

Feature request

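I found chat_vl.ipynb in the examples. It is a template for image-and-text chat: you pass text and an image together, and the model returns a description of the image. MiniCPM-V-2.6 can compare the differences between two images, so what format should I use for messages in that case?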

Motivation

Your contribution

yangxiaoshuai2333 commented 2 months ago

It should just be the standard OpenAI format. I have only used a single image; for multiple images you probably pass two image URLs. The image is base64-encoded:

with open(file_path, 'rb') as file:
    image = picture_dict[suffix] + base64.b64encode(file.read()).decode('utf-8')

input_pictureargs = {
    'input': [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": image},
                },
            ],
        }
    ],
    'extra_body': {"stop_token_ids": [151645, 151643]},
}
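The `picture_dict` lookup in the snippet is not shown, so its contents are unknown. As a minimal sketch, assuming it maps a file suffix to a `data:<mime>;base64,` prefix, the same data URL could be built with the standard library alone. The helper name `to_data_url` is hypothetical.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(file_path: str) -> str:
    """Return an image file as a base64 data URL (hypothetical helper)."""
    # Guess the MIME type from the file suffix, e.g. ".png" -> "image/png".
    mime, _ = mimetypes.guess_type(file_path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"not a recognised image type: {file_path}")
    # Read the raw bytes and encode them as ASCII-safe base64 text.
    payload = base64.b64encode(Path(file_path).read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{payload}"
```

The returned string can be used wherever the snippets in this thread put a value under `"url"`.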

Minamiyama commented 2 months ago

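The Gradio web UI currently lets you upload the two images over two separate turns and then ask for a comparison. A multi-image mode, where a single query carries several images, will be added later.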

The API itself already supports sending multiple images in a single query.
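A minimal sketch of what a single multi-image query might look like, assuming the API follows the OpenAI chat convention of one `image_url` part per image inside a single user message. The helper name `build_multi_image_messages` is hypothetical, and whether MiniCPM-V-2.6 attends to every image this way has not been confirmed in this thread.

```python
from typing import Dict, List


def build_multi_image_messages(prompt: str, image_urls: List[str]) -> List[Dict]:
    """Build one user message holding a text part plus one part per image."""
    # The text prompt goes first, as in the chat_vl.ipynb example.
    content: List[Dict] = [{"type": "text", "text": prompt}]
    # Each image gets its own dict so that no key is repeated within a part.
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return [{"role": "user", "content": content}]


# Placeholder data URLs; real ones would carry base64-encoded image bytes.
messages = build_multi_image_messages(
    "Describe the differences between these two images.",
    ["data:image/png;base64,AAAA", "data:image/png;base64,BBBB"],
)
```

The resulting list would be passed as the `messages` argument of an OpenAI-compatible chat completion call.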

zhangyuanwang777 commented 2 months ago

Following the example in chat_vl.ipynb, the original messages were written like this, and everything works:

messages=[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": config.get("prompt")},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/png;base64,{b64_img}",
                },
            },
        ],
    },
]

I want to describe the differences between two images, so I modified messages:

messages=[
    {
        "role": "user",
        "content": [
            {"type": "text", "text": config.get("prompt")},
            {
                "type": "image_url",
                "image_url": {
                    "url": f"data:image/png;base64,{b64_img1}",
                },
                "image_url": {
                    "url": f"data:image/png;base64,{b64_img2}",
                },
            },
        ],
    },
]

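With this, only the first image was read and the second image's information was ignored. I also tried the following form of messages, and it could not read both images correctly either: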

    messages=[
        {
            "role": "user",
            "content": [
                # {"type": "text", "text": config['image2text'].get("prompt")},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{b64_img_before}",
                    },
                },
            ],
        },
        {
            "role": "user",
            "content": [
                {"type": "text", "text": config['image2text'].get("prompt")},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/png;base64,{b64_img_now}",
                    },
                },
            ],
        },
    ]

What is the correct messages format?
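A side note on the first modified messages above, independent of the server: a Python dict literal that repeats a key keeps only the last value, so a part with two `"image_url"` entries holds a single image before the request is even sent. The sketch below uses placeholder URLs to show this.

```python
# A dict literal with a repeated key: Python silently keeps the last value.
part = {
    "type": "image_url",
    "image_url": {"url": "data:image/png;base64,FIRST"},
    "image_url": {"url": "data:image/png;base64,SECOND"},
}

print(len(part))                 # 2 keys: "type" and "image_url"
print(part["image_url"]["url"])  # data:image/png;base64,SECOND
```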

github-actions[bot] commented 2 months ago

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] commented 2 months ago

This issue was closed because it has been inactive for 5 days since being marked as stale.