The API docs contain the following information, but I still don't know how to pass in an image.
Previously, I was able to experiment with the language model by passing in text, but this is now a multimodal model, and I want to pass in an image.
ChatGPT told me to use tools, but there is no available tutorial explaining how I should build these tools.
{ "model": "string", "messages": [ { "role": "user", "content": "string", "tool_calls": [ { "id": "string", "type": "function", "function": { "name": "string", "arguments": "string" } } ] } ], "tools": [ { "type": "function", "function": { "name": "string", "description": "string", "parameters": {} } } ], "do_sample": true, "temperature": 0, "top_p": 0, "n": 1, "max_tokens": 0, "stop": "string", "stream": false }
For API calling, please refer to the OpenAI documentation. We use the same protocol.
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What'\''s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'
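For reference, here is a minimal Python sketch equivalent to the curl call above. It assumes an OpenAI-compatible server (such as the LLaMA-Factory API server) running at http://localhost:8000/v1; the base_url, API key, and model name are placeholders to adapt to your deployment.

# Hedged sketch: image + text chat completion against an
# OpenAI-compatible endpoint (pip install "openai>=1.0").
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local API address
    api_key="EMPTY",                      # local servers often ignore the key
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use the model name your server serves
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                    },
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)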
Could you please give me a Python script that supports Qwen video inference?
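Not an official recipe, but one common workaround is to sample frames from the video and send them as multiple image_url entries in a single message. Below is a hedged sketch along those lines; whether the server accepts several images per message (and how many frames the model handles well) depends on your deployment, and the base_url, model name, frame count, and demo.mp4 path are all assumptions.

# Hedged sketch: approximate video inference by sampling frames with
# OpenCV and sending them as base64 data URLs to an OpenAI-compatible
# endpoint. Requires: pip install opencv-python "openai>=1.0"
import base64

import cv2
from openai import OpenAI

def sample_frames(video_path, num_frames=8):
    """Return num_frames evenly spaced frames as JPEG bytes."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // num_frames)
        ok, frame = cap.read()
        if not ok:
            break
        ok, buf = cv2.imencode(".jpg", frame)
        if ok:
            frames.append(buf.tobytes())
    cap.release()
    return frames

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed address

content = [{"type": "text", "text": "Describe what happens in this video."}]
for jpg in sample_frames("demo.mp4"):  # placeholder video path
    b64 = base64.b64encode(jpg).decode("utf-8")
    content.append({
        "type": "image_url",
        "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
    })

response = client.chat.completions.create(
    model="qwen2-vl",  # placeholder; use the name your server reports
    messages=[{"role": "user", "content": content}],
    max_tokens=300,
)
print(response.choices[0].message.content)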
Reminder
System Info
docker:
transformers 4.45.0.dev0
torch 2.4.0
llamafactory 0.9.0
working directory: /app
Reproduction
vim examples/inference/sft_xd_seal.yaml
run:
I don't know how to call this API. My fine-tuned dataset looks like this.
Expected behavior
I would like to have an example of calling the API, where I can pass in an image and text to query the large model and get a response from the multimodal model.
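For reference, a hedged sketch of such a call with a local image file, base64-encoded into a data URL (the endpoint, API key, model name, and example.jpg path are all placeholders):

# Hedged sketch: query a multimodal model with a local image plus text
# via an OpenAI-compatible endpoint (pip install "openai>=1.0").
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed address

with open("example.jpg", "rb") as f:  # placeholder image path
    b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="qwen2-vl",  # placeholder model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                },
            ],
        }
    ],
    max_tokens=300,
)
print(response.choices[0].message.content)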
Others
No response