How to use the API to call a multimodal model with a local image?

HarryZhou-618 commented 2 months ago

Hi, I'm using the Poe API to call a multimodal model such as GPT-4V or Claude-3-Opus. I followed the example shown in the screenshot, but I can't find any code showing how to load a local image into the request. How can I implement this? I noticed that the new documentation mentions "attachment.parsed_content". Should I use that? What is the format of parsed_content? Should I encode the image as base64 or read it as binary? Looking forward to your reply. [Screenshot: Snipaste_2024-04-12_18-18-12]
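For reference, the two options asked about (a binary read versus base64) differ only in representation. The sketch below uses the Python standard library only and makes no assumption about what the Poe API or parsed_content actually accepts; the helper names are hypothetical and chosen for illustration.

```python
import base64
from pathlib import Path


def read_image_bytes(path: str) -> bytes:
    """Plain binary read: return the file's raw bytes unchanged."""
    return Path(path).read_bytes()


def encode_image_base64(path: str) -> str:
    """Return the same bytes as a base64-encoded ASCII string."""
    return base64.b64encode(read_image_bytes(path)).decode("ascii")
```

Calling encode_image_base64("image.png") on a local file yields text that is safe to embed in JSON, while read_image_bytes gives the raw bytes that a file upload would normally send. Decoding the base64 string with base64.b64decode returns the original bytes.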

Arbow commented 2 months ago

I had a similar problem. I wrote code like this:

import asyncio
import fastapi_poe as fp
from fastapi_poe.types import ProtocolMessage, Attachment

api_key = 'KEY'  # Poe API key
prompt = "Describe the attached image in detail."
# Attachment that points at an image already hosted on the Poe CDN
attachment = Attachment(url="https://pfst.cf2.poecdn.net/base/image/xxxxxxxxxxxxxxxxxxxxxxx?w=1024&h=1024",
                        content_type="image/png", name="image.png")
message = ProtocolMessage(role="user", content=prompt, attachments=[attachment])

async def get_bot_response(messages: list[ProtocolMessage], bot_name: str, api_key: str) -> None:
    chunks = []
    # Stream the partial responses and collect their text
    async for partial in fp.get_bot_response(messages=messages, bot_name=bot_name, api_key=api_key):
        chunks.append(partial.text)
    print(''.join(chunks))

asyncio.run(get_bot_response([message], 'Claude-3-Sonnet', api_key))

I expected the Claude model to read the attached image, but it evidently did not. It returned the following: "Unfortunately, you have not actually attached or uploaded any images to our conversation yet. If you do upload an image, I will be happy to describe it in detail for you. Please let me know once you have attached an image."

I wonder whether it is possible to invoke a multimodal model via the API. Thanks.
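When the image is a local file rather than a hard-coded URL, the content_type and name values passed to Attachment above could be derived from the path instead of typed by hand. This is a standard-library sketch only: describe_local_image is a hypothetical helper, and it does not produce the url field, which the code above still needs.

```python
import mimetypes
from pathlib import Path


def describe_local_image(path: str) -> dict[str, str]:
    """Derive the name and content_type for a local image (hypothetical helper)."""
    # guess_type looks only at the file extension; it does not open the file
    content_type, _ = mimetypes.guess_type(path)
    if content_type is None or not content_type.startswith("image/"):
        raise ValueError(f"{path!r} does not look like an image file")
    return {"name": Path(path).name, "content_type": content_type}
```

For example, describe_local_image("photos/image.png") gives the same name and content_type that are written out manually in the snippet above.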

HarryZhou-618 commented 2 months ago

Yes, I got the same response when using the Claude model. While checking the latest documentation and API code, I found that Poe has added a new parsed_content field to Attachment. I wonder whether this is the way to do it; maybe the image can be passed as parsed_content. I'm trying it out, and you can try it too!

Arbow commented 2 months ago

Did you solve this problem? I tried adding the parsed_content field, but it made no difference.

17Reset commented 1 month ago

+1

poe-platform / fastapi_poe

How to use the API to call a multimodal model with a local image? #89