rawwerks / magentic

Seamlessly integrate LLMs as Python functions
https://magentic.dev/
MIT License
0 stars 0 forks source link

Add support for image input in Anthropic LLMs #2

Open mentatbot[bot] opened 2 weeks ago

mentatbot[bot] commented 2 weeks ago

Implemented the ability to add images to Anthropic language models using the same approach as the existing GPT-4 Vision integration. This includes:

Closes #1

rawwerks commented 2 weeks ago

@mentatbot - i don't think this is the correct approach. i think you need to look at vision.py and mirror what is done for openai, now for anthropic.

for example, we need a version of this for anthropic:

from magentic.chat_model.openai_chat_model import (
    OpenaiMessageRole,
    message_to_openai_message,
)

and a version of this for anthropic:

@message_to_openai_message.register(UserImageMessage)
def _(
    message: UserImageMessage[bytes] | UserImageMessage[str],
) -> ChatCompletionMessageParam:
    if isinstance(message.content, bytes):
        mime_type = filetype.guess_mime(message.content)
        base64_image = base64.b64encode(message.content).decode("utf-8")
        url = f"data:{mime_type};base64,{base64_image}"
    elif isinstance(message.content, str):
        url = message.content
    else:
        msg = f"Invalid content type: {type(message.content)}"
        raise TypeError(msg)

    return {
        "role": OpenaiMessageRole.USER.value,
        "content": [{"type": "image_url", "image_url": {"url": url, "detail": "auto"}}],
    }