rawwerks / magentic

Seamlessly integrate LLMs as Python functions
https://magentic.dev/
MIT License

add support for anthropic vision #1

Open rawwerks opened 4 months ago

rawwerks commented 4 months ago

@mentatbot - I need you to implement support for passing images to Anthropic LLMs. Follow the exact same approach the repo uses for OpenAI GPT-4 with vision, except that you'll need to use the Anthropic API instead.

Here's what I mean by vision: GPT-4 Vision can be used with magentic through the UserImageMessage message type, which lets the LLM accept images as input. Currently this is only supported with the OpenAI backend (OpenaiChatModel).

from pydantic import BaseModel, Field

from magentic import chatprompt, UserMessage
from magentic.vision import UserImageMessage

IMAGE_URL_WOODEN_BOARDWALK = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"

class ImageDetails(BaseModel):
    description: str = Field(description="A brief description of the image.")
    name: str = Field(description="A short name.")

@chatprompt(
    UserMessage("Describe the following image in one sentence."),
    UserImageMessage(IMAGE_URL_WOODEN_BOARDWALK),
)
def describe_image() -> ImageDetails: ...

image_details = describe_image()
print(image_details.name)
# 'Wooden Boardwalk in Green Wetland'
print(image_details.description)
# 'A serene wooden boardwalk meanders through a lush green wetland under a blue sky dotted with clouds.'

Anthropic API examples:

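import anthropic

# Assumed to be defined earlier:
#   image1_media_type / image2_media_type: MIME type strings such as "image/jpeg"
#   image1_data / image2_data: base64-encoded image bytes, as str
client = anthropic.Anthropic()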
message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": image1_media_type,
                        "data": image1_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Describe this image."
                }
            ],
        }
    ],
)
print(message)

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    system="Respond only in Spanish.",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Image 1:"
                },
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": image1_media_type,
                        "data": image1_data,
                    },
                },
                {
                    "type": "text",
                    "text": "Image 2:"
                },
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": image2_media_type,
                        "data": image2_data,
                    },
                },
                {
                    "type": "text",
                    "text": "How are these images different?"
                }
            ],
        }
    ],
)
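
The examples above use `image1_data` and `image1_media_type` without showing where they come from. Here is a minimal sketch of one way to build the image content block from raw image bytes. It uses only the standard library and guesses the media type from the file's leading bytes. The helper names `detect_media_type` and `to_anthropic_image_block` are hypothetical and are not part of magentic or the anthropic package; how the bytes are obtained (for example, downloading the URL passed to UserImageMessage) is left out.

```python
import base64


def detect_media_type(data: bytes) -> str:
    """Guess the image MIME type from the leading magic bytes.

    Hypothetical helper: covers JPEG, PNG, GIF and WebP only.
    """
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # WebP files are RIFF containers with "WEBP" at byte offset 8.
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    raise ValueError("Unsupported or unrecognized image format")


def to_anthropic_image_block(data: bytes) -> dict:
    """Build an image content block shaped like the ones in the examples above."""
    return {
        "type": "image",
        "source": {
            "type": "base64",
            "media_type": detect_media_type(data),
            # The API examples pass base64 text, so decode the encoded bytes to str.
            "data": base64.b64encode(data).decode("utf-8"),
        },
    }
```

The returned dict could be placed directly in the `content` list of a user message, next to the `{"type": "text", ...}` blocks.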

mentatbot[bot] commented 4 months ago

I will start working on this issue