jackmpcollins / magentic

Seamlessly integrate LLMs as Python functions
https://magentic.dev/
MIT License
1.8k stars 85 forks source link

Support for Anthropic vision #250

Open rawwerks opened 3 days ago

rawwerks commented 3 days ago

With the release of 3.5 Sonnet there are now a ton of use cases where Claude's vision capabilities exceed GPT-4(o). One example is visual matching between frontend code and a screenshot of the rendered website.

The docs currently mention that vision is only supported for OpenAI, I hope you will consider adding vision support for Anthropic.

Thank you!

jackmpcollins commented 2 days ago

Hi @rawwerks This should just be a case of registering an implementation of message_to_anthropic_message for UserImageMessage, which can actually be done externally to magentic / without needing a new version, but of course should be added to magentic.

Here is the UserImageMessage message registration for message_to_openai_message.

https://github.com/jackmpcollins/magentic/blob/f0c2fce9cebc26dcc6d22d87f8b286abdb4e10fd/src/magentic/vision.py#L38-L55

and message_to_anthropic_message is here https://github.com/jackmpcollins/magentic/blob/f0c2fce9cebc26dcc6d22d87f8b286abdb4e10fd/src/magentic/chat_model/anthropic_chat_model.py#L69

With that implemented you should be able to use images with Anthropic models as shown for OpenAI in the docs here https://magentic.dev/vision/

Let me know if you run into any issues with this. I'd happily accept a PR to add this to vision.py so it works by default.

rawwerks commented 2 days ago

@mentatbot - here is the answer, can you make this a PR?