jackmpcollins / magentic

Seamlessly integrate LLMs as Python functions
https://magentic.dev/
MIT License
2.05k stars 100 forks source link

Support for Anthropic vision #250

Closed rawwerks closed 3 months ago

rawwerks commented 4 months ago

With the release of 3.5 Sonnet there are now a ton of use cases where Claude's vision capabilities exceed GPT-4(o). One example is visual matching between frontend code and a screenshot of the rendered website.

The docs currently mention that vision is only supported for OpenAI, I hope you will consider adding vision support for Anthropic.

Thank you!

jackmpcollins commented 4 months ago

Hi @rawwerks This should just be a case of registering an implementation of message_to_anthropic_message for UserImageMessage, which can actually be done externally to magentic / without needing a new version, but of course should be added to magentic.

Here is the UserImageMessage message registration for message_to_openai_message.

https://github.com/jackmpcollins/magentic/blob/f0c2fce9cebc26dcc6d22d87f8b286abdb4e10fd/src/magentic/vision.py#L38-L55

and message_to_anthropic_message is here https://github.com/jackmpcollins/magentic/blob/f0c2fce9cebc26dcc6d22d87f8b286abdb4e10fd/src/magentic/chat_model/anthropic_chat_model.py#L69

With that implemented you should be able to use images with Anthropic models as shown for OpenAI in the docs here https://magentic.dev/vision/

Let me know if you run into any issues with this. I'd happily accept a PR to add this to vision.py so it works by default.

rawwerks commented 4 months ago

@mentatbot - here is the answer, can you make this a PR?

rawwerks commented 3 months ago

would using litellm address the vision discrepancy? (because presumably they've figured out how to unify anthropic vision with openai-style api calls.)

jackmpcollins commented 3 months ago

Support for Anthropic vision release now in https://github.com/jackmpcollins/magentic/releases/tag/v0.31.0