uezo / aiavatarkit

🥰 Building AI-based conversational avatars lightning fast ⚡️💬
Apache License 2.0

Add vision input support for Gemini and Claude #58

Closed: uezo closed this issue 3 weeks ago

uezo commented 3 weeks ago

Describe the vision tag in the system message so the model knows how to request an image.

# Named PROMPT so that it matches the variable passed to the processors below
PROMPT = """
### Using Vision

If you need an image to process a user's request, you can obtain it using the following methods:

- screenshot
- camera

If an image is needed to process the request, add an instruction like [vision:screenshot] to your response to request an image from the user.

By adding this instruction, the user will provide an image in their next utterance. No comments about the image itself are necessary.

Example:

user: Look! This is the sushi I had today.
assistant: [vision:screenshot] Let me take a look.
"""

Create an instance of GeminiProcessor or ClaudeProcessor with this system message and set use_vision to True.

# Gemini
chat_processor_gemini = GeminiProcessor(
    api_key=GEMINI_API_KEY,
    model="gemini-1.5-pro-latest",
    system_message_content=PROMPT,
    use_vision=True
)

# Claude
chat_processor_claude = ClaudeProcessor(
    api_key=ANTHROPIC_API_KEY,
    system_message_content=PROMPT,
    model="claude-3-opus-20240229",
    use_vision=True
)
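
Once a tag names a source, the application has to supply the matching image. How aiavatarkit wires this up is not shown in this issue, so the following is only a library-independent sketch. `capture_screenshot`, `capture_camera`, and `fetch_image` are hypothetical placeholders that return dummy bytes in place of real capture code.

```python
from typing import Callable, Dict


def capture_screenshot() -> bytes:
    # Placeholder: a real implementation would grab the screen and encode it
    return b"screenshot-bytes"


def capture_camera() -> bytes:
    # Placeholder: a real implementation would read and encode a camera frame
    return b"camera-bytes"


# Maps the source names listed in the system message to capture functions
IMAGE_SOURCES: Dict[str, Callable[[], bytes]] = {
    "screenshot": capture_screenshot,
    "camera": capture_camera,
}


def fetch_image(source: str) -> bytes:
    """Return image bytes for a source name taken from a vision tag."""
    try:
        provider = IMAGE_SOURCES[source]
    except KeyError:
        raise ValueError(f"Unknown vision source: {source!r}") from None
    return provider()
```

Keeping the sources in a dictionary means that a new source needs only one more entry, plus a matching line in the system message's list.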