run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: How to add multi-modal for my custom Claude3 client from AWS bedrock? #12895

Open xjw1001001 opened 4 months ago

xjw1001001 commented 4 months ago

Question

I have already made a client for Claude3 with `complete` and `acomplete`, and it works well.

Now I wonder how I can extend it to be a multi-modal client?

Current code:

# Imports assume llama_index >= 0.10 package layout
from typing import Any, Optional

import backoff
import httpx
import requests

from llama_index.core.llms import (
    CompletionResponse,
    CompletionResponseGen,
    CustomLLM,
    LLMMetadata,
)
from llama_index.core.llms.callbacks import llm_completion_callback


class Claude3_Sonnet(CustomLLM):
    context_window: int = 180000
    num_output: int = 4000
    model_name: str = "Claude3_Sonnet"
    dummy_response: str = "My response"
    system_message: str = "You are a helpful assistant"
    api_key: str
    production: bool = True
    temperature: float = 0.5

    @property
    def metadata(self) -> LLMMetadata:
        """Get LLM metadata."""
        return LLMMetadata(
            context_window=self.context_window,
            num_output=self.num_output,
            model_name=self.model_name,
            is_chat_model=False,
        )

    @llm_completion_callback()
    def complete(self, prompt: str, system_message: Optional[str] = None, **kwargs: Any) -> CompletionResponse:
        if self.production:
            api_url = "https://xxxx/gpt/v2/claude-3-haiku-20240307-v1"
        else:
            api_url = "https://xxxx/gpt/v2/claude-3-haiku-20240307-v1"
        if not self.api_key:
            raise ValueError("API key is not set.")
        if system_message is None:
            system_message = self.system_message
        headers = {"APIKey": self.api_key}
        payload = {
            "max_tokens": self.num_output,
            "messages": [{"role": "user", "content": prompt}],
            "system": system_message,
            "anthropic_version": "bedrock-2023-05-31",
            "temperature": self.temperature
        }

        response = requests.post(api_url, headers=headers, json=payload)
        if response.status_code == 200:
            response_text = response.json().get("content", [{}])[0].get("text", "")
            return CompletionResponse(text=response_text)
        else:
            return CompletionResponse(text=f"Error: {response.status_code} - {response.reason}")

    @llm_completion_callback()
    def stream_complete(self, prompt: str, **kwargs: Any) -> CompletionResponseGen:
        raise NotImplementedError("This method has not been implemented yet.")

    @llm_completion_callback()
    async def acomplete(self, prompt: str, system_message: Optional[str] = None, **kwargs: Any) -> CompletionResponse:
        if self.production:
            api_url = "https://xxxx/gpt/v2/claude-3-haiku-20240307-v1"
        else:
            api_url = "https://xxxx/gpt/v2/claude-3-haiku-20240307-v1"
        if not self.api_key:
            raise ValueError("API key is not set.")
        if system_message is None:
            system_message = self.system_message
        headers = {"APIKey": self.api_key}
        payload = {
            "max_tokens": self.num_output,
            "messages": [{"role": "user", "content": prompt}],
            "system": system_message,
            "anthropic_version": "bedrock-2023-05-31",
            "temperature": self.temperature
        }

        # Define an inner async function for making the HTTP request with retries and timeout
        @backoff.on_exception(backoff.expo,
                            httpx.ReadTimeout,
                            max_tries=3)
        async def make_request_with_retries():
            # Flat 120-second timeout covering the whole request
            timeout = httpx.Timeout(120)
            async with httpx.AsyncClient(timeout=timeout) as client:
                response = await client.post(api_url, headers=headers, json=payload)
                return response

        # Call the inner function
        response = await make_request_with_retries()
        if response.status_code == 200:
            response_text = response.json().get("content", [{}])[0].get("text", "")
            return CompletionResponse(text=response_text)
        else:
            # Use httpx.codes to get a textual reason for the status code
            reason_phrase = httpx.codes.get_reason_phrase(response.status_code)
            return CompletionResponse(text=f"Error: {response.status_code} - {reason_phrase}")

Example usage of multi-modal for AWS claude 3:

{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": int,
    "system": string,
    "messages": [
        {
            "role": string,
            "content": [
                { "type": "image", "source": { "type": "base64", "media_type": "image/jpeg", "data": "content image bytes" } },
                { "type": "text", "text": "content text" }
            ]
        }
    ],
    "temperature": float,
    "top_p": float,
    "top_k": int,
    "stop_sequences": [string]
}
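The schema above can be mapped onto a Python payload builder. A minimal sketch (the helper name, parameters, and defaults are illustrative, not from any SDK; only the payload shape follows the Bedrock request schema shown above):

```python
import base64


def build_multimodal_payload(prompt: str, image_bytes: bytes,
                             system_message: str = "You are a helpful assistant",
                             media_type: str = "image/jpeg",
                             max_tokens: int = 4000,
                             temperature: float = 0.5) -> dict:
    """Pair one image with a text prompt in the Bedrock Claude 3 messages format."""
    # Bedrock expects raw image bytes base64-encoded as a UTF-8 string
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "system": system_message,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": encoded,
                        },
                    },
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "temperature": temperature,
    }
```

A `complete`-style method could then pass this dict as the request body instead of the text-only payload it builds today.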
dosubot[bot] commented 4 months ago

To extend your Claude3 client for multi-modal capabilities, allowing it to handle both text and images, you'll need to adjust your payload structure to accommodate multi-modal data. Here's a streamlined approach to achieve this:

  1. Modify the Payload for Multi-Modal Requests: Adapt your payload to include both text and image data. For images, encode them in base64 and specify the media type (e.g., "image/jpeg"). Your payload should look something like this:
payload = {
    "anthropic_version": "bedrock-2023-05-31",    
    "max_tokens": 4000,
    "system": "Your system message here",    
    "messages": [
        {
            "role": "user",
            "content": [
                { "type": "image", "source": { "type": "base64", "media_type": "image/jpeg", "data": "base64 encoded image data" } },
                { "type": "text", "text": "Your prompt here" }
            ]
        }
    ],
    "temperature": 0.5,
    "top_p": 1.0,
    "top_k": 40,
    "stop_sequences": ["Your stop sequences here"]
}
  2. Handling API Requests: When making API requests, ensure your complete or acomplete methods can handle this new payload structure. You'll need to adjust your request handling to accommodate the inclusion of images alongside text.

  3. Image Preprocessing: Before sending images, convert them to base64 encoding. This step is crucial for including images in your API requests.
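Step 3 in code, as a minimal sketch (the function name is illustrative; it uses only the standard library):

```python
import base64
import mimetypes


def encode_image(path: str) -> tuple[str, str]:
    """Read an image file and return (media_type, base64 string) for the payload."""
    # Guess the media type from the file extension; fall back to JPEG
    media_type = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return media_type, data
```

The returned pair can be dropped straight into the `"source"` object of an image content block.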

By following these steps, you'll be able to enhance your Claude3 client to support multi-modal interactions, significantly broadening the range of inputs your client can understand and respond to. This adjustment allows for a richer interaction experience, leveraging both textual and visual data.

#### About Dosu This response is meant to be useful and save you time. It is **not meant to be a precise solution**, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
tech4life87 commented 2 months ago

This is a key feature for folks using the Bedrock API to access Anthropic models.