mckaywrigley / chatbot-ui

MIT License

Anthropic API doesn't allow an assistant message before a user message #1656

Open jollytoad opened 5 months ago

jollytoad commented 5 months ago

If your chat with a Claude model exceeds the context length, the earlier messages get truncated. If the first remaining message then happens to be an assistant message rather than a user message, the Anthropic API errors with `messages: first message must use the "user" role`.

If you then try to replay the last response, a request is sent with an empty assistant message, resulting in a different error from Anthropic: `messages: text content blocks must be non-empty`.

I don't think any other models are as picky as this; from testing, OpenAI doesn't mind which role comes first.
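A minimal sketch of a workaround for both errors, in TypeScript. The message shape and the `sanitizeForAnthropic` helper name are assumptions, not chatbot-ui's actual code; the idea is simply to drop empty messages and any leading assistant messages before the request is built:

```typescript
// Assumed message shape, mirroring what a chat route might pass to Anthropic.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Drop empty messages (avoids "text content blocks must be non-empty")
// and leading assistant messages (avoids "first message must use the
// 'user' role") after the history has been truncated.
function sanitizeForAnthropic(messages: ChatMessage[]): ChatMessage[] {
  const nonEmpty = messages.filter((m) => m.content.trim().length > 0);
  const firstUser = nonEmpty.findIndex((m) => m.role === "user");
  return firstUser === -1 ? [] : nonEmpty.slice(firstUser);
}
```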

tobalsan commented 5 months ago

Can confirm I get the same issue. Trying simple PDF summarization and follow-up questions, the first upload and question seem to work pretty well, but all the subsequent messages end up like this:

    Error calling Anthropic API: [Error: 400 {"type":"error","error":{"type":"invalid_request_error","message":"messages: first message must use the \"user\" role"}}]
luchoster commented 4 months ago

Same here. Once the conversation starts getting a little lengthy it breaks with the user role error.

jonnyhd commented 4 months ago

My experience reflects @luchoster's.

It appears to be related to the token length of a conversation with an Anthropic model. Empirically, a conversation reaches ~5,000 tokens before encountering the error.

I looked into whether it was the number of messages in the conversation, but a 60-message conversation did not reproduce the error.

I looked into whether it was the slicing in api/chat/anthropic/route.ts: `let ANTHROPIC_FORMATTED_MESSAGES: any = messages.slice(1)`. This strips just the system message, as intended, and we'd encounter the error much sooner in the conversation if this were the issue.

The first user message is missing from both the `messages` array and `ANTHROPIC_FORMATTED_MESSAGES` in api/chat/anthropic/route.ts, so it appears to be related to the message state somewhere in the client code.

quezadaesteban commented 4 months ago

I implemented the Anthropic API in a personal project, and it is indeed very picky compared to OpenAI. Not only must the first message have the user role, but so must the last message sent in the request. Additionally, message roles must alternate, meaning you can't have consecutive user messages or consecutive assistant messages.

The chatbot-ui integration currently lacks these checks, so it is easy to hit errors when using the Anthropic API.
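The three constraints above can be sketched in TypeScript. This is a hypothetical `normalizeForAnthropic` helper, not code from chatbot-ui; it merges consecutive same-role messages rather than dropping them, so no content is lost:

```typescript
type Role = "user" | "assistant";

interface Msg {
  role: Role;
  content: string;
}

// Enforce the three constraints described above:
// 1. the first message must be "user",
// 2. the last message must be "user",
// 3. roles must strictly alternate.
function normalizeForAnthropic(messages: Msg[]): Msg[] {
  // 1. Drop leading assistant messages.
  const start = messages.findIndex((m) => m.role === "user");
  if (start === -1) return [];
  const trimmed = messages.slice(start);

  // 3. Merge consecutive messages that share a role.
  const merged: Msg[] = [];
  for (const m of trimmed) {
    const last = merged[merged.length - 1];
    if (last && last.role === m.role) {
      last.content += "\n" + m.content;
    } else {
      merged.push({ ...m });
    }
  }

  // 2. Drop a trailing assistant message so the request ends on "user".
  while (merged.length > 0 && merged[merged.length - 1].role !== "user") {
    merged.pop();
  }
  return merged;
}
```

Merging (rather than deleting) duplicate-role messages is a design choice: the model still sees everything the user typed, just collapsed into one turn.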

Kam205 commented 3 months ago

@quezadaesteban can you please provide a sample configuration showing how you handled message construction and token usage with Anthropic?

quezadaesteban commented 3 months ago

> @quezadaesteban can you please provide the sample configuration how you handled the message construction and token usage in Anthropic.

Sure. Note that these functions target a specific database schema and my implementation lives in a Laravel server. Aside from cleaning up the message list, it is important to ensure that the information written to the database also makes sense.

I have a main function called `getStandardMessagesStructure`:

    private static function getStandardMessagesStructure(array $messages): array
    {
        // Remove messages from the beginning until the first message with role 'contact' is found
        $messages = self::removeMessagesUntilFirstContact($messages);

        // Ensure the role 'contact' is always preceded by a 'bot' role unless it is the first element
        $messages = self::ensureContactPrecededByBot($messages);

        // Remove messages from the end until the last message is either:
        // - role 'contact' with message_type 'text' or 'image'
        // - role 'function' with message_type 'function_response'
        return self::removeMessagesFromEndUntilValid($messages);
    }

In my case, I store more than one "role" in the db, and "contact" is the equivalent of "user".

Here are the implementations of each function:

    private static function removeMessagesUntilFirstContact(array $messages): array
    {
        while (! empty($messages) && $messages[0][MessageSchema::role] !== 'contact') {
            array_shift($messages);
        }

        return $messages;
    }

    private static function ensureContactPrecededByBot(array $messages): array
    {
        for ($i = 1; $i < count($messages); $i++) {
            if ($messages[$i][MessageSchema::role] === 'contact' && $messages[$i - 1][MessageSchema::role] !== 'bot') {
                while ($i > 0 && $messages[$i - 1][MessageSchema::role] !== 'bot') {
                    array_splice($messages, $i - 1, 1);
                    $i--;
                }
            }
        }

        return $messages;
    }

    private static function removeMessagesFromEndUntilValid(array $messages): array
    {
        while (! empty($messages)) {
            $last_message = $messages[count($messages) - 1];
            $role = $last_message[MessageSchema::role];
            $message_type = $last_message[MessageSchema::message_type] ?? 'text';
            $is_contact_with_valid_type = $role === 'contact' && in_array($message_type, ['text', 'image']);
            $is_function_with_valid_type = $role === 'function' && $message_type === 'function_response';

            if ($is_contact_with_valid_type || $is_function_with_valid_type) {
                break;
            }

            array_pop($messages);
        }

        return $messages;
    }

A function response is a valid last message because the Anthropic API treats function/tool results as part of the user's turn.

I do the token management in two steps. First, I fetch a number of messages from the DB and, if they do not fit the context length, remove the oldest one until they do. Then this list of messages goes through the functions above, to ensure that even if the messages fit, they comply with Anthropic's API requirements.
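Step one of that two-step approach can be sketched in TypeScript. The `fitToContext` helper and the ~4-characters-per-token estimate are assumptions for illustration; a real implementation would use a proper tokenizer, and step two would then run the structural cleanup described above on the result:

```typescript
// Assumed stored-message shape.
interface StoredMessage {
  role: string;
  content: string;
}

// Rough token estimate (~4 characters per token); a real implementation
// would use an actual tokenizer.
function countTokens(m: StoredMessage): number {
  return Math.ceil(m.content.length / 4);
}

// Step 1: drop the oldest messages until the history fits the budget.
// Step 2 (structural cleanup) would run on the returned list afterwards.
function fitToContext(messages: StoredMessage[], maxTokens: number): StoredMessage[] {
  const result = [...messages];
  let total = result.reduce((sum, m) => sum + countTokens(m), 0);
  while (result.length > 0 && total > maxTokens) {
    total -= countTokens(result[0]);
    result.shift();
  }
  return result;
}
```

Running the structural cleanup *after* the token trim is the important ordering: trimming alone can leave the history starting on an assistant message, which is exactly the bug in this issue.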

jasonoh commented 2 months ago

I'm running into this problem fairly regularly. Is there a patch on the way soon? I know the langchain4j team implemented a fix for the same issue on their side to account for the finickiness of the Anthropic API.

quezadaesteban commented 2 months ago

@Kam205, I wrote an article recently explaining it further here.