langchain-ai / langchain-aws

Build LangChain Applications on AWS
MIT License

Querying Llama3 70b using BedrockChat returns empty response if prompt is long #31

Closed fedor-intercom closed 1 month ago

fedor-intercom commented 2 months ago

Hello,

I posted this bug on the main repo. However, it seems that from langchain_community.chat_models import BedrockChat has been deprecated.

I tried from langchain_aws.chat_models import BedrockChat and I am facing the same issue, so I am cross-posting here as it might be more relevant.

TL;DR: Querying Llama3 with BedrockChat returns an empty string when the prompt is long. Short queries like "What is the capital of China?" do return the expected answer. If I pass the long prompt directly in the AWS Console, it works fine.

Example:

from langchain.chains import LLMChain
from langchain.prompts import HumanMessagePromptTemplate
from langchain.prompts.chat import ChatPromptTemplate
from langchain_aws.chat_models import BedrockChat
import langchain

langchain.debug = True  # log the rendered prompts and raw model responses

def get_llama3_bedrock(
    model_id="meta.llama3-70b-instruct-v1:0",
    max_gen_len=2048,
    top_p=0.0,
    temperature=0.0,
):
    # Greedy decoding (temperature=0.0, top_p=0.0) for reproducible output.
    model_kwargs = {
        "top_p": top_p,
        "max_gen_len": max_gen_len,
        "temperature": temperature,
    }
    return BedrockChat(model_id=model_id, model_kwargs=model_kwargs)

prompt_poem = """
This is a poem by William Blake

============
Never seek to tell thy love
Love that never told can be 
For the gentle wind does move
Silently invisibly

I told my love I told my love 
I told her all my heart 
Trembling cold in ghastly fears
Ah she doth depart

Soon as she was gone from me
A traveller came by
Silently invisibly 
O was no deny 
============

What did the lady do?
"""
langchain_prompt = ChatPromptTemplate.from_messages(
    [HumanMessagePromptTemplate.from_template(prompt_poem)]
)
print("Response 1:", LLMChain(llm=get_llama3_bedrock(), prompt=langchain_prompt).run(dict()))
#Responds: ''

prompt_simple_question = """What is the capital of China?"""
langchain_prompt = ChatPromptTemplate.from_messages(
    [HumanMessagePromptTemplate.from_template(prompt_simple_question)]
)
print("Response 2:", LLMChain(llm=get_llama3_bedrock(), prompt=langchain_prompt).run(dict()))
#Responds: 'Beijing.'

Versions: Python 3.11.7

langchain==0.1.16
langchain-aws==0.1.2
langchain-core==0.1.46
langchain-text-splitters==0.0.1

fedor-intercom commented 2 months ago

We figured out that it's due to not using the new Llama3 special tokens. See the AWS docs on how to invoke Llama3.
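
For reference, the Llama3 instruct models expect prompts built from these special tokens. A single-turn prompt looks roughly like this (a sketch of the documented format; exact whitespace may vary):

<|begin_of_text|><|start_header_id|>user<|end_header_id|>

What is the capital of China?<|eot_id|><|start_header_id|>assistant<|end_header_id|>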

We monkey-patched _convert_one_message_to_text_llama, which resolves the issue:

from typing import List

from langchain_core.messages import (
    AIMessage,
    BaseMessage,
    ChatMessage,
    HumanMessage,
    SystemMessage,
)
from langchain_aws.chat_models import BedrockChat
import langchain_aws.chat_models.bedrock

def _convert_one_message_to_text_llama(message: BaseMessage) -> str:
    if isinstance(message, ChatMessage):
        message_text = f"<|begin_of_text|><|start_header_id|>{message.role}<|end_header_id|>{message.content}<|eot_id|>"
    elif isinstance(message, HumanMessage):
        message_text = f"<|begin_of_text|><|start_header_id|>user<|end_header_id|>{message.content}<|eot_id|>"
    elif isinstance(message, AIMessage):
        message_text = f"<|begin_of_text|><|start_header_id|>assistant<|end_header_id|>{message.content}<|eot_id|>"
    elif isinstance(message, SystemMessage):
        message_text = f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>{message.content}<|eot_id|>"
    else:
        raise ValueError(f"Got unknown type {message}")
    return message_text

def convert_messages_to_prompt_llama(messages: List[BaseMessage]) -> str:
    """Convert a list of messages to a prompt for llama."""

    return "\n".join(
        [_convert_one_message_to_text_llama(message) for message in messages] + ["<|start_header_id|>assistant<|end_header_id|>\n\n"]
    )

langchain_aws.chat_models.bedrock._convert_one_message_to_text_llama = _convert_one_message_to_text_llama
langchain_aws.chat_models.bedrock.convert_messages_to_prompt_llama = convert_messages_to_prompt_llama
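
With the patch applied, re-running the poem example from the first snippet returns a real answer instead of an empty string. A minimal usage sketch, reusing prompt_poem and get_llama3_bedrock from above:

# Module-level names are resolved at call time, so BedrockChat picks up the
# patched prompt builders on the next invocation.
langchain_prompt = ChatPromptTemplate.from_messages(
    [HumanMessagePromptTemplate.from_template(prompt_poem)]
)
print("Response 1:", LLMChain(llm=get_llama3_bedrock(), prompt=langchain_prompt).run(dict()))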

anaszil commented 2 months ago

I think "<|begin_of_text|>" should only appear at the beginning of the prompt, not at the beginning of each message.

Check the template here
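
For example, one way to apply that fix is to drop the token from the per-message function and prepend it once when joining (a sketch, not necessarily the exact change that landed):

def _convert_one_message_to_text_llama(message: BaseMessage) -> str:
    # As above, but without a leading <|begin_of_text|> on every message.
    if isinstance(message, ChatMessage):
        message_text = f"<|start_header_id|>{message.role}<|end_header_id|>{message.content}<|eot_id|>"
    elif isinstance(message, HumanMessage):
        message_text = f"<|start_header_id|>user<|end_header_id|>{message.content}<|eot_id|>"
    elif isinstance(message, AIMessage):
        message_text = f"<|start_header_id|>assistant<|end_header_id|>{message.content}<|eot_id|>"
    elif isinstance(message, SystemMessage):
        message_text = f"<|start_header_id|>system<|end_header_id|>{message.content}<|eot_id|>"
    else:
        raise ValueError(f"Got unknown type {message}")
    return message_text

def convert_messages_to_prompt_llama(messages: List[BaseMessage]) -> str:
    # Emit <|begin_of_text|> exactly once, at the start of the whole prompt.
    return "<|begin_of_text|>" + "\n".join(
        [_convert_one_message_to_text_llama(message) for message in messages]
        + ["<|start_header_id|>assistant<|end_header_id|>\n\n"]
    )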

bqmackay commented 2 months ago

@anaszil I can confirm that putting "<|begin_of_text|>" at the beginning does work as intended.

When using the Llama3 8b model without the monkey patch, the model will continue a fictitious conversation.

fedor-intercom commented 2 months ago

Thanks, @anaszil and @bqmackay, you are right :). It's reflected in the PR.