langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Inference parameters for Bedrock titan models not working #5713

Closed aarora79 closed 1 year ago

aarora79 commented 1 year ago

System Info

LangChain version 0.0.190, Python 3.9

Who can help?

@seanpmorgan @3coins

Reproduction

I tried the following to provide the temperature and maxTokenCount parameters when using the Bedrock class with the amazon.titan-tg1-large model.

import boto3
import botocore
from langchain.chains import LLMChain
from langchain.llms.bedrock import Bedrock
from langchain.prompts import PromptTemplate
from langchain.embeddings import BedrockEmbeddings

prompt = PromptTemplate(
    input_variables=["text"],
    template="{text}",
)

llm = Bedrock(model_id="amazon.titan-tg1-large")
llmchain = LLMChain(llm=llm, prompt=prompt)

llm.model_kwargs = {'temperature': 0.3, "maxTokenCount": 512}

text = "Write a blog explaining Generative AI in ELI5 style."
response = llmchain.run(text=text)
print(f"prompt={text}\n\nresponse={response}")

This results in the following exception

ValueError: Error raised by bedrock service: An error occurred (ValidationException) when calling the InvokeModel operation: The provided inference configurations are invalid

This happens because https://github.com/hwchase17/langchain/blob/d0d89d39efb5f292f72e70973f3b70c4ca095047/langchain/llms/bedrock.py#L20 passes these params as top-level key-value pairs in the request body rather than nesting them in the textGenerationConfig structure the Titan model expects.
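For reference, here is a minimal sketch of the difference, using the parameters from the repro above (the flat body is what the current code builds for the amazon provider; the nested body is what Titan's InvokeModel API accepts):

# Body currently built by prepare_input for amazon.titan-tg1-large (rejected):
body_sent = {
    "inputText": "Write a blog explaining Generative AI in ELI5 style.",
    "temperature": 0.3,
    "maxTokenCount": 512,
}

# Body the Titan model expects, with inference parameters nested under
# textGenerationConfig:
body_expected = {
    "inputText": "Write a blog explaining Generative AI in ELI5 style.",
    "textGenerationConfig": {"temperature": 0.3, "maxTokenCount": 512},
}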

The proposed fix is as follows:

    @classmethod
    def prepare_input(
        cls, provider: str, prompt: str, model_kwargs: Dict[str, Any]
    ) -> Dict[str, Any]:
        input_body = {**model_kwargs}
        if provider == "anthropic" or provider == "ai21":
            input_body["prompt"] = prompt
        elif provider == "amazon":
            # Titan expects inference parameters nested under textGenerationConfig,
            # not as top-level keys alongside inputText.
            input_body = dict()
            input_body["inputText"] = prompt
            input_body["textGenerationConfig"] = {**model_kwargs}
        else:
            input_body["inputText"] = prompt

        if provider == "anthropic" and "max_tokens_to_sample" not in input_body:
            input_body["max_tokens_to_sample"] = 50

        return input_body
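
With a change along these lines, Titan parameters passed via model_kwargs (whether set at construction time or afterwards, as in the repro) end up nested correctly. A usage sketch, assuming the fix above is applied:

from langchain.llms.bedrock import Bedrock

llm = Bedrock(
    model_id="amazon.titan-tg1-large",
    model_kwargs={"temperature": 0.3, "maxTokenCount": 512},
)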


Expected behavior

Support the inference config parameters.

3coins commented 1 year ago

@aarora79 Thanks for looking into the fix. Do you want to submit a PR for this change? I can help verify the change on my end once you do.

aarora79 commented 1 year ago

Thank you for fixing this, @3coins.

ventz commented 1 year ago

Seeing this with embeddings still:

langchain: 0.0.249

raise ValueError(f"Error raised by inference endpoint: {e}")
ValueError: Error raised by inference endpoint: An error occurred (ValidationException) when calling the InvokeModel operation: The provided inference configurations are invalid

edit: the issue seems to be related to the need for text splitting at the loader: loader.load_and_split(text_splitter)
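
A minimal sketch of that, in case it helps others hitting the same error (the file path and splitter settings here are just placeholders):

from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = TextLoader("docs/example.txt")  # placeholder file
text_splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=50)
docs = loader.load_and_split(text_splitter)  # split before embedding with Bedrock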

paul-bradbeer-adv commented 1 year ago

I am getting the same issue with langchain 0.0.249

paul-bradbeer-adv commented 1 year ago

@3coins @aarora79 Hi both, I was getting this error, so I upgraded to langchain 0.0.249 hoping to pick up this fix. I'm using langchain with the Amazon Bedrock service and still get the same symptom. If I pass an empty inference modifier dict it works, but then I have no idea what parameters AWS is using by default and obviously have no control over them.

ventz commented 1 year ago

@paul-bradbeer-adv What models and which langchain LLM/chain are you using?

I was able to fix the non-vector/non-embedding issues with all of their models (Titan, AI21, and Claude).

As for embeddings, I think there may be an internal bug in Bedrock (and definitely with Titan). Reach out to your internal AWS team to add onto the bug we opened.

Another bug worth pinging your team about -- you get the same error message for everything, without any details. The same error could mean the context size is too small, the output is too big, the number of inputs is not what's expected, the parameters for one model don't map to another model, you're using an inference interface with a chat model, or it could just be a random error.
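
For example, the parameter names differ per provider (names taken from the fix above and the original repro; values are illustrative), so a model_kwargs dict that is valid for Titan is not valid for Claude, and vice versa:

from langchain.llms.bedrock import Bedrock

titan_kwargs = {"temperature": 0.3, "maxTokenCount": 512}
claude_kwargs = {"temperature": 0.3, "max_tokens_to_sample": 512}

llm = Bedrock(model_id="amazon.titan-tg1-large", model_kwargs=titan_kwargs)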

paul-bradbeer-adv commented 1 year ago

@3coins @aarora79 @ventz This is now working for me. I suspect it was an environment issue caching an older version, with a restart flushing things through, or something changed between 0.0.259 (which I used yesterday) and 0.0.260 (working today) that I haven't yet checked for.

ventz commented 1 year ago

@paul-bradbeer-adv Interesting - good to know.

It could be a cache, because I am using: langchain==0.0.256 in one environment and langchain==0.0.259 in another - both working for non-embeddings.

HannaHUp commented 1 year ago

I got the same error: ValueError: Error raised by inference endpoint: An error occurred (ValidationException) when calling the InvokeModel operation: The provided inference configurations are invalid

This is in SageMaker. langchain==0.0.256, Image: Data Science 3.0, Kernel: Python 3, Instance type: ml.t3.medium (2 vCPU + 4 GiB)

I'm trying to get embeddings for chunks split by RecursiveCharacterTextSplitter.

Here is the code:

# boto3_bedrock: boto3 Bedrock client created earlier in the notebook (not shown)
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.embeddings import BedrockEmbeddings
from langchain.indexes.vectorstore import VectorStoreIndexWrapper
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

bedrock_embeddings = BedrockEmbeddings(client=boto3_bedrock)
loader = DirectoryLoader("./Sources", glob="**/*.txt", loader_cls=TextLoader, silent_errors=True)
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
)
docs = text_splitter.split_documents(documents)
vectorstore_faiss = FAISS.from_documents(
    docs,
    bedrock_embeddings,
)
wrapper_store_faiss = VectorStoreIndexWrapper(vectorstore=vectorstore_faiss)

Funny thing is, if I use fewer docs (docs[:5]) it works: vectorstore_faiss = FAISS.from_documents(docs[:5], bedrock_embeddings)

rmartine-ias commented 1 year ago

@HannaHUp do you get the error if you set chunk_size to 512? That was what was causing it for us -- any docs under 512 tokens were handled, but longer stuff gave this (super unhelpful) error.
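
If it helps narrow things down, here is a rough sketch for spotting over-long chunks, assuming the GPT-2 encoding approximates Titan's tokenizer and tiktoken is available:

import tiktoken

enc = tiktoken.get_encoding("gpt2")
for i, doc in enumerate(docs):
    n_tokens = len(enc.encode(doc.page_content))
    if n_tokens > 512:
        print(f"chunk {i}: {n_tokens} tokens -- may trigger the ValidationException")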

HannaHUp commented 1 year ago

@rmartine-ias I have tried other text whose length is 1418 and got no issue, though.

rmartine-ias commented 1 year ago

What may be important is that the limit is in tokens, not characters -- does using this splitter work?

from langchain.text_splitter import TokenTextSplitter

docs = TokenTextSplitter(
    encoding_name="gpt2",
    chunk_size=512,
    chunk_overlap=100,
).split_documents(data)

(we tested, and Titan seems to use the GPT-2 encoding)