langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Complete prompt is prepended to the start of my response generated by Llama 3 #24437

Open ibtsamraza opened 1 month ago

ibtsamraza commented 1 month ago

Checked other resources

Example Code

# langchain 0.2.x import paths
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferWindowMemory
from langchain.retrievers import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
from langchain_community.llms import HuggingFaceHub
from langchain_core.prompts import PromptTemplate

# Prompt in the Llama 3 chat template format
prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question and give response from the context given to you as truthfully as you can.
Do not add anything from you and If you don't know the answer, just say that you don't know.
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
Question: {question}
Context: {context}
Chat History: {chat_history}
Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question", "context", "chat_history"],
)

global memory
memory = ConversationBufferWindowMemory(k=4,
    memory_key='chat_history', return_messages=True, output_key='answer')

# LLMs Using API

llm = HuggingFaceHub(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    huggingfacehub_api_token=api_key,
    model_kwargs={"temperature": 0.1, "max_length": 300, "max_new_tokens": 300},
)

compressor = CohereRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever3
)

global chain_with_memory

# Create the custom chain
chain_with_memory = ConversationalRetrievalChain.from_llm(
    llm=llm,
    memory=memory,
    retriever=compression_retriever,
    combine_docs_chain_kwargs={"prompt": prompt},
    return_source_documents=True,
)
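
For context, `retriever3` above is assumed to be a previously constructed base retriever. The chain is then invoked with just the question (the memory supplies `chat_history`), which is what produces the dict shown in the log below. A minimal invocation sketch:

# Minimal invocation sketch; assumes the chain was built as above.
# ConversationalRetrievalChain returns a dict containing "question",
# "chat_history", "answer", and, because return_source_documents=True,
# "source_documents".
result = chain_with_memory.invoke(
    {"question": "how many F grade a student can have in bachelor"}
)
print(result["answer"])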

Error Message and Stack Trace (if applicable)

llm_reponse before guardrails {'question': 'how many F grade a student can have in bachelor', 'chat_history': [], 'answer': "<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an assistant for question-answering tasks.\n Use the following pieces of retrieved context to answer the question and give response from the context given to you as truthfully as you can.\n Do not add anything from you and If you don't know the answer, just say that you don't know.\n <|eot_id|>\n <|start_header_id|>user<|end_header_id|>\n Question: how many F grade a student can have in bachelor\n Context:

Description

I am building a RAG pipeline that was working fine in my local environment, but when I deployed it on a server, the prompt template was prepended to the start of my LLM response. When I compared my local and server environments, the only difference was that the server was running langchain 0.2.9 (with langchain-community) while my local setup was running langchain 0.2.6. Has anyone faced the same issue or found a solution?
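
If the regression comes from the Hugging Face text-generation endpoint echoing the prompt, one possible workaround is the Inference API's `return_full_text` parameter, which controls whether the prompt is included in the returned text. Whether HuggingFaceHub forwards it through `model_kwargs` on your version is an assumption here, not a confirmed fix:

# Workaround sketch: the HF Inference API's text-generation task returns
# prompt + completion by default; "return_full_text": False asks it to
# return only the completion. Forwarding it via model_kwargs is an
# assumption, not a confirmed fix for this issue.
llm = HuggingFaceHub(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    huggingfacehub_api_token=api_key,
    model_kwargs={
        "temperature": 0.1,
        "max_new_tokens": 300,
        "return_full_text": False,
    },
)

A model-agnostic fallback is to strip everything up to the final assistant header before returning the answer, e.g. answer.split("<|end_header_id|>")[-1].strip().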

System Info

langchain==0.2.9
langchain-cohere==0.1.9
langchain-community==0.2.7
langchain-core==0.2.21
langchain-experimental==0.0.62
langchain-text-splitters==0.2.2

efriis commented 1 month ago

@Jofthomas from Hugging Face can help here!

Soumil32 commented 1 month ago

The pull request I submitted should fix this! #25136