langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

Complete prompt is prepended to the start of my response generated by Llama 3 #24437

Open ibtsamraza opened 1 month ago

ibtsamraza commented 1 month ago

Checked other resources

Example Code

# langchain 0.2.x import paths
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferWindowMemory
from langchain.retrievers import ContextualCompressionRetriever
from langchain_cohere import CohereRerank
from langchain_community.llms import HuggingFaceHub
from langchain_core.prompts import PromptTemplate

# Prompt in the Llama 3 chat template format
prompt = PromptTemplate(
    template="""<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question and give response from the context given to you as truthfully as you can.
Do not add anything from you and If you don't know the answer, just say that you don't know.
<|eot_id|>
<|start_header_id|>user<|end_header_id|>
Question: {question}
Context: {context}
Chat History: {chat_history}
Answer: <|eot_id|><|start_header_id|>assistant<|end_header_id|>""",
    input_variables=["question", "context", "chat_history"],
)

global memory
memory = ConversationBufferWindowMemory(k=4,
    memory_key='chat_history', return_messages=True, output_key='answer')

# LLMs Using API

llm = HuggingFaceHub(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    huggingfacehub_api_token=api_key,
    model_kwargs={"temperature": 0.1, "max_length": 300, "max_new_tokens": 300},
)

compressor = CohereRerank()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever3
)

global chain_with_memory

# Create the custom chain
chain_with_memory = ConversationalRetrievalChain.from_llm(
    llm=llm,
    memory=memory,
    retriever=compression_retriever,
    combine_docs_chain_kwargs={"prompt": prompt},
    return_source_documents=True,
)
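
For context, `retriever3` above is assumed to be a previously constructed base retriever. The chain is then invoked with just the question (the memory supplies `chat_history`), which is what produces the dict shown in the log below. A minimal invocation sketch:

# Minimal invocation sketch; assumes the chain was built as above.
# ConversationalRetrievalChain returns a dict containing "question",
# "chat_history", "answer", and, because return_source_documents=True,
# "source_documents".
result = chain_with_memory.invoke(
    {"question": "how many F grade a student can have in bachelor"}
)
print(result["answer"])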

Error Message and Stack Trace (if applicable)

llm_reponse before guardrails {'question': 'how many F grade a student can have in bachelor', 'chat_history': [], 'answer': "<|begin_of_text|><|start_header_id|>system<|end_header_id|> You are an assistant for question-answering tasks.\n Use the following pieces of retrieved context to answer the question and give response from the context given to you as truthfully as you can.\n Do not add anything from you and If you don't know the answer, just say that you don't know.\n <|eot_id|>\n <|start_header_id|>user<|end_header_id|>\n Question: how many F grade a student can have in bachelor\n Context:

Description

I am building a RAG pipeline that was working fine in my local environment, but when I deployed it on a server, the prompt template was prepended to the start of my LLM response. When I compared my local and server environments, the only difference was that the server was running langchain 0.2.9 (with langchain-community) while my local setup was running langchain 0.2.6. Has anyone faced the same issue or found a solution?
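
If the regression comes from the Hugging Face text-generation endpoint echoing the prompt, one possible workaround is the Inference API's `return_full_text` parameter, which controls whether the prompt is included in the returned text. Whether HuggingFaceHub forwards it through `model_kwargs` on your version is an assumption here, not a confirmed fix:

# Workaround sketch: the HF Inference API's text-generation task returns
# prompt + completion by default; "return_full_text": False asks it to
# return only the completion. Forwarding it via model_kwargs is an
# assumption, not a confirmed fix for this issue.
llm = HuggingFaceHub(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    huggingfacehub_api_token=api_key,
    model_kwargs={
        "temperature": 0.1,
        "max_new_tokens": 300,
        "return_full_text": False,
    },
)

A model-agnostic fallback is to strip everything up to the final assistant header before returning the answer, e.g. answer.split("<|end_header_id|>")[-1].strip().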

System Info

langchain==0.2.9
langchain-cohere==0.1.9
langchain-community==0.2.7
langchain-core==0.2.21
langchain-experimental==0.0.62
langchain-text-splitters==0.2.2

efriis commented 1 month ago

@Jofthomas from Hugging Face can help here!

Soumil32 commented 1 month ago

The pull request I submitted should fix this! #25136