Hi @rlancemartin,

I have read the original HyDE paper and noticed (in sections 3.2 and 4.1) that the authors generate multiple documents at temperature 0.7 and then average the embeddings of those documents together with the embedding of the question itself. The resulting mean vector is the final query embedding used to retrieve the real documents.
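Written out in my own notation (not copied from the paper), with $f$ the embedding model, $q$ the question, and $\hat{d}_1, \dots, \hat{d}_N$ the $N$ generated documents, the query vector I have in mind is:

```latex
\hat{v}_q = \frac{1}{N+1}\left[\, f(q) + \sum_{k=1}^{N} f(\hat{d}_k) \right]
```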
I also found that the implementation at the documentation link provided is probably outdated: it uses the OpenAI model and a deprecated chain, and it does not use LCEL. It also doesn't include the embedding of the query itself when calculating the final query embedding.
Since the steps in Part 9 are also not combined into a single LCEL chain, I tried to implement it myself, taking all the points above into account, and wrote the following code (assuming that we already have a vectorstore with documents):
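from functools import partial
from operator import itemgetter

import numpy as np
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel
from langchain_openai.chat_models import ChatOpenAI

# Assumed to exist already: `vectorstore` (populated with documents) and
# `embeddings` (the embedding model that the vectorstore was built with).
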
def generate_docs(arguments):
    """Generate n hypothetical documents for the question (temperature 0.7)."""
    question = arguments['question']
    generation_template = arguments['template']
    n = arguments['n']
    prompt_hyde = ChatPromptTemplate.from_template(generation_template)
    generate_docs_for_retrieval = (
        prompt_hyde
        | ChatOpenAI(model='gpt-3.5-turbo-0125', temperature=0.7)
        | StrOutputParser()
    )
    # batch() issues n separate requests, so the prompt tokens are sent n times
    generated_docs = generate_docs_for_retrieval.batch([{'question': question}] * n)
    return generated_docs

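def calculate_query_embeddings(query_components):
    """Average the question embedding with the generated-document embeddings."""
    question = query_components['question']
    generated_docs = query_components['docs']
    question_embeddings = np.array(embeddings.embed_query(question))
    generated_docs_embeddings = np.array(embeddings.embed_documents(generated_docs))
    # Stack to shape (n + 1, dim): the question vector plus n document vectors
    query_embeddings = np.vstack([question_embeddings, generated_docs_embeddings])
    # similarity_search_by_vector is typed to take a flat list of floats,
    # so return a 1-D list rather than a (1, dim) array
    calculated_query_embeddings = np.mean(query_embeddings, axis=0).tolist()
    return calculated_query_embeddings
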
def get_relevant_documents(query_embeddings, vectorstore, search_kwargs):
    return vectorstore.similarity_search_by_vector(query_embeddings, **search_kwargs)

search_kwargs = {'k': 4}
get_relevant_documents = partial(get_relevant_documents, vectorstore=vectorstore, search_kwargs=search_kwargs)
rag_template = """Answer the following question based on this context:
{context}
Question: {question}
"""
rag_prompt = ChatPromptTemplate.from_template(rag_template)
model = ChatOpenAI(model='gpt-3.5-turbo-0125', temperature=0)
chain = (
    RunnableParallel(
        {
            'question': itemgetter('question'),
            'context':
                RunnableParallel({
                    'question': itemgetter('question'),
                    'docs': generate_docs
                })
                | calculate_query_embeddings
                | get_relevant_documents,
        }
    )
    | rag_prompt
    | model
    | StrOutputParser()
)
generation_template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""
question = "What is task decomposition for LLM agents?"
n = 4
response = chain.invoke({
    'question': question,
    'template': generation_template,
    'n': n,
})
print(response)
I decided to use the batch() method of the Runnable to generate multiple documents, because I found that the implementation of invoke() always returns only the first generation regardless of the n argument of the ChatOpenAI model (even though all n generations are created and add to the cost of the invocation).
It would be great to get your feedback on three things: (1) the implementation details from the paper, namely using multiple generated documents plus the query itself to calculate the embedding; (2) the implementation above, in case you can recommend a more efficient solution, since with batch() the prompt tokens are sent with each request; and (3) the invoke() implementation: why does it return only the first generation, and is there a more cost-effective option than batch() if invoke() can't be used for multiple generations?
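To make the averaging question concrete, here is a minimal NumPy-only sketch of that step as I understand it from the paper. The helper name hyde_query_vector and the toy 3-dimensional vectors are made up for illustration; they stand in for real embeddings and are not part of LangChain or of the paper's code.

```python
import numpy as np


def hyde_query_vector(question_vec, generated_doc_vecs):
    """Mean of the question embedding and the N generated-document embeddings."""
    # Stack into shape (N + 1, dim): one row for the question, N rows for documents
    stacked = np.vstack([question_vec, generated_doc_vecs])
    # Average over rows to get a single vector of shape (dim,)
    return stacked.mean(axis=0)


# Toy 3-dimensional vectors standing in for real embeddings (illustrative only)
question_vec = np.array([1.0, 0.0, 0.0])
generated_doc_vecs = np.array([
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
])

query_vec = hyde_query_vector(question_vec, generated_doc_vecs)
print(query_vec.shape)     # (3,)
print(query_vec.tolist())  # [0.5, 0.5, 0.5]
```

The result is a flat 1-D vector, so calling .tolist() on it gives the plain list of floats that similarity_search_by_vector is typed to accept.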
Thank you.