Open maxgosk opened 3 months ago
@maxgosk are you sure you sent this question to the right project?
Hi @jexp,
Yes, when I use CHAT_SEARCH_KWARG_K = 10 it's okay, but when I increase it to 100 or more, it fails. I believe it's the filtering function, which uses the embedding function.
I believe I'm getting a requests-per-minute error from the endpoint, not really related to tokens. I'm trying with an OpenAI load balancer:
@vasanthasaikalluri is using that variable for Q&A; he is the right person to answer this question.
Thanks @kartikpersistent .
Hi @vasanthasaikalluri, I did manage to mitigate the error with the load balancer, but I still get the error from the embedding endpoint often. How is the filtering function calling the endpoint? Is there any way to do the filtering without having to call the embedding?
Thanks!
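Since the failures look like requests-per-minute limits rather than token limits, one common mitigation (independent of this repo) is to wrap the embedding call in an exponential-backoff retry. A minimal sketch, where `with_backoff` is a hypothetical helper of mine and the `EMBEDDING_FUNCTION.embed_query` usage is only an assumed call site:

```python
import random
import time

def with_backoff(fn, max_retries=5, base_delay=1.0):
    """Call fn(); on failure, retry with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Sleep 1s, 2s, 4s, ... plus up to 1s of random jitter.
            time.sleep(base_delay * (2 ** attempt) + random.random())

# Hypothetical usage with an embedding client:
# vector = with_backoff(lambda: EMBEDDING_FUNCTION.embed_query(question))
```

This doesn't reduce the number of calls, but it smooths over transient 429-style errors from the endpoint.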
Hi @maxgosk, we are applying this post-filtering to accommodate the input token limit of the multiple models we are using. You can try increasing the split size. Also, could you please post the exact error that you are getting?
Hi @vasanthasaikalluri, I modified the retriever and added the search_type "similarity_score_threshold", as it is not the default option.
```python
def get_neo4j_retriever(graph, retrieval_query, document_names, index_name="vector", search_k=CHAT_SEARCH_KWARG_K, score_threshold=0.5):
    try:
        neo_db = Neo4jVector.from_existing_index(
            embedding=EMBEDDING_FUNCTION,
            index_name=index_name,
            retrieval_query=retrieval_query,
            graph=graph
        )
        logging.info(f"Successfully retrieved Neo4jVector index '{index_name}'")
        document_names = list(map(str.strip, json.loads(document_names)))
        if document_names:
            retriever = neo_db.as_retriever(
                search_kwargs={
                    "search_type": "similarity_score_threshold",
                    "score_threshold": score_threshold,
                    "filter": {"fileName": {"$in": document_names}}
                }
            )
            logging.info(f"Successfully created retriever for index '{index_name}', score_threshold={score_threshold} for documents {document_names}")
        else:
            retriever = neo_db.as_retriever(
                search_kwargs={
                    "search_type": "similarity_score_threshold",
                    "score_threshold": score_threshold
                }
            )
            logging.info(f"Successfully created retriever for index '{index_name}', score_threshold={score_threshold}")
        return retriever
    except Exception as e:
        logging.error(f"Error retrieving Neo4jVector index '{index_name}' or creating retriever: {e}")
        return None
```
The retriever will return the documents in order, so you just have to do the following in `format_documents`:

```python
sorted_documents = documents[:prompt_token_cutoff]
```
After doing this, my embedding cost was reduced to only the cost of embedding the input question.
What I'm debugging now is that the retriever is not using the score_threshold; it keeps using 0.8 as the default.
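For what it's worth, one likely cause (an assumption on my part, based on how LangChain's `as_retriever` takes its arguments): `search_type` is a top-level parameter of `as_retriever`, so placing it inside `search_kwargs` leaves the retriever on its default search type, and the `score_threshold` entry is then never consulted. A toy illustration of that keyword-placement pitfall, with made-up names (`as_retriever_toy` is not a real API):

```python
def as_retriever_toy(search_type="similarity", search_kwargs=None):
    """Toy stand-in: only a top-level search_type changes the behavior."""
    search_kwargs = search_kwargs or {}
    if search_type == "similarity_score_threshold":
        return f"threshold search, score_threshold={search_kwargs.get('score_threshold')}"
    # A 'search_type' key buried inside search_kwargs is simply ignored here.
    return "plain similarity search"

# Buried in search_kwargs: the default search type is still used.
buried = as_retriever_toy(search_kwargs={"search_type": "similarity_score_threshold", "score_threshold": 0.5})
# Passed at the top level: the threshold actually takes effect.
top_level = as_retriever_toy(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5})
```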
Hi @maxgosk, yes, we have recently updated the retriever search_type to "similarity_score_threshold"; it will be in main soon. Please let me know if you need any info.
Thanks @vasanthasaikalluri ,
May I know why, when not providing a k value like this:

```python
retriever = neo_db.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": score_threshold})
```

it only returns about 6 documents?
When using it like this:

```python
retriever = neo_db.as_retriever(search_type="similarity_score_threshold", search_kwargs={"k": search_k, "score_threshold": score_threshold})
```

- For k=50 it retrieved 44 documents
- For k=100 it retrieved 49 documents
- For k=300 it retrieved 78 documents
- For k=500 it retrieved 136 documents

When the value of k is too high and the threshold is high as well, it returns 0 documents.
Without setting a k value, shouldn't the db just return all embeddings above the threshold?
Thank you
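A plausible explanation (an assumption based on generic vector-store behavior in LangChain, not verified against Neo4jVector specifically): the store first fetches the top k candidates (with a small default k when none is given) and only then discards those below score_threshold, so the count returned is always at most k rather than "everything above the threshold". A minimal sketch of that fetch-then-filter behavior, with made-up data:

```python
def threshold_retrieve(scored_docs, k=4, score_threshold=0.5):
    """Mimic fetch-top-k-then-filter: sort by score, cut to k, drop low scores."""
    top_k = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)[:k]
    return [(doc, score) for doc, score in top_k if score >= score_threshold]

docs = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.6), ("e", 0.55), ("f", 0.3)]
# Even though five docs clear the 0.5 threshold, k=4 caps the result at four.
print(len(threshold_retrieve(docs, k=4, score_threshold=0.5)))   # 4
# Raising k lets more above-threshold docs through, but the filter still trims.
print(len(threshold_retrieve(docs, k=10, score_threshold=0.5)))  # 5
```

This would also explain why the retrieved count grows with k yet stays below it, and why a high threshold combined with a large k can still return few (or zero) documents.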
I noticed that when I set CHAT_SEARCH_KWARG_K too high, my embedding endpoint cannot handle that many requests. However, I don't understand why this happens, as the chunks are already embedded and the question is short.
Is this related to CHAT_DOC_SPLIT_SIZE?
```python
def create_document_retriever_chain(llm, retriever):
    query_transform_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", QUESTION_TRANSFORM_TEMPLATE),
            MessagesPlaceholder(variable_name="messages")
        ]
    )
    output_parser = StrOutputParser()
```