neo4j-labs / llm-graph-builder

Neo4j graph construction from unstructured data using LLMs
https://neo4j.com/labs/genai-ecosystem/llm-graph-builder/
Apache License 2.0

Embedding cost increases with CHAT_SEARCH_KWARG_K #568

Open · maxgosk opened this issue 3 months ago

maxgosk commented 3 months ago

I noticed that when I set CHAT_SEARCH_KWARG_K too high, my embedding model cannot handle the number of requests. I don't understand why this happens, since the chunks are already embedded and the question is short.

Is this related to CHAT_DOC_SPLIT_SIZE?

```python
def create_document_retriever_chain(llm, retriever):
    query_transform_prompt = ChatPromptTemplate.from_messages(
        [
            ("system", QUESTION_TRANSFORM_TEMPLATE),
            MessagesPlaceholder(variable_name="messages")
        ]
    )
    output_parser = StrOutputParser()

    splitter = TokenTextSplitter(chunk_size=CHAT_DOC_SPLIT_SIZE, chunk_overlap=0)
    embeddings_filter = EmbeddingsFilter(
        embeddings=EMBEDDING_FUNCTION,
        similarity_threshold=CHAT_EMBEDDING_FILTER_SCORE_THRESHOLD
    )

    pipeline_compressor = DocumentCompressorPipeline(
        transformers=[splitter, embeddings_filter]
    )
    compression_retriever = ContextualCompressionRetriever(
        base_compressor=pipeline_compressor, base_retriever=retriever
    )

    query_transforming_retriever_chain = RunnableBranch(
        (
            lambda x: len(x.get("messages", [])) == 1,
            (lambda x: x["messages"][-1].content) | compression_retriever,
        ),
        query_transform_prompt | llm | output_parser | compression_retriever,
    ).with_config(run_name="chat_retriever_chain")

    return query_transforming_retriever_chain
```

Thank you in advance
jexp commented 3 months ago

@maxgosk are you sure you sent this question to the right project?

maxgosk commented 3 months ago

Hi @jexp,

Yes, with CHAT_SEARCH_KWARG_K = 10 it is okay, but when I increase it to 100 or more, it fails. I believe the filtering function is what uses the embedding function.
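
If I read LangChain's EmbeddingsFilter correctly, it re-embeds every candidate chunk at query time to compute the similarity scores, which would explain why the request count scales with CHAT_SEARCH_KWARG_K even though the chunks are already embedded in Neo4j. A rough way to see this (sketch only; CountingEmbeddings is a made-up stand-in, not the project's EMBEDDING_FUNCTION):

```python
from langchain.retrievers.document_compressors import EmbeddingsFilter
from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings


class CountingEmbeddings(Embeddings):
    """Fake embedder that only counts how many texts it is asked to embed."""

    def __init__(self):
        self.embedded_texts = 0
        self.embedded_queries = 0

    def embed_documents(self, texts):
        self.embedded_texts += len(texts)
        return [[1.0, 0.0] for _ in texts]  # dummy vectors

    def embed_query(self, text):
        self.embedded_queries += 1
        return [1.0, 0.0]


emb = CountingEmbeddings()
emb_filter = EmbeddingsFilter(embeddings=emb, similarity_threshold=0.5)

# Simulate the k chunks coming back from the vector search for one question.
chunks = [Document(page_content=f"chunk {i}") for i in range(100)]
emb_filter.compress_documents(chunks, query="short question")

print(emb.embedded_texts, emb.embedded_queries)  # 100 chunk embeddings, 1 query embedding
```

So with a large k, every question turns into roughly k extra embedding requests, which is where my rate limit seems to go.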

maxgosk commented 3 months ago

I believe I'm getting a requests-per-minute error from the endpoint, not something related to tokens. I'm trying it with an OpenAI load balancer:

https://techcommunity.microsoft.com/t5/fasttrack-for-azure/smart-load-balancing-for-openai-endpoints-using-containers/ba-p/4017550

kartikpersistent commented 3 months ago

@vasanthasaikalluri is using that variable for Q&A, so he is the right person to answer this question.

maxgosk commented 3 months ago

Thanks @kartikpersistent.

Hi @vasanthasaikalluri, I did manage to improve the error with the load balancer, but I still often get the error from the embedding endpoint. How is the filtering function calling the endpoint? Is there any way to do the filtering without having to call the embedding model?

Thanks!

vasanthasaikalluri commented 3 months ago

Hi @maxgosk, we apply this post-filtering again to accommodate the input token limits of the multiple models we support. You can try increasing the split size. Also, could you please post the exact error you are getting?
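
To illustrate the split-size suggestion (rough sketch, the text and numbers here are made up): the retrieved chunks are re-split with TokenTextSplitter before the embeddings filter runs, so a larger CHAT_DOC_SPLIT_SIZE produces fewer pieces and therefore fewer texts sent to the embedding endpoint per question.

```python
from langchain_text_splitters import TokenTextSplitter  # needs tiktoken installed

# Stand-in for the concatenated text of one set of retrieved chunks.
long_text = "graph " * 4000

for split_size in (512, 3000):
    splitter = TokenTextSplitter(chunk_size=split_size, chunk_overlap=0)
    pieces = splitter.split_text(long_text)
    # Each piece becomes one text the embeddings filter has to embed at query time.
    print(split_size, "->", len(pieces), "pieces to embed")
```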

maxgosk commented 3 months ago

Hi @vasanthasaikalluri, I modified the retriever and set search_type to "similarity_score_threshold", since it is not the default option.

```python
def get_neo4j_retriever(graph, retrieval_query, document_names, index_name="vector",
                        search_k=CHAT_SEARCH_KWARG_K, score_threshold=0.5):
    try:
        neo_db = Neo4jVector.from_existing_index(
            embedding=EMBEDDING_FUNCTION,
            index_name=index_name,
            retrieval_query=retrieval_query,
            graph=graph
        )
        logging.info(f"Successfully retrieved Neo4jVector index '{index_name}'")
        document_names = list(map(str.strip, json.loads(document_names)))
        if document_names:
            retriever = neo_db.as_retriever(
                search_kwargs={
                    "search_type": "similarity_score_threshold",
                    "score_threshold": score_threshold,
                    'filter': {'fileName': {'$in': document_names}}
                }
            )
            logging.info(f"Successfully created retriever for index '{index_name}', score_threshold={score_threshold} for documents {document_names}")
        else:
            retriever = neo_db.as_retriever(
                search_kwargs={
                    "search_type": "similarity_score_threshold",
                    "score_threshold": score_threshold
                }
            )
            logging.info(f"Successfully created retriever for index '{index_name}', score_threshold={score_threshold}")
        return retriever
    except Exception as e:
        logging.error(f"Error retrieving Neo4jVector index '{index_name}' or creating retriever: {e}")
        return None
```

The retriever returns the documents in order, so I just have to do the following in format_documents:

```python
sorted_documents = documents[:prompt_token_cutoff]
```

After doing this, my embedding cost dropped to just the cost of embedding the input question.
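
For reference, the change looks roughly like this (simplified sketch; the real format_documents in my copy does more than this):

```python
def format_documents(documents, prompt_token_cutoff):
    # The retriever already returns documents ordered by similarity,
    # so keeping the first N needs no extra embedding calls.
    sorted_documents = documents[:prompt_token_cutoff]
    return "\n\n".join(doc.page_content for doc in sorted_documents)
```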

What I'm debugging now is that the retriever is not using the score_threshold; it keeps using 0.8 as the default.

vasanthasaikalluri commented 3 months ago

Hi @maxgosk, yes, we recently updated the retriever search_type to "similarity_score_threshold"; it will be in main soon. Please let me know if you need any more info.
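
One thing to watch out for in your earlier snippet: search_type has to be a top-level argument of as_retriever, not a key inside search_kwargs; otherwise everything is passed straight through to the plain similarity search and the threshold never kicks in. Roughly (illustrative only, not the exact code that will land in main):

```python
# Likely ignored: search_type buried inside search_kwargs, plain similarity search runs.
retriever = neo_db.as_retriever(
    search_kwargs={"search_type": "similarity_score_threshold", "score_threshold": 0.5}
)

# Effective: search_type as its own argument, so the score threshold is actually applied.
retriever = neo_db.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"k": 25, "score_threshold": 0.5},  # k value here is just an example
)
```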

maxgosk commented 3 months ago

Thanks @vasanthasaikalluri,

May I ask why, when not providing a k value like this:

```python
retriever = neo_db.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": score_threshold})
```

it only returns about 6 documents?

When using it like this:

```python
retriever = neo_db.as_retriever(search_type="similarity_score_threshold", search_kwargs={'k': search_k, "score_threshold": score_threshold})
```

- For k=50 it retrieved 44 documents
- For k=100 it retrieved 49 documents
- For k=300 it retrieved 78 documents
- For k=500 it retrieved 136 documents

When the value of k is too high and the threshold is high as well, it returns 0 documents.

Without setting a k value, shouldn't the database just return all embeddings above the threshold?
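
From skimming the LangChain base VectorStore code, my impression (I may well be wrong) is that similarity_score_threshold first fetches the top-k candidates and only then drops the ones below the threshold, so the result is capped by k (or by the library default when k is not set) rather than by the threshold alone. Roughly:

```python
# Sketch of how I understand the base implementation, not the actual library code.
def retrieve_with_threshold(store, query, score_threshold, k=4):  # k=4 mirrors the library default
    # Fetch at most k candidates with normalized relevance scores...
    docs_and_scores = store.similarity_search_with_relevance_scores(query, k=k)
    # ...then keep only the ones at or above the threshold, so the result is always <= k.
    return [doc for doc, score in docs_and_scores if score >= score_threshold]
```

If that is right, it would explain why the counts above never exceed k and why a higher threshold shrinks them further.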

Thank you