langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

When `search_type="similarity_score_threshold"`, retriever returns negative scores #10864

Closed amaiya closed 11 months ago

amaiya commented 11 months ago

System Info

Who can help?

@hwchase17 @agola11

Information

Related Components

Reproduction

from langchain.document_loaders import TextLoader
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

loader = TextLoader("state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)

embeddings = HuggingFaceEmbeddings(model_name='all-MiniLM-L6-v2', encode_kwargs={'normalize_embeddings': True})
docsearch = Chroma.from_documents(texts, embeddings)

retriever = docsearch.as_retriever(search_type="similarity_score_threshold", search_kwargs={'score_threshold': 0.2})

print(retriever.get_relevant_documents('Ketanji'))

The above code returns negative similarity scores for all retrieved results:

lib/python3.9/site-packages/langchain/vectorstores/base.py:257: UserWarning: Relevance scores must be between 0 and 1, got 

[(Document(page_content='Tonight. I call on the Senate to: Pass the Freedom to Vote Act. Pass the John Lewis Voting Rights Act. And while you’re at it, pass the Disclose Act so Americans can know who is funding our elections. \n\nTonight, I’d like to honor someone who has dedicated his life to serve this country: Justice Stephen Breyer—an Army veteran, Constitutional scholar, and retiring Justice of the United States Supreme Court. Justice Breyer, thank you for your service. \n\nOne of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court. \n\nAnd I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.', metadata={'source': 'state_of_the_union.txt'}), -0.18782109124725155), 

(Document(page_content='Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  \n\nLast year COVID-19 kept us apart. This year we are finally together again. \n\nTonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. \n\nWith a duty to one another to the American people to the Constitution. \n\nAnd with an unwavering resolve that freedom will always triumph over tyranny. \n\nSix days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. \n\nHe thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. \n\nHe met the Ukrainian people. \n\nFrom President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world.', metadata={'source': 'state_of_the_union.txt'}), -0.2858684850499025),

 (Document(page_content='We are inflicting pain on Russia and supporting the people of Ukraine. Putin is now isolated from the world more than ever. \n\nTogether with our allies –we are right now enforcing powerful economic sanctions. \n\nWe are cutting off Russia’s largest banks from the international financial system.  \n\nPreventing Russia’s central bank from defending the Russian Ruble making Putin’s $630 Billion “war fund” worthless.   \n\nWe are choking off Russia’s access to technology that will sap its economic strength and weaken its military for years to come.  \n\nTonight I say to the Russian oligarchs and corrupt leaders who have bilked billions of dollars off this violent regime no more. \n\nThe U.S. Department of Justice is assembling a dedicated task force to go after the crimes of Russian oligarchs.  \n\nWe are joining with our European allies to find and seize your yachts your luxury apartments your private jets. We are coming for your ill-begotten gains.', metadata={'source': 'state_of_the_union.txt'}), -0.3382525501830016),

 (Document(page_content='For that purpose we’ve mobilized American ground forces, air squadrons, and ship deployments to protect NATO countries including Poland, Romania, Latvia, Lithuania, and Estonia. \n\nAs I have made crystal clear the United States and our Allies will defend every inch of territory of NATO countries with the full force of our collective power.  \n\nAnd we remain clear-eyed. The Ukrainians are fighting back with pure courage. But the next few days weeks, months, will be hard on them.  \n\nPutin has unleashed violence and chaos.  But while he may make gains on the battlefield – he will pay a continuing high price over the long run. \n\nAnd a proud Ukrainian people, who have known 30 years  of independence, have repeatedly shown that they will not tolerate anyone who tries to take their country backwards.  \n\nTo all Americans, I will be honest with you, as I’ve always promised. A Russian dictator, invading a foreign country, has costs around the world.', metadata={'source': 'state_of_the_union.txt'}), -0.3629898842731978)]

Expected behavior

Score for each Document should be between 0 and 1.

dosubot[bot] commented 11 months ago

🤖

Hello,

Thank you for bringing this to our attention. The issue you're experiencing seems to be related to the way similarity scores are calculated in the Chroma class of LangChain.

In the Chroma class, the similarity_search_with_score method is used to calculate similarity scores. This method returns a list of documents most similar to the query text along with their cosine distance scores. Lower scores represent more similarity.

However, the scores you're seeing are negative, which is not expected. The scores should be between 0 and 1, where 0 is completely dissimilar and 1 is most similar.

This could be due to a bug in the similarity_search_with_relevance_scores method in the VectorStore class, which is used when the search type is set to "similarity_score_threshold". This method is supposed to filter out any documents with a similarity score less than the score_threshold. If no documents meet the score_threshold, a warning is issued.

I would recommend checking the version of the chromadb library you're using, as this could potentially be causing the issue. If you're using an older version, try updating to the latest version and see if the issue persists.

If the issue still persists after updating chromadb, it would be helpful if you could provide more information about the documents you're using for testing, as well as the exact version of langchain and chromadb you're using. This will help us reproduce the issue and find a solution.

Thank you for your patience and cooperation.
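The threshold filtering described above can be sketched in plain Python (a hypothetical standalone sketch with made-up data, not LangChain's actual implementation):

```python
import warnings

def filter_by_relevance(docs_and_scores, score_threshold):
    """Keep only (doc, score) pairs whose relevance score meets the
    threshold, mirroring what similarity_score_threshold search does."""
    kept = [(doc, score) for doc, score in docs_and_scores
            if score >= score_threshold]
    if not kept:
        warnings.warn(
            "No relevant docs were retrieved using the relevance "
            f"score threshold {score_threshold}"
        )
    return kept

# Made-up results: a negative relevance score can never pass a
# positive threshold, so everything gets filtered out.
results = [("doc_a", -0.19), ("doc_b", -0.29)]
print(filter_by_relevance(results, 0.2))  # -> []
```

So once the distance-to-relevance conversion produces negative scores, a positive threshold either filters out every document or, if the filter is bypassed, surfaces the negative scores directly, matching both symptoms reported in this thread.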


amaiya commented 11 months ago

For anyone else who stumbles on this, the missing piece appears to be collection_metadata={"hnsw:space": "cosine"}. Without it, negative scores are returned in the current versions of chromadb and langchain. If you supply client or client_settings to Chroma.from_documents, you may also need to explicitly supply a collection_name other than langchain for some reason.

This GitHub repo is helpful.
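For intuition on why the space setting matters: Chroma's default space is "l2", and for unit-normalized embeddings the squared L2 distance equals 2 · (1 − cosine similarity), so it ranges over [0, 4], while cosine distance ranges over [0, 2]. A relevance conversion calibrated for one range can go negative on the other. A quick stdlib check (pure arithmetic, no langchain or chromadb needed; the 1 − d/√2 mapping is my reading of LangChain's euclidean relevance function, so treat it as an assumption):

```python
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Two unit vectors pointing in fairly different directions.
a = [1.0, 0.0]
b = [math.cos(2.5), math.sin(2.5)]

cos_dist = 1.0 - cosine_sim(a, b)  # lies in [0, 2]
l2sq = sq_l2(a, b)                 # lies in [0, 4] for unit vectors

# Identity: squared L2 on unit vectors is 2 * (1 - cosine similarity).
print(abs(l2sq - 2 * cos_dist) < 1e-9)  # True

# Feeding a squared-L2 distance of this size into an L2-calibrated
# relevance map (1 - d / sqrt(2)) produces a negative "relevance":
print(1.0 - l2sq / math.sqrt(2))
```

Declaring the space as "cosine" tells the wrapper to pick the conversion that matches the distances the index actually returns.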

mansourshams commented 10 months ago

> For anyone else who stumbles on this, the missing piece appears to be collection_metadata={"hnsw:space": "cosine"}. Without it, negative scores are returned in the current versions of chromadb and langchain. If you supply client or client_settings to Chroma.from_documents, you may also need to explicitly supply a collection_name other than langchain for some reason.
>
> This GitHub repo is helpful.

I would highly appreciate it if you could let me know how to set collection_metadata, and in which function call, for LangChain with AWS Bedrock.

amaiya commented 10 months ago

> I would highly appreciate it if you could let me know how to set collection_metadata, and in which function call, for LangChain with AWS Bedrock.

You supply collection_metadata when instantiating langchain.vectorstores.Chroma.

hemengjita commented 5 months ago

retriever_score_threshold = vector_db.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={'score_threshold': 0.1, 'k': 5}
)
retrieved_docs = retriever_score_threshold.invoke("xxx")

In langchain 0.2 it doesn't work; I still get negative scores.

matardy commented 4 months ago

I encountered the same issue as described. Initially, I was uncertain about the root cause, so I experimented with different configurations. Here’s a brief overview of what I tried and what eventually worked for me:

Initially, I used the text-embedding-3-small model with the following setup:

persist_directory = get_absolute_path('/app/chroma/')
persistent_client = chromadb.PersistentClient(path=persist_directory)
vector_db = Chroma(
    client=persistent_client,
    persist_directory=persist_directory,
    collection_name="collection-foobar",
    embedding_function=OpenAIEmbeddings(),
    collection_metadata={"hnsw:space": "cosine"}
)
retriever = vector_db.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={'score_threshold': 0.6},
    k=10
)

With this configuration, I encountered issues with negative scores. After switching to the text-embedding-ada-002 model, the problem was resolved. If anyone else tries this solution, please confirm whether it works for you as well. Your feedback would be greatly appreciated!

BuddhiD-Workaholic commented 3 months ago

I am facing the same issue:

vectorstore = Chroma(
    collection_name=collection_name,
    embedding_function=embeddings_model,
    client=chroma_client.client,
    collection_metadata={"hnsw:space": "cosine"}
)

I am using similarity_search_with_relevance_scores.

This is part of the sample output:

[(), -219.28074010897063)]
UserWarning: No relevant docs were retrieved using the relevance score threshold 0.7

josefansinger commented 2 months ago

I did run into the same negative similarity scores using local Ollama models.

@matardy Thanks for experimenting with models and configs. But shouldn't negative scores be impossible, whatever model is used? Maybe it's worthwhile to re-open the issue.

pedrojrv commented 1 month ago

@matardy, literally the same issue when moving to the new 3-series embedding models from OpenAI; I even encounter this when changing to max inner product. Did you figure out why this is happening?

BuddhiD-Workaholic commented 1 month ago

@josefansinger how did you resolve the issue?

josefansinger commented 1 month ago

I did not, sorry. There was also no response at all in the LangChain Discord.

BuddhiD-Workaholic commented 1 month ago

Should we reopen this ?

Just tested with the new gpt embedding model "text-embedding-3-large" same issue

mishra011 commented 1 month ago

Hi, is there any other free open-source alternative to text-embedding-3-small? Please let me know.

mishra011 commented 1 month ago

I used

vector_store = FAISS.from_documents(text_chunks, embedding=embeddings, normalize_L2=True)

with a HuggingFace embedding and got similarity scores between 0 and 1. Solution: normalize_L2=True
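For context, here is what normalize_L2=True asks for, sketched in plain stdlib Python (not FAISS itself): each vector is scaled to unit length, which caps the L2 distance between any two vectors at 2 and makes the distance a monotone function of cosine similarity.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (what normalize_L2=True requests)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def l2_dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = l2_normalize([3.0, 4.0])
b = l2_normalize([-5.0, 12.0])

# Unit vectors live on the unit sphere, whose diameter is 2, so the
# L2 distance between any two of them is bounded by 2.
d = l2_dist(a, b)
print(0.0 <= d <= 2.0)  # True
```

This only bounds the raw distance; whether the converted relevance score then lands in [0, 1] still depends on which conversion formula the vector store applies, which may explain the mixed results reported in this thread.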

BuddhiD-Workaholic commented 1 month ago

Nope, that didn't work! Plus, that's for FAISS; the problem here is with Chroma DB.

If you're looking for a free open-source alternative to the text-embedding-3-small model, mxbai-embed-large is an excellent choice!

pedrojrv commented 1 month ago

I was now able to recreate this consistently. I have a script, happy to share. The issue is from the FAISS wrapper: the OpenAI wrapper works correctly, but somewhere in FAISS there are critical inconsistencies. I will try to debug more, but this is a thing.

Furthermore, querying the same FAISS index with text vs. with a vector yields different results, which makes matters even worse. (I embed separately to do the latter.)

smehta12 commented 1 month ago

I am having the same issue. When I use the HuggingFace embedding model Alibaba-NLP/gte-large-en-v1.5, it shows this problem with the FAISS vector db. When I try the same code with the OpenAI text-embedding-3-large embedding model, this doesn't happen; it gives scores between 0 and 1. As suggested above, the normalize_L2 argument makes the scores even worse.

voulkon commented 1 month ago

> I was now able to recreate this consistently. I have a script, happy to share. The issue is from the FAISS wrapper: the OpenAI wrapper works correctly, but somewhere in FAISS there are critical inconsistencies. I will try to debug more, but this is a thing.
>
> Furthermore, querying the same FAISS index with text vs. with a vector yields different results, which makes matters even worse. (I embed separately to do the latter.)

Hi, it would help to share the script. I'm curious to take a look :)

voulkon commented 1 month ago

Hey all, I tried to research the bug, this time using the latest modules:

langchain==0.2.7
chromadb==0.5.3

with the initial script ending in:

print(retriever.get_relevant_documents('Ketanji'))

I didn't get any scores at all, just an empty list (with the initial score_threshold of 0.2; screenshot omitted).

The omission of scores is caused by the threshold filtering in the source (screenshot of the relevant line omitted).

But I still don't see any negative scores, even though they are (expectedly) low (screenshot omitted).

PelFritz commented 1 month ago

I tried this with llama3 and mistral and I get negative scores.

pedrojrv commented 1 month ago

So I found out that the issue/feature is with the search-with-relevance-scores method that vector stores inherit from the base VectorStore. The relevance method applies an additional relevance function, which causes the negative similarities. This is an issue if your vector store is already returning valid distances (most cases).

All we had to do was use the other search methods, like search with scores, since our vector database already provides valid distances.

PelFritz commented 1 month ago

@pedrojrv thanks for the hint. If that is the case, how does one pass the threshold to db.similarity_search_with_score(query=query, k=3)?
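One workaround, sketched here with made-up (doc, distance) pairs rather than a real vector store: similarity_search_with_score takes no threshold parameter, but since it returns raw distances (lower means more similar for most stores), the cutoff can be applied by hand:

```python
# Made-up (doc, distance) pairs standing in for the output of
# db.similarity_search_with_score(query, k=3); lower = more similar.
results = [("doc_a", 0.12), ("doc_b", 0.45), ("doc_c", 0.90)]

# Hypothetical cutoff chosen for this distance scale.
max_distance = 0.5
kept = [(doc, dist) for doc, dist in results if dist <= max_distance]
print(kept)  # -> [('doc_a', 0.12), ('doc_b', 0.45)]
```

Note the direction of the comparison: with raw distances you keep scores below the cutoff, the opposite of the score_threshold semantics used by the relevance-score retriever.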

lukalafaye commented 1 month ago

Getting negative scores with this model: gtemicro = SentenceTransformer('Mihaiii/gte-micro-v3'). The collection_metadata arg did not solve it for me... Version: 0.2.11