Open jfkoh opened 9 months ago
🤖
Hey there, @jfkoh! Good to see you back here. Hope all is well on your end!
The error message you're encountering, "Collection.query() got an unexpected keyword argument 'mmr_threshold'", is raised by the underlying Chroma collection's 'query' method. In LlamaIndex version 0.10.1, the 'query' method of the 'ChromaVectorStore' class forwards any extra keyword arguments to that collection, which does not recognize 'mmr_threshold'.
The 'query' method of the 'ChromaVectorStore' class accepts two arguments: 'query' and 'kwargs'. The 'query' argument is an instance of 'VectorStoreQuery', and 'kwargs' is a dictionary of additional arguments. Because 'kwargs' is passed through unchanged to 'self._collection.query(...)', the unrecognized 'mmr_threshold' key triggers the TypeError.
To resolve this issue, remove 'mmr_threshold' from the 'vector_store_kwargs' you pass when using 'ChromaVectorStore'. If 'mmr_threshold' is a necessary parameter for your use case, you may need to implement a custom solution that incorporates this parameter, or check whether a newer version of LlamaIndex supports this argument.
Here is the relevant code snippet:
def query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResult:
    """Query index for top k most similar nodes.

    Args:
        query_embedding (List[float]): query embedding
        similarity_top_k (int): top k most similar nodes
    """
    if query.filters is not None:
        if "where" in kwargs:
            raise ValueError(
                "Cannot specify metadata filters via both query and kwargs. "
                "Use kwargs only for chroma specific items that are "
                "not supported via the generic query interface."
            )
        where = _to_chroma_filter(query.filters)
    else:
        where = kwargs.pop("where", {})

    results = self._collection.query(
        query_embeddings=query.query_embedding,
        n_results=query.similarity_top_k,
        where=where,
        **kwargs,
    )
    ...
You can find this in the llama_index/vector_stores/chroma.py file.
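To make the failure mode concrete, here is a minimal sketch that reproduces the same kind of TypeError without installing anything. 'FakeCollection' and 'FakeVectorStore' are hypothetical stand-ins written for this illustration; they are not the real chromadb or LlamaIndex classes, and the parameter list of 'FakeCollection.query' is an assumption chosen only to mirror the call in the snippet above.

```python
from typing import Any, Dict, List, Optional


class FakeCollection:
    """Hypothetical stand-in for a Chroma collection (not the real class)."""

    def query(
        self,
        query_embeddings: List[float],
        n_results: int = 10,
        where: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        # Only the three parameters above are accepted; anything else raises TypeError.
        return {"n_results": n_results, "where": where}


class FakeVectorStore:
    """Hypothetical wrapper that forwards extra kwargs, as the snippet above does."""

    def __init__(self, collection: FakeCollection) -> None:
        self._collection = collection

    def query(
        self, query_embedding: List[float], similarity_top_k: int, **kwargs: Any
    ) -> Dict[str, Any]:
        where = kwargs.pop("where", {})
        # Whatever is left in kwargs is handed straight to the collection.
        return self._collection.query(
            query_embeddings=query_embedding,
            n_results=similarity_top_k,
            where=where,
            **kwargs,
        )


store = FakeVectorStore(FakeCollection())

# Without extra kwargs the call succeeds.
ok_result = store.query([0.1, 0.2], 2)

# With an unknown kwarg, the error surfaces from the collection, not the wrapper.
error_message = None
try:
    store.query([0.1, 0.2], 2, mmr_threshold=0.2)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The wrapper itself never inspects 'mmr_threshold'; the exception is raised one level down, which matches the "Collection.query()" prefix in the reported error.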
Regarding the 'mmr_threshold' argument, it is used to control the trade-off between relevance and diversity when using the Maximum Marginal Relevance (MMR) retrieval method. The MMR retrieval method is used to iteratively find documents that are dissimilar to previous results. The 'mmr_threshold' parameter, represented by λ in the MMR algorithm, determines the balance between relevance (similarity between a document and the query) and diversity (dissimilarity between a document and already selected documents).
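For reference, this trade-off is usually written as the score below, where $q$ is the query, $d_i$ is a candidate document, $S$ is the set of documents already selected, and $\mathrm{sim}$ is a similarity function such as cosine similarity. This is the textbook MMR formulation with $\lambda$ playing the role of 'mmr_threshold'; the symbols are notation chosen for this illustration, and the exact scoring in LlamaIndex may differ in detail.

```latex
\mathrm{MMR}(d_i) = \lambda \, \mathrm{sim}(d_i, q) \;-\; (1 - \lambda) \max_{d_j \in S} \mathrm{sim}(d_i, d_j)
```

At each step, the candidate with the highest score is added to $S$.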
If 'mmr_threshold' is close to 1, more emphasis is put on relevance, while a 'mmr_threshold' close to 0 puts more emphasis on diversity. As the threshold goes to 1, the scores will become equal and similarity to previous nodes will be ignored, turning off the impact of MMR. By lowering the threshold, the algorithm will prefer more diverse documents.
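The effect of the threshold can be seen in a small, self-contained sketch. This is a generic greedy MMR selection written for illustration, assuming cosine similarity and plain Python lists as embeddings; 'cosine' and 'mmr_select' are hypothetical helper names, not LlamaIndex functions, and the library's own implementation may differ.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_select(
    query: Sequence[float],
    candidates: Sequence[Sequence[float]],
    top_k: int,
    mmr_threshold: float,
) -> List[int]:
    """Greedily pick top_k candidate indices by MMR score."""
    selected: List[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < top_k:
        best_idx, best_score = remaining[0], -math.inf
        for i in remaining:
            relevance = cosine(query, candidates[i])
            # Highest similarity to anything already chosen (0.0 for the first pick).
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            score = mmr_threshold * relevance - (1 - mmr_threshold) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query = [1.0, 0.0]
docs = [
    [1.0, 0.0],    # index 0: most relevant
    [0.99, 0.14],  # index 1: near-duplicate of index 0
    [0.6, 0.8],    # index 2: less relevant, but different
]

print(mmr_select(query, docs, top_k=2, mmr_threshold=1.0))  # [0, 1]
print(mmr_select(query, docs, top_k=2, mmr_threshold=0.2))  # [0, 2]
```

With the threshold at 1.0 the near-duplicate is returned second, because only relevance counts. At 0.2 the redundancy penalty dominates and the more distinct document is chosen instead.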
Here is how you can set 'mmr_threshold' with LlamaIndex's default in-memory vector store:
# In llama-index 0.10.x the core classes are imported from llama_index.core
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
index = VectorStoreIndex.from_documents(documents)

# Use MMR retrieval and pass the threshold through vector_store_kwargs
query_engine_with_threshold = index.as_query_engine(
    vector_store_query_mode="mmr", vector_store_kwargs={"mmr_threshold": 0.2}
)

response = query_engine_with_threshold.query(
    "What did the author do growing up?"
)
print(response)
In this example, the 'mmr_threshold' is set to 0.2, which means the MMR retrieval method will put more emphasis on diversity. You can find this in the docs/examples/vector_stores/SimpleIndexDemoMMR.ipynb file.
I hope this helps! If you have any other questions, feel free to ask.
This isn't a llama-index error? Chroma does not have an mmr_threshold kwarg, it seems.
In fact, MMR is not mentioned anywhere in their docs, so I don't think they support it.
Thanks Logan for your comment :)
I've seen Langchain implement something like mmr_threshold for Chroma but they call it lambda_mult or score_threshold. See this page: https://api.python.langchain.com/en/v0.0.342/vectorstores/langchain.vectorstores.chroma.Chroma.html
I tried those 2 terms in LlamaIndex but they don't work.
Well, I guess we don't have it implemented at the moment :) PRs are very welcome
@logan-markewich so no one fixed this yet? Seems like a really critical thing to have for proper RAG retrieval.
@cmosguy mmr threshold isn't too widely used. It hasn't been requested since this issue was opened. And no one has contributed it. PRs are welcome
@logan-markewich Has MMR still not been added in this case?
Bug Description
I previously implemented a VectorIndexRetriever using LlamaIndex's built-in vector index without any vector database, with MMR mode and mmr_threshold. It worked fine.
I then added ChromaDB and found that MMR mode works as long as you don't include the mmr_threshold. If you set the mmr_threshold in the vector_store_kwargs argument, you get an error.
Version
llama-index-0.10.1
Steps to Reproduce
Here are the relevant parts of my code. If I comment out the line indicated by "# ERROR", then the code works.
Relevant Logs/Tracebacks